Landmark Based Guidance for Reinforcement Learning Agents Under Partial Observability
Date
2023
Publisher
Springer Heidelberg
Green Open Access
No
Publicly Funded
No
Abstract
Under partial observability, a reinforcement learning agent must estimate its true state solely from the semantics of its observations. This interpretation has a drawback, known as perceptual aliasing, which undermines the convergence guarantee of the learning algorithm. To overcome this issue, state estimates are formed from the agent's recent experiences, which can be formulated as a form of memory. Although these state estimates may still yield ambiguous action mappings due to aliasing, some estimates naturally disambiguate the agent's current situation in the domain. This paper introduces an algorithm that incorporates a guidance mechanism to accelerate reinforcement learning for partially observable problems with hidden states. The algorithm makes use of the landmarks of the problem, namely the distinctive and reliable experiences, in the context of state estimates, within an ambiguous environment. The proposed algorithm constructs an abstract transition model over the observed landmarks, calculates their potentials throughout learning (a mechanism borrowed from reward shaping), and concurrently applies these potentials to provide guiding rewards for the agent. Additionally, we employ a known multiple-instance learning method, diverse density, to automatically discover landmarks before learning, and combine both algorithms into a unified framework. The effectiveness of the algorithms is shown empirically via extensive experimentation. The results show that the proposed framework not only accelerates the underlying reinforcement learning methods but also finds better policies for representative benchmark problems.
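The guidance mechanism the abstract describes rests on potential-based reward shaping. A minimal illustrative sketch follows; the landmark potentials and state names here are hypothetical placeholders, not the paper's actual abstract transition model:

```python
# Illustrative sketch of potential-based reward shaping, the mechanism
# the abstract says is borrowed to turn landmark potentials into
# guiding rewards. Values below are assumptions for demonstration.

GAMMA = 0.9  # discount factor

# Hypothetical potentials assigned to landmark state estimates;
# non-landmark estimates default to a potential of 0.
phi = {"landmark_A": 1.0, "landmark_B": 0.5}

def shaping_reward(state, next_state):
    """F(s, s') = gamma * phi(s') - phi(s).

    Added to the environment reward, this term guides the agent toward
    high-potential landmarks while, by the standard shaping result,
    leaving the optimal policy unchanged.
    """
    return GAMMA * phi.get(next_state, 0.0) - phi.get(state, 0.0)

# The learning update then uses r + F instead of the raw reward r alone.
r_env = 0.0
r_total = r_env + shaping_reward("start", "landmark_A")  # 0.9 * 1.0 - 0.0
```

In the paper's framework, the potentials themselves are computed over the landmark-based abstract transition model during learning rather than fixed in advance as they are in this sketch.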
Keywords
Diverse density, Landmark based guidance, Partial observability, Reinforcement learning, Temporal abstraction, Framework
Fields of Science
0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology
WoS Q
Q3
Scopus Q
Q2

OpenCitations Citation Count
4
Source
International Journal of Machine Learning and Cybernetics
Volume
14
Issue
4
Start Page
1543
End Page
1563
PlumX Metrics
Citations
CrossRef : 4
Scopus : 6
Captures
Mendeley Readers : 3
Web of Science™ Citations
6