Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14365/1012
Title: Landmark based guidance for reinforcement learning agents under partial observability
Authors: Demir, Alper
Cilden, Erkin
Polat, Faruk
Keywords: Diverse density
Landmark based guidance
Partial observability
Reinforcement learning
Temporal Abstraction
Framework
Publisher: Springer Heidelberg
Abstract: Under partial observability, a reinforcement learning agent must estimate its true state using only the semantics of its observations. However, this interpretation suffers from perceptual aliasing, which undermines the convergence guarantee of the learning algorithm. To overcome this issue, state estimates are formed from the agent's recent experiences, which amounts to a form of memory. Although such state estimates may still yield ambiguous action mappings due to aliasing, some estimates naturally disambiguate the agent's current situation in the domain. This paper introduces an algorithm that incorporates a guidance mechanism to accelerate reinforcement learning in partially observable problems with hidden states. The algorithm makes use of the landmarks of the problem, namely experiences that are distinctive and reliable in the context of state estimates within an ambiguous environment. The proposed algorithm constructs an abstract transition model from the observed landmarks, calculates their potentials throughout learning (a mechanism borrowed from reward shaping), and concurrently applies these potentials to provide guiding rewards to the agent. Additionally, we employ diverse density, a well-known multiple instance learning method, to discover landmarks automatically before learning, and combine both algorithms into a unified framework. The effectiveness of the algorithms is demonstrated empirically through extensive experimentation. The results show that the proposed framework not only accelerates the underlying reinforcement learning methods but also finds better policies for representative benchmark problems.
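The abstract names two mechanisms: potentials computed over an abstract landmark transition model and applied as guiding (shaping) rewards, and diverse density for discovering landmarks before learning. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation, whose details are in the restricted PDF. Assumptions: landmarks are hashable labels; the abstract model is an empirical landmark-to-landmark transition distribution; a landmark's potential is taken to be its discounted probability of reaching a goal landmark (a hypothetical choice); the guiding reward uses the standard potential-based shaping form F = γΦ(l') − Φ(l); for diverse density, instances are numeric feature tuples, positive bags hold state estimates from successful episodes and negative bags those from unsuccessful ones, and instance-to-concept similarity is exp(−‖x − t‖²). All function names (estimate_landmark_model, landmark_potentials, shaping_reward, diverse_density, best_landmark) are hypothetical.

```python
import math
from collections import defaultdict


def estimate_landmark_model(transitions):
    """Empirical P(next landmark | landmark) from observed (l, l_next) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for l, l_next in transitions:
        counts[l][l_next] += 1
    model = {}
    for l, successors in counts.items():
        total = sum(successors.values())
        model[l] = {l2: c / total for l2, c in successors.items()}
    return model


def landmark_potentials(model, goal, gamma=0.9, sweeps=100):
    """Hypothetical potential: discounted probability of reaching `goal`
    in the abstract landmark chain, computed by repeated in-place sweeps."""
    landmarks = set(model) | {l2 for succ in model.values() for l2 in succ}
    phi = {l: 0.0 for l in landmarks}
    phi[goal] = 1.0  # the goal landmark keeps a fixed potential
    for _ in range(sweeps):
        for l, successors in model.items():
            if l != goal:
                phi[l] = gamma * sum(p * phi[l2] for l2, p in successors.items())
    return phi


def shaping_reward(phi, l, l_next, gamma=0.9):
    """Potential-based shaping term F = gamma * phi(l_next) - phi(l)."""
    return gamma * phi.get(l_next, 0.0) - phi.get(l, 0.0)


def diverse_density(t, positive_bags, negative_bags):
    """Noisy-or diverse density of a candidate concept point t."""
    def p_in_concept(x):
        # Assumed similarity between instance x and concept t
        return math.exp(-sum((a - b) ** 2 for a, b in zip(x, t)))

    dd = 1.0
    for bag in positive_bags:  # at least one instance should lie near t
        p_miss = 1.0
        for x in bag:
            p_miss *= 1.0 - p_in_concept(x)
        dd *= 1.0 - p_miss
    for bag in negative_bags:  # no instance should lie near t
        for x in bag:
            dd *= 1.0 - p_in_concept(x)
    return dd


def best_landmark(positive_bags, negative_bags):
    """Return the positive-bag instance with the highest diverse density."""
    candidates = {x for bag in positive_bags for x in bag}
    return max(candidates, key=lambda t: diverse_density(t, positive_bags, negative_bags))


if __name__ == "__main__":
    model = estimate_landmark_model(
        [("A", "B"), ("A", "B"), ("A", "C"), ("B", "G"), ("C", "A")])
    phi = landmark_potentials(model, goal="G")
    print(shaping_reward(phi, "A", "B"))  # positive: B leads toward the goal
    print(shaping_reward(phi, "A", "C"))  # negative: C leads away from it

    pos = [[(0.0, 0.0), (5.0, 5.0)], [(0.0, 0.0), (9.0, 1.0)]]
    neg = [[(5.0, 5.0), (9.0, 1.0)]]
    print(best_landmark(pos, neg))  # (0.0, 0.0): in every positive bag, no negative one
```

Searching only over instances drawn from positive bags keeps the sketch short; the original diverse density formulation instead runs gradient ascent on DD starting from those instances.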
URI: https://doi.org/10.1007/s13042-022-01713-5
https://hdl.handle.net/20.500.14365/1012
ISSN: 1868-8071
1868-808X
Appears in Collections:Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection

Files in This Item: 15.pdf (2.36 MB, Adobe PDF, restricted access)



Scopus™ citations: 4 (checked on Nov 20, 2024)
Web of Science™ citations: 6 (checked on Nov 20, 2024)
Page views: 72 (checked on Nov 18, 2024)
Downloads: 8 (checked on Nov 18, 2024)

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.