Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14365/1012
Full metadata record
DC Field: Value [Language]
dc.contributor.author: Demir, Alper
dc.contributor.author: Cilden, Erkin
dc.contributor.author: Polat, Faruk
dc.date.accessioned: 2023-06-16T12:48:22Z
dc.date.available: 2023-06-16T12:48:22Z
dc.date.issued: 2023
dc.identifier.issn: 1868-8071
dc.identifier.issn: 1868-808X
dc.identifier.uri: https://doi.org/10.1007/s13042-022-01713-5
dc.identifier.uri: https://hdl.handle.net/20.500.14365/1012
dc.description.abstract: Under partial observability, a reinforcement learning agent must estimate its true state using only its observations. This interpretation suffers from a drawback called perceptual aliasing, which invalidates the convergence guarantee of the learning algorithm. To overcome this issue, state estimates are formed from the agent's recent experiences, which can be formulated as a form of memory. Although the state estimates may still yield ambiguous action mappings due to aliasing, some estimates naturally disambiguate the agent's present situation in the domain. This paper introduces an algorithm that incorporates a guidance mechanism to accelerate reinforcement learning for partially observable problems with hidden states. The algorithm makes use of the landmarks of the problem, namely the distinctive and reliable experiences, in the state-estimate context, within an ambiguous environment. The proposed algorithm constructs an abstract transition model from the observed landmarks, calculates their potentials throughout learning (a mechanism borrowed from reward shaping), and concurrently applies these potentials to provide guiding rewards for the agent. Additionally, we employ a known multiple-instance learning method, diverse density, to discover landmarks automatically before learning, and combine both algorithms into a unified framework. The effectiveness of the algorithms is shown empirically via extensive experimentation. The results show that the proposed framework not only accelerates the underlying reinforcement learning methods but also finds better policies for representative benchmark problems. [en_US]
dc.language.iso: en [en_US]
dc.publisher: Springer Heidelberg [en_US]
dc.relation.ispartof: International Journal of Machine Learning and Cybernetics [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Diverse density [en_US]
dc.subject: Landmark based guidance [en_US]
dc.subject: Partial observability [en_US]
dc.subject: Reinforcement learning [en_US]
dc.subject: Temporal Abstraction [en_US]
dc.subject: Framework [en_US]
dc.title: Landmark based guidance for reinforcement learning agents under partial observability [en_US]
dc.type: Article [en_US]
dc.identifier.doi: 10.1007/s13042-022-01713-5
dc.identifier.scopus: 2-s2.0-85141991253 [en_US]
dc.department: İzmir Ekonomi Üniversitesi [en_US]
dc.authorid: Demir, Alper/0000-0003-2646-4850
dc.authorid: Polat, Faruk/0000-0003-0509-9153
dc.authorscopusid: 57549355800
dc.authorscopusid: 55753120200
dc.authorscopusid: 7003321824
dc.identifier.volume: 14 [en_US]
dc.identifier.issue: 4 [en_US]
dc.identifier.startpage: 1543 [en_US]
dc.identifier.endpage: 1563 [en_US]
dc.identifier.wos: WOS:000884644500002 [en_US]
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institution Faculty Member [en_US]
dc.identifier.scopusquality: Q1
dc.identifier.wosquality: Q2
item.grantfulltext: reserved
item.openairecristype: http://purl.org/coar/resource_type/c_18cf
item.cerifentitytype: Publications
item.openairetype: Article
item.fulltext: With Fulltext
item.languageiso639-1: en
crisitem.author.dept: 05.05. Computer Engineering
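The abstract above describes guiding rewards derived from landmark potentials, a mechanism it says is borrowed from reward shaping. The sketch below is purely illustrative and is not the paper's algorithm: it assumes a hypothetical six-state corridor, a hand-picked landmark state, and a made-up potential function, and shows how a potential-based shaping term F(s, s') = γ·Φ(s') − Φ(s) can be added on top of the environment reward in a standard Q-learning update.

```python
import random

random.seed(0)

GAMMA, ALPHA = 0.9, 0.1   # discount factor and learning rate
N_STATES, GOAL = 6, 5     # toy corridor: states 0..5, goal at the right end
LANDMARK = 3              # hypothetical: assume this state was identified as a
                          # landmark beforehand (e.g. by a diverse-density search)

def potential(s):
    # Hypothetical potential function: large at the goal, a bump at the landmark.
    if s == GOAL:
        return 1.0
    return 0.5 if s == LANDMARK else 0.0

def shaping(s, s_next):
    # Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).
    # Adding it to the environment reward does not change the optimal policy.
    return GAMMA * potential(s_next) - potential(s)

def step(s, a):
    # Deterministic corridor dynamics; reward 1 only on reaching the goal.
    s_next = min(GOAL, max(0, s + a))
    return s_next, (1.0 if s_next == GOAL else 0.0)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in (-1, 1)}

for _ in range(500):      # off-policy: random behavior actions, greedy targets
    s = 0
    while s != GOAL:
        a = random.choice((-1, 1))
        s_next, r = step(s, a)
        r += shaping(s, s_next)   # the guiding reward, added to the env reward
        best_next = 0.0 if s_next == GOAL else max(Q[(s_next, b)] for b in (-1, 1))
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next

# Greedy policy learned under the shaped reward: it should prefer +1
# (toward the landmark and then the goal) in every non-goal state.
greedy = {s: max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(greedy)
```

Because the shaping term telescopes along any trajectory, its discounted sum depends only on the endpoints' potentials, which is why this kind of guidance can accelerate learning without changing which policy is optimal.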
Appears in Collections:Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
Files in This Item:
File: 15.pdf, Size: 2.36 MB, Format: Adobe PDF (Restricted Access)
SCOPUS citations: 4 (checked on Oct 2, 2024)
Web of Science citations: 6 (checked on Oct 2, 2024)
Page views: 64 (checked on Sep 30, 2024)
Downloads: 8 (checked on Sep 30, 2024)

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.