Landmark Based Guidance for Reinforcement Learning Agents Under Partial Observability

Demir, Alper; Cilden, Erkin; Polat, Faruk

dc.contributor.author	Demir, Alper
dc.contributor.author	Cilden, Erkin
dc.contributor.author	Polat, Faruk
dc.date.accessioned	2023-06-16T12:48:22Z
dc.date.available	2023-06-16T12:48:22Z
dc.date.issued	2023
dc.description.abstract	Under partial observability, a reinforcement learning agent must estimate its true state solely from the semantics of its observations. However, this interpretation suffers from a drawback called perceptual aliasing, which voids the convergence guarantee of the learning algorithm. To overcome this issue, state estimates are formed from the agent's recent experiences, which can be formulated as a form of memory. Although such state estimates may still yield ambiguous action mappings due to aliasing, some estimates naturally disambiguate the agent's present situation in the domain. This paper introduces an algorithm that incorporates a guidance mechanism to accelerate reinforcement learning for partially observable problems with hidden states. The algorithm makes use of the landmarks of the problem, namely the experiences that are distinctive and reliable, in the context of state estimates, within an ambiguous environment. The proposed algorithm constructs an abstract transition model from the observed landmarks, calculates their potentials throughout learning (a mechanism borrowed from reward shaping), and concurrently applies the potentials to provide guiding rewards for the agent. Additionally, we employ a known multiple instance learning method, diverse density, to discover landmarks automatically before learning, and combine both algorithms to form a unified framework. The effectiveness of the algorithms is shown empirically via extensive experimentation. The results show that the proposed framework not only accelerates the underlying reinforcement learning methods, but also finds better policies for representative benchmark problems.	en_US
dc.identifier.doi	10.1007/s13042-022-01713-5
dc.identifier.issn	1868-8071
dc.identifier.issn	1868-808X
dc.identifier.scopus	2-s2.0-85141991253
dc.identifier.uri	https://doi.org/10.1007/s13042-022-01713-5
dc.identifier.uri	https://hdl.handle.net/20.500.14365/1012
dc.language.iso	en	en_US
dc.publisher	Springer Heidelberg	en_US
dc.relation.ispartof	International Journal of Machine Learning and Cybernetics	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Diverse density	en_US
dc.subject	Landmark based guidance	en_US
dc.subject	Partial observability	en_US
dc.subject	Reinforcement learning	en_US
dc.subject	Temporal Abstraction	en_US
dc.subject	Framework	en_US
dc.title	Landmark Based Guidance for Reinforcement Learning Agents Under Partial Observability	en_US
dc.type	Article	en_US
dspace.entity.type	Publication
gdc.author.id	Demir, Alper/0000-0003-2646-4850
gdc.author.id	Polat, Faruk/0000-0003-0509-9153
gdc.author.scopusid	57549355800
gdc.author.scopusid	55753120200
gdc.author.scopusid	7003321824
gdc.bip.impulseclass	C4
gdc.bip.influenceclass	C5
gdc.bip.popularityclass	C4
gdc.coar.access	metadata only access
gdc.coar.type	text::journal::journal article
gdc.collaboration.industrial	false
gdc.description.department	IEU (Izmir University of Economics), Faculty of Engineering, Department of Computer Engineering	en_US
gdc.description.departmenttemp	[Demir, Alper] Izmir Univ Econ, Dept Comp Engn, TR-35330 Izmir, Turkey; [Cilden, Erkin] STM Def Technol Engn & Trade Inc, TR-06530 Ankara, Turkey; [Polat, Faruk] Middle East Tech Univ, Dept Comp Engn, TR-06531 Ankara, Turkey	en_US
gdc.description.endpage	1563	en_US
gdc.description.issue	4	en_US
gdc.description.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member	en_US
gdc.description.scopusquality	Q2
gdc.description.startpage	1543	en_US
gdc.description.volume	14	en_US
gdc.description.wosquality	Q3
gdc.identifier.openalex	W4309346029
gdc.identifier.wos	WOS:000884644500002
gdc.index.type	WoS
gdc.index.type	Scopus
gdc.oaire.diamondjournal	false
gdc.oaire.impulse	5.0
gdc.oaire.influence	2.800527E-9
gdc.oaire.isgreen	false
gdc.oaire.popularity	5.611603E-9
gdc.oaire.publicfunded	false
gdc.oaire.sciencefields	0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields	02 engineering and technology
gdc.openalex.collaboration	National
gdc.openalex.fwci	0.8276
gdc.openalex.normalizedpercentile	0.78
gdc.opencitations.count	4
gdc.plumx.crossrefcites	4
gdc.plumx.mendeley	3
gdc.plumx.newscount	1
gdc.plumx.scopuscites	6
gdc.scopus.citedcount	6
gdc.virtual.author	Demir, Alper
gdc.wos.citedcount	6
relation.isAuthorOfPublication	c9c431c0-6d14-4dac-87af-29d85e10ef21
relation.isAuthorOfPublication.latestForDiscovery	c9c431c0-6d14-4dac-87af-29d85e10ef21
relation.isOrgUnitOfPublication	b4714bc5-c5ae-478f-b962-b7204c948b70
relation.isOrgUnitOfPublication	26a7372c-1a5e-42d9-90b6-a3f7d14cad44
relation.isOrgUnitOfPublication	e9e77e3e-bc94-40a7-9b24-b807b2cd0319
relation.isOrgUnitOfPublication.latestForDiscovery	b4714bc5-c5ae-478f-b962-b7204c948b70
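
The abstract describes applying landmark potentials as guiding rewards, a mechanism borrowed from reward shaping. A minimal sketch of the standard potential-based shaping form, r' = r + gamma * Phi(l') - Phi(l), is given below. It is only an illustration under stated assumptions: the function name `shaped_reward`, the landmark labels, and the hand-set potential table are hypothetical, and the record does not confirm that the paper uses exactly this formulation (the abstract states that potentials are calculated throughout learning, not fixed in advance).

```python
from typing import Dict, Hashable


def shaped_reward(
    reward: float,
    potentials: Dict[Hashable, float],
    prev_landmark: Hashable,
    next_landmark: Hashable,
    gamma: float = 0.95,
) -> float:
    """Return the environment reward plus a potential-based shaping term.

    Shaping term: gamma * Phi(next_landmark) - Phi(prev_landmark).
    Landmarks missing from the table are assumed to have potential 0.0
    (an assumption made for this sketch only).
    """
    phi_prev = potentials.get(prev_landmark, 0.0)
    phi_next = potentials.get(next_landmark, 0.0)
    return reward + gamma * phi_next - phi_prev


# Hypothetical potentials for two landmarks; values are illustrative only.
potentials = {"corridor_end": 0.0, "goal_door": 1.0}

# Moving toward the higher-potential landmark yields a positive guiding reward.
toward = shaped_reward(0.0, potentials, "corridor_end", "goal_door", gamma=0.9)

# Moving away from it yields a negative guiding reward.
away = shaped_reward(0.0, potentials, "goal_door", "corridor_end", gamma=0.9)
```

With these illustrative values, a transition toward the higher-potential landmark is rewarded and the reverse transition is penalized, which is the sense in which potentials can guide the agent.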

Files

Original bundle

Name: 15.pdf
Size: 2.31 MB
Format: Adobe Portable Document Format

Collections

WoS Indexed Publications Collection
Scopus Indexed Publications Collection