Landmark Based Guidance for Reinforcement Learning Agents Under Partial Observability

dc.contributor.author Demir, Alper
dc.contributor.author Cilden, Erkin
dc.contributor.author Polat, Faruk
dc.date.accessioned 2023-06-16T12:48:22Z
dc.date.available 2023-06-16T12:48:22Z
dc.date.issued 2023
dc.description.abstract Under partial observability, a reinforcement learning agent needs to estimate its true state using only the semantics of its observations. However, this interpretation suffers from a drawback called perceptual aliasing, which voids the convergence guarantee of the learning algorithm. To overcome this issue, state estimates are formed from the agent's recent experiences, which can be formulated as a form of memory. Although the state estimates may still yield ambiguous action mappings due to aliasing, some estimates naturally disambiguate the agent's present situation in the domain. This paper introduces an algorithm that incorporates a guidance mechanism to accelerate reinforcement learning for partially observable problems with hidden states. The algorithm makes use of the landmarks of the problem, namely the distinctive and reliable experiences, in the context of state estimates, within an ambiguous environment. The proposed algorithm constructs an abstract transition model from the observed landmarks, calculates their potentials throughout learning (a mechanism borrowed from reward shaping), and concurrently applies the potentials to provide guiding rewards for the agent. Additionally, we employ a known multiple instance learning method, diverse density, to discover landmarks automatically before learning, and combine both algorithms into a unified framework. The effectiveness of the algorithms is shown empirically through extensive experimentation. The results show that the proposed framework not only accelerates the underlying reinforcement learning methods, but also finds better policies for representative benchmark problems. en_US
dc.identifier.doi 10.1007/s13042-022-01713-5
dc.identifier.issn 1868-8071
dc.identifier.issn 1868-808X
dc.identifier.scopus 2-s2.0-85141991253
dc.identifier.uri https://doi.org/10.1007/s13042-022-01713-5
dc.identifier.uri https://hdl.handle.net/20.500.14365/1012
dc.language.iso en en_US
dc.publisher Springer Heidelberg en_US
dc.relation.ispartof International Journal of Machine Learning and Cybernetics en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Diverse density en_US
dc.subject Landmark based guidance en_US
dc.subject Partial observability en_US
dc.subject Reinforcement learning en_US
dc.subject Temporal Abstraction en_US
dc.subject Framework en_US
dc.title Landmark Based Guidance for Reinforcement Learning Agents Under Partial Observability en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.id Demir, Alper/0000-0003-2646-4850
gdc.author.id Polat, Faruk/0000-0003-0509-9153
gdc.author.scopusid 57549355800
gdc.author.scopusid 55753120200
gdc.author.scopusid 7003321824
gdc.bip.impulseclass C4
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.access metadata only access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department İEÜ (Izmir University of Economics), Faculty of Engineering, Department of Computer Engineering en_US
gdc.description.departmenttemp [Demir, Alper] Izmir Univ Econ, Dept Comp Engn, TR-35330 Izmir, Turkey; [Cilden, Erkin] STM Def Technol Engn & Trade Inc, TR-06530 Ankara, Turkey; [Polat, Faruk] Middle East Tech Univ, Dept Comp Engn, TR-06531 Ankara, Turkey en_US
gdc.description.endpage 1563 en_US
gdc.description.issue 4 en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q2
gdc.description.startpage 1543 en_US
gdc.description.volume 14 en_US
gdc.description.wosquality Q3
gdc.identifier.openalex W4309346029
gdc.identifier.wos WOS:000884644500002
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 5.0
gdc.oaire.influence 2.800527E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 5.611603E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields 02 engineering and technology
gdc.openalex.collaboration National
gdc.openalex.fwci 0.8276
gdc.openalex.normalizedpercentile 0.78
gdc.opencitations.count 4
gdc.plumx.crossrefcites 4
gdc.plumx.mendeley 3
gdc.plumx.newscount 1
gdc.plumx.scopuscites 6
gdc.scopus.citedcount 6
gdc.virtual.author Demir, Alper
gdc.wos.citedcount 6
relation.isAuthorOfPublication c9c431c0-6d14-4dac-87af-29d85e10ef21
relation.isAuthorOfPublication.latestForDiscovery c9c431c0-6d14-4dac-87af-29d85e10ef21
relation.isOrgUnitOfPublication b4714bc5-c5ae-478f-b962-b7204c948b70
relation.isOrgUnitOfPublication 26a7372c-1a5e-42d9-90b6-a3f7d14cad44
relation.isOrgUnitOfPublication e9e77e3e-bc94-40a7-9b24-b807b2cd0319
relation.isOrgUnitOfPublication.latestForDiscovery b4714bc5-c5ae-478f-b962-b7204c948b70
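
The abstract describes guiding rewards derived from landmark potentials, a mechanism borrowed from reward shaping. The following is a minimal sketch of generic potential-based shaping applied to landmarks, not the authors' algorithm. The landmark names, the potential values, and the helper functions `guiding_reward` and `discounted_shaping_sum` are hypothetical, and each landmark-to-landmark move is assumed to count as a single abstract step.

```python
def guiding_reward(potential, prev_landmark, next_landmark, gamma=0.9):
    """Potential-based shaping term F = gamma * Phi(next) - Phi(prev).

    `potential` maps a landmark to its current potential estimate;
    landmarks that have not been seen yet default to 0.0 (an assumption).
    """
    phi_prev = potential.get(prev_landmark, 0.0)
    phi_next = potential.get(next_landmark, 0.0)
    return gamma * phi_next - phi_prev


def discounted_shaping_sum(potential, landmark_path, gamma=0.9):
    """Discounted sum of shaping terms along a sequence of landmarks."""
    total = 0.0
    for t in range(len(landmark_path) - 1):
        step = guiding_reward(
            potential, landmark_path[t], landmark_path[t + 1], gamma
        )
        total += (gamma ** t) * step
    return total


# Hypothetical potentials for three landmarks in a toy corridor task.
potential = {"door": 0.2, "corridor": 0.5, "goal": 1.0}
path = ["door", "corridor", "goal"]

first_step = guiding_reward(potential, "door", "corridor")
path_total = discounted_shaping_sum(potential, path)
```

Under these assumptions the discounted shaping terms telescope: the total along a path depends only on the potentials of its first and last landmarks, which is the property that lets potential-based shaping guide exploration without changing which policies are optimal.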

Files

Original bundle

Name: 15.pdf
Size: 2.31 MB
Format: Adobe Portable Document Format