Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14365/1012
Full metadata record
DC Field: Value [Language]
dc.contributor.author: Demir, Alper
dc.contributor.author: Cilden, Erkin
dc.contributor.author: Polat, Faruk
dc.date.accessioned: 2023-06-16T12:48:22Z
dc.date.available: 2023-06-16T12:48:22Z
dc.date.issued: 2023
dc.identifier.issn: 1868-8071
dc.identifier.issn: 1868-808X
dc.identifier.uri: https://doi.org/10.1007/s13042-022-01713-5
dc.identifier.uri: https://hdl.handle.net/20.500.14365/1012
dc.description.abstract: Under partial observability, a reinforcement learning agent must estimate its true state using only its observations. This interpretation suffers from a drawback called perceptual aliasing, which invalidates the convergence guarantee of the learning algorithm. To overcome this issue, state estimates are formed from the agent's recent experiences, which can be formulated as a form of memory. Although the state estimates may still yield ambiguous action mappings due to aliasing, some estimates naturally disambiguate the agent's present situation in the domain. This paper introduces an algorithm that incorporates a guidance mechanism to accelerate reinforcement learning for partially observable problems with hidden states. The algorithm makes use of the landmarks of the problem, namely the distinctive and reliable experiences, in the state-estimate context, within an ambiguous environment. The proposed algorithm constructs an abstract transition model from the observed landmarks, calculates their potentials throughout learning (a mechanism borrowed from reward shaping), and concurrently applies these potentials to provide guiding rewards for the agent. Additionally, we employ a known multiple-instance learning method, diverse density, to discover landmarks automatically before learning, and combine both algorithms into a unified framework. The effectiveness of the algorithms is shown empirically via extensive experimentation. The results show that the proposed framework not only accelerates the underlying reinforcement learning methods but also finds better policies for representative benchmark problems. [en_US]
dc.language.iso: en [en_US]
dc.publisher: Springer Heidelberg [en_US]
dc.relation.ispartof: International Journal of Machine Learning and Cybernetics [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Diverse density [en_US]
dc.subject: Landmark based guidance [en_US]
dc.subject: Partial observability [en_US]
dc.subject: Reinforcement learning [en_US]
dc.subject: Temporal Abstraction [en_US]
dc.subject: Framework [en_US]
dc.title: Landmark based guidance for reinforcement learning agents under partial observability [en_US]
dc.type: Article [en_US]
dc.identifier.doi: 10.1007/s13042-022-01713-5
dc.identifier.scopus: 2-s2.0-85141991253 [en_US]
dc.department: İzmir Ekonomi Üniversitesi [en_US]
dc.authorid: Demir, Alper/0000-0003-2646-4850
dc.authorid: Polat, Faruk/0000-0003-0509-9153
dc.authorscopusid: 57549355800
dc.authorscopusid: 55753120200
dc.authorscopusid: 7003321824
dc.identifier.volume: 14 [en_US]
dc.identifier.issue: 4 [en_US]
dc.identifier.startpage: 1543 [en_US]
dc.identifier.endpage: 1563 [en_US]
dc.identifier.wos: WOS:000884644500002 [en_US]
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institution Faculty Member [en_US]
dc.identifier.scopusquality: Q1
dc.identifier.wosquality: Q2
item.grantfulltext: reserved
item.openairecristype: http://purl.org/coar/resource_type/c_18cf
item.cerifentitytype: Publications
item.openairetype: Article
item.fulltext: With Fulltext
item.languageiso639-1: en
crisitem.author.dept: 05.05. Computer Engineering
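The abstract above describes guiding rewards derived from landmark potentials, a mechanism it says is borrowed from reward shaping. The sketch below is purely illustrative and is not the paper's algorithm: it assumes a hypothetical six-state corridor, a hand-picked landmark state, and a made-up potential function, and shows how a potential-based shaping term F(s, s') = γ·Φ(s') − Φ(s) can be added on top of the environment reward in a standard Q-learning update.

```python
import random

random.seed(0)

GAMMA, ALPHA = 0.9, 0.1   # discount factor and learning rate
N_STATES, GOAL = 6, 5     # toy corridor: states 0..5, goal at the right end
LANDMARK = 3              # hypothetical: assume this state was identified as a
                          # landmark beforehand (e.g. by a diverse-density search)

def potential(s):
    # Hypothetical potential function: large at the goal, a bump at the landmark.
    if s == GOAL:
        return 1.0
    return 0.5 if s == LANDMARK else 0.0

def shaping(s, s_next):
    # Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).
    # Adding it to the environment reward does not change the optimal policy.
    return GAMMA * potential(s_next) - potential(s)

def step(s, a):
    # Deterministic corridor dynamics; reward 1 only on reaching the goal.
    s_next = min(GOAL, max(0, s + a))
    return s_next, (1.0 if s_next == GOAL else 0.0)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in (-1, 1)}

for _ in range(500):      # off-policy: random behavior actions, greedy targets
    s = 0
    while s != GOAL:
        a = random.choice((-1, 1))
        s_next, r = step(s, a)
        r += shaping(s, s_next)   # the guiding reward, added to the env reward
        best_next = 0.0 if s_next == GOAL else max(Q[(s_next, b)] for b in (-1, 1))
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next

# Greedy policy learned under the shaped reward: it should prefer +1
# (toward the landmark and then the goal) in every non-goal state.
greedy = {s: max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(greedy)
```

Because the shaping term telescopes along any trajectory, its discounted sum depends only on the endpoints' potentials, which is why this kind of guidance can accelerate learning without changing which policy is optimal.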
Appears in Collections:Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
Files in This Item:
File: 15.pdf, Size: 2.36 MB, Format: Adobe PDF (Restricted Access)
SCOPUS citations: 4 (checked on Oct 2, 2024)
Web of Science citations: 6 (checked on Oct 2, 2024)
Page views: 64 (checked on Sep 30, 2024)
Downloads: 8 (checked on Sep 30, 2024)

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.