Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14365/2815
Full metadata record
DC Field | Value | Language
dc.contributor.author | Karaoglan, Bahar | -
dc.contributor.author | Kisla, Tarik | -
dc.contributor.author | Dincer, Bekir Taner | -
dc.contributor.author | Metin, Senem Kumova | -
dc.date.accessioned | 2023-06-16T14:50:29Z | -
dc.date.available | 2023-06-16T14:50:29Z | -
dc.date.issued | 2013 | -
dc.identifier.isbn | 978-1-4673-5563-6 | -
dc.identifier.isbn | 978-1-4673-5562-9 | -
dc.identifier.issn | 2165-0608 | -
dc.identifier.uri | https://hdl.handle.net/20.500.14365/2815 | -
dc.description | 21st Signal Processing and Communications Applications Conference (SIU) -- APR 24-26, 2013 -- CYPRUS | en_US
dc.description.abstract | In order to compare work done in natural language processing, the corpora involved in different studies should be standardized/normalized. Entropy, used as a language model performance metric, depends entirely on signal information. When language is considered, however, semantic information should also be taken into account. Here we propose a metric that exploits Zipf's and Heaps' power laws to represent semantic information in terms of signal information and estimates the amount of information anticipated from a corpus of a given length in words. The proposed metric is tested on sub-corpora of 20 different lengths drawn from a major Turkish corpus (METU). While the entropy changed with the length of the corpus, the value of the proposed metric stayed almost constant, which supports our claim about normalizing the corpus. | en_US
dc.language.iso | tr | en_US
dc.publisher | IEEE | en_US
dc.relation.ispartof | 2013 21st Signal Processing and Communications Applications Conference (SIU) | en_US
dc.rights | info:eu-repo/semantics/closedAccess | en_US
dc.subject | language model performance | en_US
dc.subject | corpus comparison | en_US
dc.subject | cross entropy | en_US
dc.title | A Proposal for Corpus Normalization | en_US
dc.type | Conference Object | en_US
dc.identifier.doi | 10.1109/SIU.2013.6531217 | -
dc.identifier.scopus | 2-s2.0-84880873119 | en_US
dc.department | İzmir Ekonomi Üniversitesi | en_US
dc.authorid | Dinçer, Bekir Taner/0000-0002-0660-7239 | -
dc.authorwosid | Dinçer, Bekir Taner/AAU-7709-2020 | -
dc.identifier.wos | WOS:000325005300058 | en_US
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US
dc.identifier.scopusquality | N/A | -
dc.identifier.wosquality | N/A | -
item.grantfulltext | embargo_20300101 | -
item.openairetype | Conference Object | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.fulltext | With Fulltext | -
item.languageiso639-1 | tr | -
item.cerifentitytype | Publications | -
crisitem.author.dept | 05.04. Software Engineering | -
Appears in Collections:
Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
Files in This Item:
File | Size | Format
2815.pdf (embargoed until 2030-01-01) | 323.29 kB | Adobe PDF

Page view(s): 92 (checked on Nov 18, 2024)
Download(s): 6 (checked on Nov 18, 2024)


Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.