Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.14365/2815
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Karaoglan, Bahar | - |
dc.contributor.author | Kisla, Tarik | - |
dc.contributor.author | Dincer, Bekir Taner | - |
dc.contributor.author | Metin, Senem Kumova | - |
dc.date.accessioned | 2023-06-16T14:50:29Z | - |
dc.date.available | 2023-06-16T14:50:29Z | - |
dc.date.issued | 2013 | - |
dc.identifier.isbn | 978-1-4673-5563-6 | - |
dc.identifier.isbn | 978-1-4673-5562-9 | - |
dc.identifier.issn | 2165-0608 | - |
dc.identifier.uri | https://hdl.handle.net/20.500.14365/2815 | - |
dc.description | 21st Signal Processing and Communications Applications Conference (SIU) -- APR 24-26, 2013 -- CYPRUS | en_US |
dc.description.abstract | To compare work done in natural language processing, the corpora used in different studies should be standardized (normalized). Entropy, used as a language model performance metric, depends entirely on signal information, whereas semantic information should also be taken into account when language is considered. Here we propose a metric that exploits Zipf's and Heaps' power laws to represent semantic information in terms of signal information and that estimates the amount of information anticipated from a corpus of a given length in words. The proposed metric is tested on sub-corpora of 20 different lengths drawn from a major Turkish corpus (METU). While the entropy changed with the length of the corpus, the value of the proposed metric stayed almost constant, which supports our claim about normalizing the corpus. | en_US |
dc.language.iso | tr | en_US |
dc.publisher | IEEE | en_US |
dc.relation.ispartof | 2013 21st Signal Processing and Communications Applications Conference (SIU) | en_US |
dc.rights | info:eu-repo/semantics/closedAccess | en_US |
dc.subject | language model performance | en_US |
dc.subject | corpus comparison | en_US |
dc.subject | cross entropy | en_US |
dc.title | A Proposal for Corpus Normalization | en_US |
dc.type | Conference Object | en_US |
dc.identifier.doi | 10.1109/SIU.2013.6531217 | - |
dc.identifier.scopus | 2-s2.0-84880873119 | en_US |
dc.department | İzmir Ekonomi Üniversitesi | en_US |
dc.authorid | Dinçer, Bekir Taner/0000-0002-0660-7239 | - |
dc.authorwosid | Dinçer, Bekir Taner/AAU-7709-2020 | - |
dc.identifier.wos | WOS:000325005300058 | en_US |
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
dc.identifier.scopusquality | N/A | - |
dc.identifier.wosquality | N/A | - |
item.grantfulltext | embargo_20300101 | - |
item.openairetype | Conference Object | - |
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
item.fulltext | With Fulltext | - |
item.languageiso639-1 | tr | - |
item.cerifentitytype | Publications | - |
crisitem.author.dept | 05.04. Software Engineering | - |
Appears in Collections: | Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection; WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection |
Files in This Item:
File | Size | Format |
---|---|---|
2815.pdf (embargoed until 2030-01-01) | 323.29 kB | Adobe PDF |
Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.