Repository logoGCRIS
  • English
  • Türkçe
  • Русский
Log In
New user? Click here to register. Have you forgotten your password?
Home
Communities
Browse GCRIS
Entities
Overview
GCRIS Guide
  1. Home
  2. Browse by Author

Browsing by Author "Dincer, Bekir Taner"

Filter results by typing the first few letters
Now showing 1 - 1 of 1
  • Results Per Page
  • Sort Options
  • Loading...
    Thumbnail Image
    Conference Object
    A Proposal for Corpus Normalization
    (IEEE, 2013) Karaoglan, Bahar; Kisla, Tarik; Dincer, Bekir Taner; Metin, Senem Kumova
    In order to compare work done under natural language processing, the corpora involved in different studies should be standardized/normalized. Entropy, used as language model performance metric, totally depends on signal information. Whereas, when language is considered semantic information should also be considered. Here we propose a metric that exploits Zipf's and Heaps' power laws to respresent semantic information in terms of signal information and estimates the amount of information anticipated from a corpus of given length in words. The proposed metric is tested on 20 different lengths of sub-corpora drawn from major corpus in Turkish (METU). While the entropy changed depending on the length of the corpus, the value of our proposed metric stayed almost constant which supports our claim about normalizing the corpus.
Repository logo
Collections
  • Scopus Collection
  • WoS Collection
  • TrDizin Collection
  • PubMed Collection
Entities
  • Research Outputs
  • Organizations
  • Researchers
  • Projects
  • Awards
  • Equipments
  • Events
About
  • Contact
  • GCRIS
  • Research Ecosystems
  • Feedback
  • OAI-PMH

Log in to GCRIS Dashboard

GCRIS Mobile

Download GCRIS Mobile on the App StoreGet GCRIS Mobile on Google Play

Powered by Research Ecosystems

  • Privacy policy
  • End User Agreement
  • Feedback