Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14365/3406
Title: Collocation extraction in Turkish texts using statistical methods
Authors: Kumova Metin S.
Karaoğlan B.
Keywords: Collocation
collocation extraction
Frequency of occurrences
Hypothesis tests
Machine translations
Mutual information method
Mutual informations
Natural language processing
Part of speech tagging
Recall and precision
Statistical techniques
Turkish texts
Turkishs
Word Sense Disambiguation
Computational linguistics
Information theory
Speech transmission
Statistical tests
Natural language processing systems
Abstract: A collocation is a combination of words that appear together more often than chance would predict. Since collocations are blocks of meaning, they play an important role in natural language processing applications (word sense disambiguation, part-of-speech tagging, machine translation, etc.). In this study, a Turkish corpus is subjected to the following statistical techniques: frequency of occurrence, mutual information, and hypothesis tests. We used both the stemmed and the surface form of the corpus to explore the effect of stemming on collocation extraction. The techniques are evaluated by recall and precision measures. The chi-square hypothesis test and the mutual information method produced better results than the other methods on the Turkish corpus. In addition, we found that a stemmed corpus facilitates discrimination between successful and unsuccessful collocation extraction methods. © 2010 Springer-Verlag Berlin Heidelberg.
Description: IZETeam;Microsoft Island;Post and Telecom Administration
7th International Conference on NLP, IceTAL 2010 -- 16 August 2010 through 18 August 2010 -- Reykjavik -- 81659
URI: https://doi.org/10.1007/978-3-642-14770-8_27
https://hdl.handle.net/20.500.14365/3406
ISBN: 3642147690
9783642147692
ISSN: 0302-9743
Appears in Collections: Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
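
Illustrative sketch: the abstract names frequency of occurrence, mutual information, and the chi-square hypothesis test as extraction techniques, with recall and precision as evaluation measures. This record does not give the paper's formulas, thresholds, or corpus, so the Python below is only a minimal sketch of the textbook versions of these measures for adjacent word pairs. The function names `bigram_scores` and `precision_recall` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score adjacent word pairs by frequency, pointwise mutual
    information (PMI), and Pearson's chi-square on a 2x2 table.

    Assumption: textbook definitions, not the paper's exact setup.
    """
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)  # total number of adjacent pairs
    pair_counts = Counter(pairs)
    first_counts = Counter(w1 for w1, _ in pairs)   # w1 in first slot
    second_counts = Counter(w2 for _, w2 in pairs)  # w2 in second slot

    scores = {}
    for (w1, w2), o11 in pair_counts.items():
        c1 = first_counts[w1]
        c2 = second_counts[w2]
        o12 = c1 - o11              # w1 followed by another word
        o21 = c2 - o11              # another word followed by w2
        o22 = n - o11 - o12 - o21   # neither w1 first nor w2 second

        # PMI: log2 of observed pair probability over the product
        # of the marginal probabilities.
        pmi = math.log2(o11 * n / (c1 * c2))

        # Pearson chi-square for a 2x2 contingency table.
        denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
        chi2 = n * (o11 * o22 - o12 * o21) ** 2 / denom if denom else 0.0

        scores[(w1, w2)] = {"freq": o11, "pmi": pmi, "chi2": chi2}
    return scores


def precision_recall(candidates, gold):
    """Precision and recall of extracted pairs against a gold list."""
    candidates, gold = set(candidates), set(gold)
    hits = len(candidates & gold)
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

To mirror the stemmed versus surface-form comparison mentioned in the abstract, the same function could be run once on surface tokens and once on stemmed tokens, with the top-ranked pairs from each run passed to `precision_recall`. The stemmer and the gold-standard collocation list would have to be supplied separately.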

Files in This Item:
File: 2514.pdf (Until 2030-01-01)
Size: 254.63 kB
Format: Adobe PDF

Scopus citations: 14 (checked on Oct 2, 2024)
Web of Science citations: 10 (checked on Oct 2, 2024)
Page views: 62 (checked on Sep 30, 2024)
Downloads: 12 (checked on Sep 30, 2024)


Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.