Collocation Extraction in Turkish Texts Using Statistical Methods

Date

2010

Authors

Kumova Metin S.

Open Access Color

Green Open Access

No

Publicly Funded

No
Impulse
Average
Influence
Average
Popularity
Average

Abstract

A collocation is a combination of words that appear together more often than would be expected by chance. Since collocations are blocks of meaning, they play an important role in natural language processing applications (word sense disambiguation, part-of-speech tagging, machine translation, etc.). In this study, a Turkish corpus is subjected to the following statistical techniques: frequency of occurrence, mutual information, and hypothesis tests. We have used both the stemmed and the surface forms of the corpus to explore the effect of stemming on collocation extraction. The techniques are evaluated by recall and precision measures. The chi-square hypothesis test and the mutual information method produced better results than the other methods on the Turkish corpus. In addition, we have found that a stemmed corpus facilitates discrimination between successful and unsuccessful collocation extraction methods. © 2010 Springer-Verlag Berlin Heidelberg.
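The measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`bigram_scores`, `precision_recall`), the restriction to adjacent word pairs, the base-2 pointwise form of mutual information, and the gold-standard set are all assumptions made for illustration, since the abstract does not specify these details. For each bigram the sketch computes its frequency, its pointwise mutual information, and Pearson's chi-square statistic from a 2x2 contingency table, and then scores a candidate list against a hypothetical gold set.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Return {(w1, w2): (frequency, pmi, chi_square)} for adjacent word pairs.

    Assumption: collocation candidates are adjacent bigrams only.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())  # total number of bigram positions
    left = Counter()   # how often each word occurs as the first element
    right = Counter()  # how often each word occurs as the second element
    for (w1, w2), count in bigrams.items():
        left[w1] += count
        right[w2] += count

    scores = {}
    for (w1, w2), o11 in bigrams.items():
        # Observed 2x2 contingency table for the pair (w1, w2).
        o12 = left[w1] - o11          # w1 followed by something other than w2
        o21 = right[w2] - o11         # w2 preceded by something other than w1
        o22 = n - o11 - o12 - o21     # neither w1 first nor w2 second

        # Pointwise mutual information: log2 of observed / expected.
        pmi = math.log2(o11 * n / (left[w1] * right[w2]))

        # Pearson's chi-square statistic for a 2x2 table.
        denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
        chi2 = n * (o11 * o22 - o12 * o21) ** 2 / denom if denom else 0.0

        scores[(w1, w2)] = (o11, pmi, chi2)
    return scores


def precision_recall(candidates, gold):
    """Precision and recall of extracted candidates against a gold set.

    `gold` is a hypothetical set of known collocations.
    """
    candidates, gold = set(candidates), set(gold)
    true_positives = len(candidates & gold)
    precision = true_positives / len(candidates) if candidates else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    tokens = ["a", "b", "c", "a", "b", "d"]  # toy input, not corpus data
    ranked = sorted(bigram_scores(tokens).items(),
                    key=lambda item: item[1][2], reverse=True)
    for pair, (freq, pmi, chi2) in ranked:
        print(pair, freq, round(pmi, 3), round(chi2, 3))
```

To compare the stemmed and surface forms as the study does, the same functions could be run twice, once on the raw tokens and once on tokens passed through a Turkish stemmer of the reader's choice; no particular stemmer is assumed here.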

Description

IZETeam;Microsoft Island;Post and Telecom Administration
7th International Conference on NLP, IceTAL 2010 -- 16 August 2010 through 18 August 2010 -- Reykjavik -- 81659

Keywords

Collocation, Collocation extraction, Frequency of occurrences, Hypothesis tests, Machine translation, Mutual information, Mutual information method, Natural language processing, Part of speech tagging, Recall and precision, Statistical techniques, Turkish, Turkish texts, Word sense disambiguation, Computational linguistics, Information theory, Speech transmission, Statistical tests, Natural language processing systems

WoS Q

N/A

Scopus Q

Q3
OpenCitations Citation Count
6

Source

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume

6233 LNAI

Start Page

238

End Page

249
PlumX Metrics
Citations

CrossRef : 4

Scopus : 14

Captures

Mendeley Readers : 13

SCOPUS™ Citations

14

checked on Mar 16, 2026

Web of Science™ Citations

10

checked on Mar 16, 2026

Page Views

2

checked on Mar 16, 2026

Downloads

15

checked on Mar 16, 2026

OpenAlex FWCI
0.0