Identifying Collocations in Turkish Using Statistical Methods

Date

2016

Authors

Metin, S.K.

Publisher

Ahmet Yesevi University

Abstract

A collocation is a combination of words that appear together more often than would be expected by chance and that together form a unit of meaning. Because extracting collocations benefits the automatic processing and translation of Turkish texts as well as the learning of Turkish, it is an important problem in Turkish natural language processing. In this study, several statistical techniques, including occurrence frequency, pointwise mutual information and hypothesis tests, are applied to a Turkey Turkish corpus to identify collocations automatically. We use both stemmed and surface forms of words in order to explore the effect of stemming on collocation extraction. The techniques are evaluated using the F-measure. The chi-square hypothesis test and pointwise mutual information produced better results than the other methods. In addition, we observed that when words are stemmed, the methods that may be considered successful in collocation extraction can be distinguished more clearly from the others. © 2016, Ahmet Yesevi University. All rights reserved.
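
The measures named in the abstract can be illustrated with a minimal Python sketch for adjacent word pairs (bigrams). This is not the paper's implementation: this record does not give the exact formulations, the corpus, or the stemmer, so the function names `bigram_scores` and `f_measure`, the restriction to bigrams, and the use of the standard 2x2 contingency-table form of the chi-square statistic are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score each adjacent word pair by frequency, PMI, and chi-square.

    Hypothetical helper: `tokens` may hold surface forms or stems,
    depending on which variant is being explored.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())  # total number of adjacent pairs

    # Marginal counts: how often a word fills the first / second slot.
    first, second = Counter(), Counter()
    for (w1, w2), count in bigrams.items():
        first[w1] += count
        second[w2] += count

    scores = {}
    for (w1, w2), o11 in bigrams.items():
        r1, c1 = first[w1], second[w2]
        # Remaining cells of the 2x2 contingency table.
        o12 = r1 - o11            # w1 followed by something other than w2
        o21 = c1 - o11            # w2 preceded by something other than w1
        o22 = n - r1 - c1 + o11   # neither w1 first nor w2 second

        # Pointwise mutual information: log2 of observed vs. expected
        # joint probability under independence.
        pmi = math.log2(o11 * n / (r1 * c1))

        # Pearson chi-square for a 2x2 table (no continuity correction).
        denom = r1 * c1 * (n - r1) * (n - c1)
        chi2 = n * (o11 * o22 - o12 * o21) ** 2 / denom if denom else 0.0

        scores[(w1, w2)] = {"freq": o11, "pmi": pmi, "chi2": chi2}
    return scores


def f_measure(predicted, gold):
    """Balanced F-measure (F1) of a predicted set against a gold set."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

In a sketch like this, candidate pairs would be ranked by one of the scores, the top-ranked pairs taken as predicted collocations, and `f_measure` computed against a manually labelled set. Running the same pipeline once on surface forms and once on stems would give the kind of comparison the abstract describes.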

Keywords

Collocation, Corpus, Natural language processing, Turkey Turkish

WoS Q

Q3

Scopus Q

Q4

Source

Bilig

Volume

78

Start Page

253

End Page

286