Measuring Collocation Tendency of Words

dc.contributor.author Metin, Senem Kumova
dc.contributor.author Karaoglan, Bahar
dc.date.accessioned 2023-06-16T14:19:02Z
dc.date.available 2023-06-16T14:19:02Z
dc.date.issued 2011
dc.description.abstract In all natural languages, some words collocate with other words to create multi-word blocks of meaning: collocations. Because identifying collocations is vital for information retrieval, language learning, psycholinguistics, authorship determination and translation, collocation extraction is an important problem in natural language processing. In this paper we present a method designed to improve current statistical methods that generate ranked lists of collocation candidates. Because a collocation has meaning integrity, any word in it must suggest, or at least imply, the subsequent words that compose it. As a result, the words in a random text differ in their tendency to facilitate the prediction of the next word. If a word helps the prediction, it tends to collocate; otherwise it does not. We attempt to extract collocations by measuring the collocation tendency of words and word combinations. The method filters free word pairs (pairs whose words do not facilitate the prediction of the next word, or in which meaning integrity has not yet been completed) out of the lists of candidate pairs. The collocation tendency method is tested on a base data set extracted by several statistical collocation extraction techniques (frequency of occurrence, pointwise mutual information, the t-test and chi-square) and is evaluated by precision and recall. We found that the collocation tendency method brings a remarkable improvement over the frequency of occurrence and t-test techniques. en_US
dc.identifier.doi 10.1080/09296174.2011.556005
dc.identifier.issn 0929-6174
dc.identifier.issn 1744-5035
dc.identifier.scopus 2-s2.0-79957991723
dc.identifier.uri https://doi.org/10.1080/09296174.2011.556005
dc.identifier.uri https://hdl.handle.net/20.500.14365/1654
dc.language.iso en en_US
dc.publisher Routledge Journals, Taylor & Francis Ltd en_US
dc.relation.ispartof Journal of Quantitative Linguistics en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.title Measuring Collocation Tendency of Words en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.scopusid 24471923700
gdc.author.scopusid 22334152300
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department İEÜ, Faculty of Engineering, Department of Software Engineering en_US
gdc.description.departmenttemp [Metin, Senem Kumova] Izmir Univ Econ, Izmir, Turkey; [Karaoglan, Bahar] Ege Univ, Bornova, Turkey en_US
gdc.description.endpage 187 en_US
gdc.description.issue 2 en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q1
gdc.description.startpage 174 en_US
gdc.description.volume 18 en_US
gdc.description.wosquality Q1
gdc.identifier.openalex W2003131523
gdc.identifier.wos WOS:000295585600003
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.743578E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 2.0362225E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields 02 engineering and technology
gdc.openalex.collaboration National
gdc.openalex.fwci 0.4276
gdc.openalex.normalizedpercentile 0.71
gdc.opencitations.count 3
gdc.plumx.crossrefcites 3
gdc.plumx.mendeley 32
gdc.plumx.scopuscites 6
gdc.scopus.citedcount 6
gdc.virtual.author Kumova Metin, Senem
gdc.wos.citedcount 3
relation.isAuthorOfPublication 81d6fcea-c590-42aa-8443-7459c9eab7fa
relation.isAuthorOfPublication.latestForDiscovery 81d6fcea-c590-42aa-8443-7459c9eab7fa
relation.isOrgUnitOfPublication 805c60d5-b806-4645-8214-dd40524c388f
relation.isOrgUnitOfPublication 26a7372c-1a5e-42d9-90b6-a3f7d14cad44
relation.isOrgUnitOfPublication e9e77e3e-bc94-40a7-9b24-b807b2cd0319
relation.isOrgUnitOfPublication.latestForDiscovery 805c60d5-b806-4645-8214-dd40524c388f
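
Illustrative note (not part of the original record): the abstract names frequency of occurrence, pointwise mutual information and the t-test as baseline ranking techniques, and precision and recall as evaluation measures. The Python sketch below shows one common textbook formulation of those quantities for adjacent word pairs. It is not the authors' collocation tendency method, which the abstract does not specify. The function names `bigram_scores` and `precision_recall`, the sample sentence and the gold set are hypothetical.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score each adjacent word pair by frequency, PMI and a t-score.

    Assumption: probabilities are plain maximum-likelihood estimates and the
    t-score uses the common approximation that the sample variance equals
    the observed bigram probability.
    """
    total = len(tokens)          # number of unigram positions
    n = total - 1                # number of bigram positions
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    scores = {}
    for (w1, w2), f12 in bigrams.items():
        p1 = unigrams[w1] / total
        p2 = unigrams[w2] / total
        p12 = f12 / n
        expected = p1 * p2       # probability under independence
        pmi = math.log2(p12 / expected)
        t = (p12 - expected) / math.sqrt(p12 / n)
        scores[(w1, w2)] = {"freq": f12, "pmi": pmi, "t": t}
    return scores


def precision_recall(ranked, gold, k):
    """Precision and recall of the top-k ranked pairs against a gold set."""
    top = ranked[:k]
    hits = sum(1 for pair in top if pair in gold)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy corpus and gold-standard collocations.
    tokens = "new york is a big city and new york is busy".split()
    scores = bigram_scores(tokens)
    ranked = sorted(scores, key=lambda pair: scores[pair]["t"], reverse=True)
    gold = {("new", "york"), ("big", "city")}
    print(precision_recall(ranked, gold, k=2))
```

A filtering step such as the one the abstract describes would operate on the ranked candidate list before evaluation; how that filter is computed is left open here because the record does not state it.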

Files

Original bundle

Name: 1654.pdf
Size: 382.32 KB
Format: Adobe Portable Document Format