A Procedure To Build Multiword Expression Data Set
Date
2017
Authors
Metin, Senem Kumova
Publisher
IEEE
Green Open Access
Yes
OpenAIRE Downloads
7
OpenAIRE Views
5
Publicly Funded
No
Abstract
In this paper, we propose a procedure that employs natural language processing methods to build a gold standard multiword expression (MWE) data set, and we present our Turkish MWE data set of 3946 positive and 4230 negative candidates, built by following the proposed procedure. The procedure covers three main tasks. The first task is collecting a variety of MWE data resources from which MWE candidates can be extracted. We suggest using corpora together with idiom and term dictionaries. The second task is extracting different types of MWE candidates from these resources. Here, we suggest aggregating four methods. First, statistical methods are applied to extract MWE candidates that have high occurrence frequencies. Second, linguistic properties such as part-of-speech patterns are used to select MWE candidates. Third, candidates that mimic the properties of idioms, or that are already true idioms, are chosen. Lastly, term-like candidates with domain-specific properties are extracted. The final task in building a gold standard MWE data set is labeling. In this task, multiple judges label each candidate as either MWE or non-MWE.
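As a rough illustration of two of the steps named in the abstract (frequency-based candidate extraction and labeling by multiple judges), the sketch below counts adjacent word pairs in a tokenized corpus and combines judge votes by simple majority. It is not the paper's implementation: the function names `bigram_candidates` and `majority_label`, the `min_count` threshold, the restriction to two-word candidates, the majority rule, and the toy sentences are all assumptions made for this example.

```python
from collections import Counter


def bigram_candidates(sentences, min_count=2):
    """Return adjacent word pairs that occur at least min_count times.

    `sentences` is an iterable of token lists. The threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    counts = Counter()
    for tokens in sentences:
        # zip pairs each token with its right-hand neighbour
        counts.update(zip(tokens, tokens[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}


def majority_label(votes):
    """Label a candidate as MWE when more than half of the judges vote True.

    Majority voting is an assumed aggregation rule; the abstract only
    states that multiple judges label each candidate.
    """
    votes = list(votes)
    return sum(votes) * 2 > len(votes)


# Toy, invented data purely for demonstration.
corpus = [
    ["black", "box", "found"],
    ["black", "box", "opened"],
    ["box", "opened"],
]
candidates = bigram_candidates(corpus, min_count=2)
labels = {pair: majority_label([True, True, False]) for pair in candidates}
```

A fuller pipeline would add the other selection methods the abstract mentions (part-of-speech patterns, idiom and term dictionaries) before the labeling step.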
Description
2nd International Conference on Computer and Communication Systems (ICCCS) -- JUL 11-14, 2017 -- Krakow, POLAND
Keywords
multiword expression, multiword expression data set, natural language processing, corpus
Fields of Science
0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology
OpenCitations Citation Count
1
Source
2017 2nd International Conference on Computer and Communication Systems (ICCCS2017)
Start Page
46
End Page
49