A Procedure To Build Multiword Expression Data Set

Date

2017

Authors

Metin, Senem Kumova

Publisher

IEEE

Green Open Access

Yes

OpenAIRE Downloads

7

OpenAIRE Views

5

Publicly Funded

No

Impulse

Average

Influence

Average

Popularity

Average

Abstract

In this paper, we propose a procedure that employs natural language processing methods to build a gold-standard multiword expression (MWE) data set, and we present a Turkish MWE data set of 3946 positive and 4230 negative candidates built by following this procedure. The procedure covers three main tasks. The first task is collecting a variety of resources from which MWE candidates can be extracted; we suggest using corpora together with idiom and term dictionaries. The second task is extracting different types of MWE candidates from these resources, for which we suggest combining four methods. First, statistical methods are applied to extract candidates with high occurrence frequencies. Second, linguistic properties such as part-of-speech patterns are used to select candidates. Third, candidates that are true idioms or that mimic the properties of idioms are chosen. Finally, candidates with domain-specific, term-like properties are extracted. The final task in building a gold-standard MWE data set is labeling, in which multiple judges label each candidate as either MWE or non-MWE.
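The extraction and labeling tasks described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's implementation: it handles only two-word candidates, uses pointwise mutual information (PMI) as a stand-in for the unspecified statistical measures, simplifies the idiom-like and term-like methods to dictionary lookups, and resolves the judges' labels by majority vote. The toy corpus, the POS tags and patterns, the dictionary entries, and all function names (`statistical_candidates`, `pos_pattern_candidates`, `dictionary_candidates`, `aggregate`, `label`) are hypothetical.

```python
from collections import Counter
from math import log2


def bigrams(corpus):
    """Yield adjacent ((token, POS), (token, POS)) pairs from every sentence."""
    for sent in corpus:
        yield from zip(sent, sent[1:])


def statistical_candidates(corpus, min_count=2):
    """Method 1 (assumed measure): frequent bigrams, scored by PMI."""
    unigrams = Counter(tok for sent in corpus for tok, _ in sent)
    pairs = Counter((left[0], right[0]) for left, right in bigrams(corpus))
    n_uni, n_bi = sum(unigrams.values()), sum(pairs.values())
    scores = {}
    for (w1, w2), count in pairs.items():
        if count >= min_count:
            p_xy = count / n_bi
            p_x, p_y = unigrams[w1] / n_uni, unigrams[w2] / n_uni
            scores[(w1, w2)] = log2(p_xy / (p_x * p_y))
    return scores


def pos_pattern_candidates(corpus, patterns):
    """Method 2: bigrams whose POS tag sequence matches an allowed pattern."""
    return {(left[0], right[0]) for left, right in bigrams(corpus)
            if (left[1], right[1]) in patterns}


def dictionary_candidates(corpus, entries):
    """Methods 3 and 4, simplified here: bigrams listed in an idiom or term dictionary."""
    return {(left[0], right[0]) for left, right in bigrams(corpus)
            if (left[0], right[0]) in entries}


def aggregate(**sources):
    """Union of all candidate sets, recording which method(s) proposed each one."""
    merged = {}
    for method, candidates in sources.items():
        for cand in candidates:
            merged.setdefault(cand, set()).add(method)
    return merged


def label(votes):
    """Majority vote over judges (True = MWE); a tie is left unresolved (None)."""
    yes = sum(votes)
    no = len(votes) - yes
    if yes == no:
        return None
    return "MWE" if yes > no else "non-MWE"


# Hypothetical toy corpus of (token, POS) pairs; not taken from the paper's data.
corpus = [
    [("ali", "NOUN"), ("kitaba", "NOUN"), ("göz", "NOUN"), ("attı", "VERB")],
    [("ayşe", "NOUN"), ("habere", "NOUN"), ("göz", "NOUN"), ("attı", "VERB")],
    [("büyük", "ADJ"), ("ev", "NOUN"), ("satıldı", "VERB")],
    [("uçağın", "NOUN"), ("kara", "ADJ"), ("kutusu", "NOUN"), ("bulundu", "VERB")],
]

stat = statistical_candidates(corpus)
pos = pos_pattern_candidates(corpus, {("ADJ", "NOUN"), ("NOUN", "VERB")})
idioms = dictionary_candidates(corpus, {("göz", "attı")})
terms = dictionary_candidates(corpus, {("kara", "kutusu")})
merged = aggregate(statistical=set(stat), linguistic=pos, idiom=idioms, term=terms)

for cand, methods in sorted(merged.items()):
    print(cand, sorted(methods))
print(label([True, True, False]), label([False, False, True]), label([True, False]))
```

Keeping the set of methods that proposed each candidate, rather than only the union, makes it easy to see how much each extraction method contributes before the candidates are passed to the judges.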

Description

2nd International Conference on Computer and Communication Systems (ICCCS), July 11-14, 2017, Kraków, Poland

Keywords

multiword expression, multiword expression data set, natural language processing, corpus

Fields of Science

0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology

Citation

WoS Q

N/A

Scopus Q

N/A
OpenCitations Citation Count
1

Source

2017 2nd International Conference on Computer and Communication Systems (ICCCS 2017)

Start Page

46

End Page

49
PlumX Metrics
Citations

Scopus: 3

Captures

Mendeley Readers: 3

SCOPUS™ Citations

3

checked on Mar 16, 2026

Web of Science™ Citations

1

checked on Mar 16, 2026

Page Views

3

checked on Mar 16, 2026

OpenAlex FWCI
0.195