Enlarging Multiword Expression Dataset by Co-Training

Date

2018

Authors

Kumova Metin, Senem

Publisher

Scientific Technical Research Council Turkey-Tubitak

Open Access Color

GOLD

Green Open Access

Yes

Publicly Funded

No

Abstract

In multiword expressions (MWEs), multiple words unite to form a new unit of language. When MWE identification is treated as a binary classification task, one of the most important factors in performance is training the classifier with a sufficient number of labelled samples. Since manual labelling is time-consuming, the performance of MWE recognition studies is limited by the size of the training sets. In this study, we propose the comparison-based and common-decision co-training approaches to enlarge the MWE dataset. In the experiments, the performance of the proposed approaches was compared with that of standard co-training [1] and manual labelling, where statistical and linguistic features are employed as two different views of the MWE dataset [2]. A number of tests with different settings were performed on a Turkish MWE dataset. Ten different classifiers were used in the experiments, and the best-performing classifier pair was observed to be the SMO-SMO pair. The experimental results showed that the common-decision co-training approach is an alternative to hand-labelling large MWE datasets and that both newly proposed approaches outperform standard co-training [2] when the training set is to be enlarged in MWE classification.
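
As a rough illustration of the standard co-training idea mentioned in the abstract, the following minimal sketch trains one classifier per view and lets each classifier label the unlabelled samples it is most confident about. It is not the paper's implementation: the nearest-centroid classifier is a simple stand-in for the SMO classifiers, the comparison-based and common-decision variants are not shown because the abstract does not specify them, and all names (`train_centroids`, `predict_with_confidence`, `co_train`) and the toy feature values are hypothetical.

```python
from statistics import mean


def train_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class label.
    (Assumption: a simple stand-in for the SMO classifiers named in the abstract.)"""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict_with_confidence(centroids, x):
    """Return (label, confidence), where confidence is the distance margin
    between the nearest and second-nearest class centroid."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, label)
        for label, c in centroids.items()
    )
    (d_best, label), (d_second, _) = dists[0], dists[1]
    return label, d_second - d_best


def co_train(view_a, view_b, labels, unlabeled_a, unlabeled_b, rounds=5, per_round=1):
    """Standard co-training sketch: in each round, the classifier of each view
    labels its most confident unlabelled samples, and those samples are added
    to the shared training set used by both views."""
    view_a, view_b, labels = list(view_a), list(view_b), list(labels)
    pool = list(range(len(unlabeled_a)))  # indices of still-unlabelled samples
    added = {}  # pool index -> label assigned during co-training
    for _ in range(rounds):
        if not pool:
            break
        clf_a = train_centroids(view_a, labels)
        clf_b = train_centroids(view_b, labels)
        picks = {}
        for clf, unlabeled in ((clf_a, unlabeled_a), (clf_b, unlabeled_b)):
            scored = sorted(
                ((predict_with_confidence(clf, unlabeled[i]), i) for i in pool),
                key=lambda t: t[0][1],
                reverse=True,
            )
            for (label, _), i in scored[:per_round]:
                # If both views pick the same sample, the first view's label is kept.
                picks.setdefault(i, label)
        for i, label in picks.items():
            view_a.append(unlabeled_a[i])
            view_b.append(unlabeled_b[i])
            labels.append(label)
            added[i] = label
            pool.remove(i)
    return added


# Toy data (hypothetical): view A = one statistical feature, view B = one
# linguistic feature; label 1 = MWE, label 0 = non-MWE.
view_a = [[0.9], [0.8], [0.1], [0.2]]
view_b = [[1.0], [0.9], [0.0], [0.1]]
labels = [1, 1, 0, 0]
unlabeled_a = [[0.95], [0.05], [0.7], [0.3]]
unlabeled_b = [[0.9], [0.1], [0.8], [0.2]]

added = co_train(view_a, view_b, labels, unlabeled_a, unlabeled_b)
print(added)
```

The returned dictionary maps each unlabelled sample's index to the label it received, so the enlarged training set can be inspected or compared against manual labels.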

Keywords

Multiword expression, classification, training set, co-training

Fields of Science

0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology

WoS Q

Q3

Scopus Q

Q2
Source

Turkish Journal of Electrical Engineering and Computer Sciences

Volume

26

Issue

5

Start Page

2583

End Page

2594