Enlarging Multiword Expression Dataset by Co-Training

dc.contributor.author Kumova Metin, Senem
dc.date.accessioned 2023-06-16T14:41:20Z
dc.date.available 2023-06-16T14:41:20Z
dc.date.issued 2018
dc.description.abstract In multiword expressions (MWEs), multiple words unite to form a new unit of language. When MWE identification is treated as a binary classification task, one of the most important factors in performance is training the classifier with a sufficient number of labelled samples. Since manual labelling is time-consuming, the performance of MWE recognition studies is limited by the size of the training sets. In this study, we propose the comparison-based and common-decision co-training approaches to enlarge the MWE dataset. In the experiments, the performance of the proposed approaches was compared to that of standard co-training [1] and manual labelling, where statistical and linguistic features are employed as two different views of the MWE dataset [2]. A number of tests with different settings were performed on a Turkish MWE dataset. Ten different classifiers were used in the experiments, and the best-performing classifier pair was observed to be the SMO-SMO pair. The experimental results showed that the common-decision co-training approach is an alternative to hand-labelling of large MWE datasets and that both newly proposed approaches outperform standard co-training [2] when the training set is to be enlarged in MWE classification. en_US
dc.description.sponsorship Scientific and Technological Research Council of Turkey [115E469] en_US
dc.description.sponsorship This work was carried out under a grant from The Scientific and Technological Research Council of Turkey (Project No. 115E469, Identification of Multiword Expressions in Turkish Texts). Further information and statistics on the MWE dataset are available on the project web page (http://app.ieu-nlpteam.com:8000). en_US
dc.identifier.doi 10.3906/elk-1709-185
dc.identifier.issn 1300-0632
dc.identifier.issn 1303-6203
dc.identifier.scopus 2-s2.0-85054525652
dc.identifier.uri https://doi.org/10.3906/elk-1709-185
dc.identifier.uri https://hdl.handle.net/20.500.14365/2600
dc.language.iso en en_US
dc.publisher Scientific Technical Research Council Turkey-Tubitak en_US
dc.relation.ispartof Turkish Journal of Electrical Engineering and Computer Sciences en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Multiword expression en_US
dc.subject classification en_US
dc.subject training set en_US
dc.subject co-training en_US
dc.title Enlarging Multiword Expression Dataset by Co-Training en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.id Kumova Metin, Senem/0000-0002-9606-3625
gdc.author.scopusid 24471923700
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department İEÜ, Mühendislik Fakültesi, Yazılım Mühendisliği Bölümü en_US
gdc.description.departmenttemp [Kumova Metin, Senem] Izmir Univ Econ, Fac Engn, Dept Software Engn, Izmir, Turkey en_US
gdc.description.endpage 2594 en_US
gdc.description.issue 5 en_US
gdc.description.publicationcategory Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı en_US
gdc.description.scopusquality Q2
gdc.description.startpage 2583 en_US
gdc.description.volume 26 en_US
gdc.description.wosquality Q3
gdc.identifier.openalex W2895395003
gdc.identifier.trdizinid 323563
gdc.identifier.wos WOS:000448109200034
gdc.index.type WoS
gdc.index.type Scopus
gdc.index.type TR-Dizin
gdc.oaire.accesstype GOLD
gdc.oaire.diamondjournal false
gdc.oaire.downloads 12
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.4895952E-9
gdc.oaire.isgreen true
gdc.oaire.popularity 1.0376504E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields 02 engineering and technology
gdc.oaire.views 54
gdc.openalex.fwci 0.0
gdc.openalex.normalizedpercentile 0.11
gdc.opencitations.count 0
gdc.plumx.mendeley 1
gdc.plumx.scopuscites 0
gdc.scopus.citedcount 0
gdc.virtual.author Kumova Metin, Senem
gdc.wos.citedcount 0
relation.isAuthorOfPublication 81d6fcea-c590-42aa-8443-7459c9eab7fa
relation.isAuthorOfPublication.latestForDiscovery 81d6fcea-c590-42aa-8443-7459c9eab7fa
relation.isOrgUnitOfPublication 805c60d5-b806-4645-8214-dd40524c388f
relation.isOrgUnitOfPublication 26a7372c-1a5e-42d9-90b6-a3f7d14cad44
relation.isOrgUnitOfPublication e9e77e3e-bc94-40a7-9b24-b807b2cd0319
relation.isOrgUnitOfPublication.latestForDiscovery 805c60d5-b806-4645-8214-dd40524c388f

Files

Original bundle

Name: 2600.pdf
Size: 506.55 KB
Format: Adobe Portable Document Format