A Procedure To Build Multiword Expression Data Set

Metin, Senem Kumova; Taze, Mehmet

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14365/3223

Full metadata record

DC Field	Value	Language
dc.contributor.author	Metin, Senem Kumova	-
dc.contributor.author	Taze, Mehmet	-
dc.date.accessioned	2023-06-16T14:56:44Z	-
dc.date.available	2023-06-16T14:56:44Z	-
dc.date.issued	2017	-
dc.identifier.isbn	978-1-5386-0539-4	-
dc.identifier.uri	https://hdl.handle.net/20.500.14365/3223	-
dc.description	2nd International Conference on Computer and Communication Systems (ICCCS) -- JUL 11-14, 2017 -- Krakow, POLAND	en_US
dc.description.abstract	In this paper, we propose a procedure that employs natural language processing methods to build a gold-standard multiword expression (MWE) data set, and we present our Turkish MWE data set of 3946 positive and 4230 negative candidates, built following the proposed procedure. The procedure covers three main tasks. The first task is collecting a variety of MWE data resources from which MWE candidates are extracted; we suggest using corpora together with idiom and term dictionaries. The second task is extracting different types of MWE candidates from these resources. Here, we suggest aggregating four methods. First, statistical methods are applied to extract MWE candidates that have high occurrence frequencies. Second, linguistic properties such as part-of-speech patterns are considered to select MWE candidates. Third, candidates that mimic the properties of idioms, or are already true idioms, are chosen. Lastly, term-like candidates with domain-specific properties are extracted. The final task in building a gold-standard MWE data set is labeling, in which multiple judges label each candidate as either MWE or non-MWE.	en_US
dc.description.sponsorship	IEEE	en_US
dc.description.sponsorship	TUBITAK - The Scientific and Technological Research Council of Turkey [115E469]	en_US
dc.description.sponsorship	This work is carried out under a grant from TUBITAK - The Scientific and Technological Research Council of Turkey, Project No: 115E469, Identification of Multi-word Expressions in Turkish Texts.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.relation.ispartof	2017 2nd International Conference on Computer and Communication Systems (ICCCS 2017)	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	multiword expression	en_US
dc.subject	multiword expression data set	en_US
dc.subject	natural language processing	en_US
dc.subject	corpus	en_US
dc.title	A Procedure To Build Multiword Expression Data Set	en_US
dc.type	Conference Object	en_US
dc.identifier.doi	10.1109/CCOMS.2017.8075264	-
dc.identifier.scopus	2-s2.0-85036469994	-
dc.department	İzmir Ekonomi Üniversitesi	en_US
dc.identifier.startpage	46	en_US
dc.identifier.endpage	49	en_US
dc.identifier.wos	WOS:000425215100010	-
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member	en_US
dc.identifier.scopusquality	N/A	-
dc.identifier.wosquality	N/A	-
item.languageiso639-1	en	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.fulltext	With Fulltext	-
item.openairetype	Conference Object	-
item.cerifentitytype	Publications	-
item.grantfulltext	reserved	-
crisitem.author.dept	05.04. Software Engineering	-
Appears in Collections:	Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection; WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
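The candidate-extraction and labeling tasks summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and every name in it (`extract_candidates`, `label`, `POS_PATTERNS`) is hypothetical. It assumes tokenized, POS-tagged sentences; restricts candidates to bigrams; uses an illustrative POS pattern set; reduces the idiom and term methods to dictionary lookups against the corpus; and combines judges' labels by strict majority vote, a rule this record does not specify. The toy data use English tokens for readability rather than Turkish.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Bigram = Tuple[str, str]
TaggedSentence = List[Tuple[str, str]]  # [(word, POS tag), ...]

# Hypothetical POS patterns for illustration only; the paper's patterns are not listed here.
POS_PATTERNS: Set[Tuple[str, str]] = {("ADJ", "NOUN"), ("NOUN", "NOUN")}


def extract_candidates(
    sentences: Iterable[TaggedSentence],
    idioms: Set[Bigram],
    terms: Set[Bigram],
    min_freq: int = 2,
) -> Dict[str, Set[Bigram]]:
    """Collect bigram MWE candidates with four simplified methods, kept separate."""
    freq: Counter = Counter()  # bigram -> corpus frequency
    tag_seqs: Dict[Bigram, Set[Tuple[str, str]]] = {}  # bigram -> observed tag pairs
    for sent in sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            bigram = (w1, w2)
            freq[bigram] += 1
            tag_seqs.setdefault(bigram, set()).add((t1, t2))
    observed = set(freq)
    return {
        # 1. Statistical: bigrams occurring at least min_freq times.
        "statistical": {bg for bg, c in freq.items() if c >= min_freq},
        # 2. Linguistic: bigrams observed with at least one allowed POS pattern.
        "linguistic": {bg for bg, tags in tag_seqs.items() if tags & POS_PATTERNS},
        # 3. Idiom (simplified): idiom-dictionary entries attested in the corpus.
        "idiom": idioms & observed,
        # 4. Term (simplified): term-dictionary entries attested in the corpus.
        "term": terms & observed,
    }


def label(votes: Dict[Bigram, List[bool]]) -> Dict[Bigram, str]:
    """Assumed rule: MWE if strictly more than half of the judges voted MWE."""
    return {bg: "MWE" if 2 * sum(v) > len(v) else "non-MWE" for bg, v in votes.items()}


# Toy usage with made-up sentences, dictionaries, and judge votes.
sentences = [
    [("machine", "NOUN"), ("learning", "NOUN"), ("is", "VERB"), ("fun", "ADJ")],
    [("machine", "NOUN"), ("learning", "NOUN"), ("works", "VERB")],
    [("red", "ADJ"), ("tape", "NOUN"), ("slows", "VERB"), ("work", "NOUN")],
    [("this", "DET"), ("is", "VERB"), ("fun", "ADJ")],
]
by_method = extract_candidates(sentences, idioms={("red", "tape")}, terms={("neural", "network")})
candidates = set().union(*by_method.values())  # aggregate the four methods

votes = {
    ("is", "fun"): [False, False, True],
    ("machine", "learning"): [True, True, False],
    ("red", "tape"): [True, True, True],
}
labels = label(votes)
print(sorted(candidates))
print(labels)
```

Keeping the four candidate sets separate, rather than returning only their union, records which method proposed each candidate. In this toy run, the frequent but non-idiomatic bigram ("is", "fun") enters only through the statistical filter and is rejected at the labeling step. With an even number of judges, the strict-majority rule assumed here labels a tie as non-MWE.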

Files in This Item:

File	Size	Format
2332.pdf (Restricted Access)	521.16 kB	Adobe PDF

Scopus citations: 3 (checked on Jun 18, 2025)
Web of Science citations: 1 (checked on Jun 18, 2025)
Page views: 118 (checked on Jun 23, 2025)
Downloads: 6 (checked on Jun 23, 2025)
