Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14365/3223
Full metadata record
DC Field | Value | Language
dc.contributor.author | Metin, Senem Kumova | -
dc.contributor.author | Taze, Mehmet | -
dc.date.accessioned | 2023-06-16T14:56:44Z | -
dc.date.available | 2023-06-16T14:56:44Z | -
dc.date.issued | 2017 | -
dc.identifier.isbn | 978-1-5386-0539-4 | -
dc.identifier.uri | https://hdl.handle.net/20.500.14365/3223 | -
dc.description | 2nd International Conference on Computer and Communication Systems (ICCCS), July 11-14, 2017, Krakow, Poland | en_US
dc.description.abstract | In this paper, we propose a procedure that employs natural language processing methods to build a gold-standard multiword expression (MWE) data set, and we present our Turkish MWE data set of 3946 positive and 4230 negative candidates, built by following the proposed procedure. The procedure covers three main tasks. The first task is collecting a variety of data resources from which to extract MWE candidates; we suggest using corpora together with idiom and term dictionaries. The second task is extracting different types of MWE candidates from these resources, for which we suggest aggregating four methods. First, statistical methods are applied to extract candidates with high occurrence frequencies. Second, linguistic properties such as part-of-speech patterns are used to select candidates. Third, candidates that mimic the properties of idioms, or that are already true idioms, are chosen. Finally, term-like candidates with domain-specific properties are extracted. The final task in building a gold-standard MWE data set is labeling, in which multiple judges label each candidate as either MWE or non-MWE. | en_US
dc.description.sponsorship | IEEE | en_US
dc.description.sponsorship | TUBITAK - The Scientific and Technological Research Council of Turkey [115E469] | en_US
dc.description.sponsorship | This work is carried out under a grant from TUBITAK - The Scientific and Technological Research Council of Turkey, Project No. 115E469, Identification of Multi-word Expressions in Turkish Texts. | en_US
dc.language.iso | en | en_US
dc.publisher | IEEE | en_US
dc.relation.ispartof | 2017 2nd International Conference on Computer and Communication Systems (ICCCS 2017) | en_US
dc.rights | info:eu-repo/semantics/closedAccess | en_US
dc.subject | multiword expression | en_US
dc.subject | multiword expression data set | en_US
dc.subject | natural language processing | en_US
dc.subject | corpus | en_US
dc.title | A Procedure to Build Multiword Expression Data Set | en_US
dc.type | Conference Object | en_US
dc.identifier.doi | 10.1109/CCOMS.2017.8075264 | -
dc.identifier.scopus | 2-s2.0-85036469994 | en_US
dc.department | İzmir Ekonomi Üniversitesi (Izmir University of Economics) | en_US
dc.identifier.startpage | 46 | en_US
dc.identifier.endpage | 49 | en_US
dc.identifier.wos | WOS:000425215100010 | en_US
dc.relation.publicationcategory | Conference Item - International - Institutional Academic Staff | en_US
dc.identifier.scopusquality | N/A | -
dc.identifier.wosquality | N/A | -
item.grantfulltext | reserved | -
item.openairetype | Conference Object | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.fulltext | With Fulltext | -
item.languageiso639-1 | en | -
item.cerifentitytype | Publications | -
crisitem.author.dept | 05.04. Software Engineering | -
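
The abstract in this record names frequency-based candidate extraction and multi-judge labeling as two steps of the procedure. The Python sketch below illustrates those two steps under stated assumptions: adjacent-word bigram counting with a minimum-frequency cutoff, scored by pointwise mutual information (PMI), and simple majority voting over judges. The function names `extract_bigram_candidates` and `label_by_majority`, the `min_freq` parameter, PMI scoring, and the majority rule are hypothetical choices for illustration only; the record does not state which statistical measures or agreement rule the authors actually used.

```python
from collections import Counter
from math import log2


def extract_bigram_candidates(sentences, min_freq=2):
    """Return {(w1, w2): PMI} for adjacent word pairs seen at least min_freq times.

    Hypothetical sketch: the frequency cutoff and PMI scoring are assumptions,
    not the statistical methods reported in the paper.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        # Pair each token with its right neighbour to form adjacent bigrams.
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_freq:
            continue  # keep only high-frequency candidates
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        # PMI: how much more often the pair co-occurs than chance predicts.
        scores[(w1, w2)] = log2(p_xy / (p_x * p_y))
    return scores


def label_by_majority(judgements):
    """Label a candidate "MWE" if more than half of the judges voted True.

    Hypothetical aggregation rule: the record only states that multiple
    judges labeled each candidate as MWE or non-MWE.
    """
    votes = sum(1 for j in judgements if j)
    return "MWE" if votes * 2 > len(judgements) else "non-MWE"
```

For example, on a toy tokenized corpus `[["kitaba", "göz", "attı"], ["habere", "göz", "attı"], ["sonra", "eve", "gitti"]]` with `min_freq=2`, only the idiomatic pair ("göz", "attı"), "glanced at", survives the cutoff, and three judges voting `[True, True, False]` yield the label "MWE". A tie, such as `[True, False]`, falls back to "non-MWE" under this assumed rule.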
Appears in Collections:
Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
Files in This Item:
File | Size | Format | Access
2332.pdf | 521.16 kB | Adobe PDF | Restricted Access

Scopus citations: 3 (checked on Nov 20, 2024)
Web of Science citations: 1 (checked on Nov 20, 2024)
Page views: 48 (checked on Nov 18, 2024)
Downloads: 6 (checked on Nov 18, 2024)

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.