A Procedure To Build Multiword Expression Data Set

dc.contributor.author Metin, Senem Kumova
dc.contributor.author Taze, Mehmet
dc.date.accessioned 2023-06-16T14:56:44Z
dc.date.available 2023-06-16T14:56:44Z
dc.date.issued 2017
dc.description 2nd International Conference on Computer and Communication Systems (ICCCS) -- JUL 11-14, 2017 -- Krakow, POLAND en_US
dc.description.abstract In this paper, we propose a procedure that employs natural language processing methods to build a gold-standard multiword expression (MWE) data set, and we present our Turkish MWE data set of 3946 positive and 4230 negative candidates, built by following the proposed procedure. The proposed procedure covers three main tasks. The first task is collecting a variety of MWE data resources from which to extract MWE candidates. We suggest the use of corpora together with idiom and term dictionaries. The second task is extracting different types of MWE candidates from the resources. Here, we suggest the aggregation of four methods. Firstly, statistical methods are applied to extract MWE candidates that have high occurrence frequencies. Secondly, linguistic properties such as part-of-speech patterns are considered to select MWE candidates. Thirdly, candidates that mimic the properties of idioms or are already true idioms are chosen. Lastly, term-like candidates with domain-specific properties are extracted. The final task in building a gold-standard MWE data set is labeling. In this task, the candidates are labeled either as MWE or non-MWE by multiple judges. en_US
dc.description.sponsorship IEEE en_US
dc.description.sponsorship TUBITAK - The Scientific and Technological Research Council of Turkey [115E469] en_US
dc.description.sponsorship This work was carried out under a grant from TUBITAK - The Scientific and Technological Research Council of Turkey for Project No: 115E469, Identification of Multi-word Expressions in Turkish Texts. en_US
dc.identifier.doi 10.1109/CCOMS.2017.8075264
dc.identifier.isbn 978-1-5386-0539-4
dc.identifier.scopus 2-s2.0-85036469994
dc.identifier.uri https://hdl.handle.net/20.500.14365/3223
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.ispartof 2017 2nd International Conference on Computer and Communication Systems (ICCCS 2017) en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject multiword expression en_US
dc.subject multiword expression data set en_US
dc.subject natural language processing en_US
dc.subject corpus en_US
dc.title A Procedure To Build Multiword Expression Data Set en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.collaboration.industrial false
gdc.description.department İzmir Ekonomi Üniversitesi en_US
gdc.description.departmenttemp [Metin, Senem Kumova] Izmir Univ Econ, Fac Engn, Dept Software Engn, Izmir, Turkey; [Taze, Mehmet] Izmir Univ Econ, Fac Engn, Dept Comp Engn, Izmir, Turkey en_US
gdc.description.endpage 49 en_US
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.scopusquality N/A
gdc.description.startpage 46 en_US
gdc.description.wosquality N/A
gdc.identifier.openalex W2766809237
gdc.identifier.wos WOS:000425215100010
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.downloads 7
gdc.oaire.impulse 1.0
gdc.oaire.influence 2.6757527E-9
gdc.oaire.isgreen true
gdc.oaire.popularity 9.74195E-10
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields 02 engineering and technology
gdc.oaire.views 5
gdc.openalex.collaboration National
gdc.openalex.fwci 0.195
gdc.openalex.normalizedpercentile 0.63
gdc.opencitations.count 1
gdc.plumx.mendeley 3
gdc.plumx.scopuscites 3
gdc.scopus.citedcount 3
gdc.virtual.author Kumova Metin, Senem
gdc.wos.citedcount 1
relation.isAuthorOfPublication 81d6fcea-c590-42aa-8443-7459c9eab7fa
relation.isAuthorOfPublication.latestForDiscovery 81d6fcea-c590-42aa-8443-7459c9eab7fa
relation.isOrgUnitOfPublication 805c60d5-b806-4645-8214-dd40524c388f
relation.isOrgUnitOfPublication 26a7372c-1a5e-42d9-90b6-a3f7d14cad44
relation.isOrgUnitOfPublication e9e77e3e-bc94-40a7-9b24-b807b2cd0319
relation.isOrgUnitOfPublication.latestForDiscovery 805c60d5-b806-4645-8214-dd40524c388f
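
Two of the steps named in the abstract, frequency-based candidate extraction and labeling by multiple judges, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenization, the bigram-only candidates, the `min_count` threshold, and the majority-vote rule are all assumptions made for illustration, and the toy corpus is English rather than Turkish.

```python
from collections import Counter


def extract_frequent_bigrams(sentences, min_count=2):
    """Return adjacent word pairs that occur at least min_count times.

    Hypothetical stand-in for the statistical extraction step; the
    threshold and whitespace tokenization are assumptions.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Pair each token with its right neighbour to form bigrams.
        counts.update(zip(tokens, tokens[1:]))
    return {bigram: n for bigram, n in counts.items() if n >= min_count}


def label_by_majority(votes):
    """Combine judges' votes (True = MWE) by strict majority.

    Majority voting is an assumed aggregation rule; ties yield non-MWE.
    """
    return sum(votes) * 2 > len(votes)


# Toy corpus, invented for illustration only.
corpus = [
    "he kicked the bucket yesterday",
    "she kicked the bucket last year",
    "the bucket was red",
]
candidates = extract_frequent_bigrams(corpus, min_count=2)
for bigram, count in sorted(candidates.items()):
    print(" ".join(bigram), count)
```

Frequency alone also surfaces non-MWE pairs, which is consistent with the abstract's combination of statistical extraction with linguistic, idiom-based, and term-based selection, followed by human labeling.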

Files

Original bundle

Name: 2332.pdf
Size: 521.16 KB
Format: Adobe Portable Document Format