Description of Turkish Paraphrase Corpus Structure and Generation Method

dc.contributor.author Karaoglan, Bahar
dc.contributor.author Kisla, Tarik
dc.contributor.author Metin, Senem Kumova
dc.date.accessioned 2023-06-16T12:47:39Z
dc.date.available 2023-06-16T12:47:39Z
dc.date.issued 2018
dc.description 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing) -- APR 03-09, 2016 -- Mevlana Univ, Konya, TURKEY en_US
dc.description.abstract Because developing a corpus requires a long time and considerable human effort, it is desirable to make it as resourceful as possible: rich in coverage, flexible, multipurpose and expandable. Here we describe the steps we took in developing the Turkish paraphrase corpus, the factors we considered, the problems we faced and how we dealt with them. Currently, our corpus contains nearly 4000 sentences, with a ratio of 60% paraphrase to 40% non-paraphrase sentence pairs. The sentence pairs are annotated with one of five labels: paraphrase, encapsulating, encapsulated, non-paraphrase and opposite. The corpus is organized as a database structure integrated with a Turkish dictionary. The sources used so far are news texts from the Bilcon 2005 corpus, a set of professionally translated sentence pairs from the MSRP corpus, multiple Turkish translations from different languages included in the Tatoeba corpus, and user-generated paraphrases. en_US
dc.description.sponsorship TUBITAK - The Scientific and Technological Research Council of Turkey [114E126]; Ege University Scientific Research Council [2015/BIL/034] en_US
dc.description.sponsorship This work was carried out under grants from TUBITAK - The Scientific and Technological Research Council of Turkey, Project No: 114E126, "Using Certainty Factor Approach and Creating Paraphrase Corpus for Measuring Similarity of Short Turkish Texts", and from Ege University Scientific Research Council, Project No: 2015/BIL/034, "Developing a Paraphrase Corpus for Turkish Short Text Similarity Studies". en_US
dc.identifier.doi 10.1007/978-3-319-75477-2_13
dc.identifier.isbn 978-3-319-75477-2
dc.identifier.isbn 978-3-319-75476-5
dc.identifier.issn 0302-9743
dc.identifier.issn 1611-3349
dc.identifier.scopus 2-s2.0-85044430201
dc.identifier.uri https://doi.org/10.1007/978-3-319-75477-2_13
dc.identifier.uri https://hdl.handle.net/20.500.14365/824
dc.language.iso en en_US
dc.publisher Springer International Publishing Ag en_US
dc.relation.ispartof Computational Linguistics and Intelligent Text Processing (CICLing 2016), Pt I en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Turkish en_US
dc.subject Paraphrase en_US
dc.subject Corpus generation en_US
dc.title Description of Turkish Paraphrase Corpus Structure and Generation Method en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.id KARAOGLAN, BAHAR/0000-0001-9338-7491
gdc.author.id KISLA, TARIK/0000-0001-9007-7455
gdc.author.scopusid 22334152300
gdc.author.scopusid 24314851200
gdc.author.scopusid 24471923700
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.collaboration.industrial false
gdc.description.department İzmir Ekonomi Üniversitesi en_US
gdc.description.departmenttemp [Karaoglan, Bahar; Kisla, Tarik] Ege Univ, Izmir, Turkey; [Metin, Senem Kumova] Izmir Univ Econ, Izmir, Turkey en_US
gdc.description.endpage 217 en_US
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.scopusquality Q3
gdc.description.startpage 208 en_US
gdc.description.volume 9623 en_US
gdc.description.wosquality N/A
gdc.identifier.openalex W2793324584
gdc.identifier.wos WOS:000540380100013
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.downloads 13
gdc.oaire.impulse 2.0
gdc.oaire.influence 2.881022E-9
gdc.oaire.isgreen true
gdc.oaire.keywords Turkish
gdc.oaire.keywords Corpus generation
gdc.oaire.keywords Paraphrase
gdc.oaire.popularity 1.894736E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 05 social sciences
gdc.oaire.sciencefields 0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields 0501 psychology and cognitive sciences
gdc.oaire.sciencefields 02 engineering and technology
gdc.oaire.views 2
gdc.openalex.collaboration National
gdc.openalex.fwci 0.8156
gdc.openalex.normalizedpercentile 0.74
gdc.opencitations.count 2
gdc.plumx.crossrefcites 2
gdc.plumx.mendeley 2
gdc.plumx.scopuscites 3
gdc.scopus.citedcount 3
gdc.virtual.author Kumova Metin, Senem
gdc.wos.citedcount 1
relation.isAuthorOfPublication 81d6fcea-c590-42aa-8443-7459c9eab7fa
relation.isAuthorOfPublication.latestForDiscovery 81d6fcea-c590-42aa-8443-7459c9eab7fa
relation.isOrgUnitOfPublication 805c60d5-b806-4645-8214-dd40524c388f
relation.isOrgUnitOfPublication 26a7372c-1a5e-42d9-90b6-a3f7d14cad44
relation.isOrgUnitOfPublication e9e77e3e-bc94-40a7-9b24-b807b2cd0319
relation.isOrgUnitOfPublication.latestForDiscovery 805c60d5-b806-4645-8214-dd40524c388f

Files

Original bundle

Name: 824.pdf
Size: 293.44 KB
Format: Adobe Portable Document Format