Description of Turkish Paraphrase Corpus Structure and Generation Method


Date

2018

Publisher

Springer International Publishing Ag

Open Access Color

Green Open Access

Yes

OpenAIRE Downloads

13

OpenAIRE Views

2

Publicly Funded

No
Impulse
Average
Influence
Average
Popularity
Average

Abstract

Because developing a corpus requires a long time and a great deal of human effort, it is desirable to make it as resourceful as possible: rich in coverage, flexible, multipurpose and expandable. Here we describe the steps we took in developing a Turkish paraphrase corpus, the factors we considered, the problems we faced and how we dealt with them. The corpus currently contains nearly 4000 sentences, with a ratio of 60% paraphrase to 40% non-paraphrase sentence pairs. The sentence pairs are annotated on a five-level scale: paraphrase, encapsulating, encapsulated, non-paraphrase and opposite. The corpus is organized as a database integrated with a Turkish dictionary. The sources used so far are news texts from the Bilcon 2005 corpus, a set of professionally translated sentence pairs from the MSRP corpus, multiple Turkish translations from different languages included in the Tatoeba corpus, and user-generated paraphrases.
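
As an illustration of the five-level annotation described in the abstract, the following is a minimal sketch of how one annotated sentence pair might be represented. It is not the authors' database schema: the names `Label`, `SentencePair`, `swapped` and `label_shares` are hypothetical, and the assumption that reversing a pair turns "encapsulating" into "encapsulated" (and vice versa) is an interpretation of the label names, not something the abstract states.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """The five annotation levels named in the abstract."""
    PARAPHRASE = "paraphrase"
    ENCAPSULATING = "encapsulating"
    ENCAPSULATED = "encapsulated"
    NON_PARAPHRASE = "non-paraphrase"
    OPPOSITE = "opposite"


# Assumption: the two directional labels are inverses of each other,
# while the remaining three labels are symmetric.
_SWAPPED = {
    Label.ENCAPSULATING: Label.ENCAPSULATED,
    Label.ENCAPSULATED: Label.ENCAPSULATING,
}


@dataclass(frozen=True)
class SentencePair:
    """One annotated pair; the field names are hypothetical."""
    first: str
    second: str
    label: Label
    source: str  # e.g. "Tatoeba", "MSRP", "user"

    def swapped(self) -> "SentencePair":
        """Return the same pair in reverse order with the label adjusted."""
        return SentencePair(
            self.second,
            self.first,
            _SWAPPED.get(self.label, self.label),
            self.source,
        )


def label_shares(pairs):
    """Return the fraction of pairs carrying each label (empty dict if no pairs)."""
    counts = Counter(pair.label for pair in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in Label}
```

A function like `label_shares` could be used to check a label distribution such as the 60% to 40% split reported above, although which of the five labels count toward the "paraphrase" share is not specified in the abstract.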

Description

17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing) -- APR 03-09, 2016 -- Mevlana Univ, Konya, TURKEY

Keywords

Turkish, Paraphrase, Corpus generation

Fields of Science

05 social sciences; 0202 electrical engineering, electronic engineering, information engineering; 0501 psychology and cognitive sciences; 02 engineering and technology

WoS Q

N/A

Scopus Q

Q3
OpenCitations Citation Count
2

Source

Computational Linguistics and Intelligent Text Processing (CICLing 2016), Pt I

Volume

9623

Start Page

208

End Page

217
PlumX Metrics
Citations

CrossRef : 2

Scopus : 3

Captures

Mendeley Readers : 2

OpenAlex FWCI
0.8156
