Browsing by Author "Kisla T."

Citation - Scopus: 1
Attribute Value-Range Detection in Identification of Paraphrase Sentence Pairs
(Institute of Electrical and Electronics Engineers Inc., 2016) Kumova S.; Karaoglan B.; Kisla T.
Identification of paraphrase sentence pairs is becoming increasingly prominent in natural language processing (e.g., plagiarism detection, summarization, machine translation). This study proposes using the information gain measure to determine the value ranges of paraphrase classification features on the well-known Microsoft Research Paraphrase Corpus (MSRP). The classification performance of value ranges determined by the information gain measure is compared with that of an alternative heuristic method using a Bayes classifier. The results show that the proposed method performs better than the heuristic method. © 2016 IEEE.
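As a rough illustration of how information gain can pick a cut point that splits a numeric feature into value ranges, the following minimal sketch scores candidate thresholds on a toy feature. The function names, the single binary split and the midpoint candidates are assumptions made for illustration; the abstract does not describe the authors' exact procedure.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Return the midpoint between adjacent distinct values with the highest gain.

    Hypothetical helper: returns None when the feature has fewer than two
    distinct values, because no split is possible in that case.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None
    return max(candidates, key=lambda t: information_gain(values, labels, t))
```

Applying `best_threshold` repeatedly inside each resulting range would give more than two ranges; whether the study does this is not stated in the abstract.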
Contribution of Syntactic and Semantic Attributes in Paraphrase Identification
(Institute of Electrical and Electronics Engineers Inc., 2018) Karaoglan B.; Kisla T.; Metin S.K.
Automatic paraphrase identification is a natural language understanding problem in which a decision is made as to whether a given pair of sentences bear similar meanings to a certain extent. Syntactic and semantic features are used to classify sentence pairs as paraphrase or non-paraphrase. Word overlap and word order are among the syntactic features widely used in the literature, whereas similarity of word meanings and named entity (NE) overlap are among the semantic features. Unfortunately, Turkish does not have a tool as useful as WordNet for drawing semantic relations between words, as is done for English. Here we exploit tense and polarity differences as semantic features and assess the improvement in classification brought by these features. We performed experiments with several different combinations of features on the Turkish paraphrase corpus built by the researchers and report the results. © 2018 IEEE.
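The two syntactic features named in the abstract, word overlap and word order, could be computed along the following lines. This is only a sketch under stated assumptions: Jaccard overlap and a longest-common-subsequence ratio are common choices, not necessarily the definitions used in the paper, and whitespace tokenization ignores Turkish morphology.

```python
def tokenize(sentence):
    """Naive whitespace tokenizer (assumption).

    Note: str.lower() does not apply Turkish casing rules for dotted and
    dotless I, so a real system would need locale-aware normalization.
    """
    return sentence.lower().split()


def word_overlap(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def order_similarity(a, b):
    """Longest common subsequence of words, normalized by the longer sentence."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a or not tokens_b:
        return 0.0
    prev = [0] * (len(tokens_b) + 1)
    for word_a in tokens_a:
        curr = [0]
        for j, word_b in enumerate(tokens_b, start=1):
            if word_a == word_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / max(len(tokens_a), len(tokens_b))
```

Two sentences with the same words in reversed order score 1.0 on `word_overlap` but low on `order_similarity`, which is why the two features carry different information for a classifier.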
The Impact of Sentence Embeddings in Turkish Paraphrase Detection
(Institute of Electrical and Electronics Engineers Inc., 2019) Karaoglan B.; Yorgancioglu H.E.; Kisla T.; Kumova Metin S.
Recent studies have shown that word embeddings are successful in several natural language processing (NLP) tasks. Although paraphrase identification in Turkish is well studied with traditional statistical NLP methods, to the best of our knowledge there is no study in which word and/or sentence embeddings are employed. In this paper, three methods for building sentence embeddings from word embeddings are examined, namely 'using average vector for word embeddings' (AWE), 'concatenated vectors for word embeddings' (CWE) and 'word mover's distance word embeddings' (WMDWE), and their effect on paraphrase identification performance is measured. The results are presented comparatively for English (MSRP) and Turkish (PARDER and TuPC) paraphrase corpora. The study does not cover the optimization of the parameters used in training the word embeddings, and features specific to the Turkish language are not considered. Despite this naive approach, the test results obtained from the PARDER corpus are encouraging and suggest that a more detailed study involving such improvements may yield more convincing performance values. © 2019 IEEE.
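The averaging approach (AWE) can be sketched as follows: each sentence is represented by the mean of its word vectors, and two sentences are compared by cosine similarity. The embedding lookup is assumed to be a plain dictionary from token to vector, and using cosine similarity as the comparison is an assumption; the abstract does not say how the sentence vectors are fed to the classifier.

```python
import math


def average_embedding(tokens, embeddings, dim):
    """Mean of the vectors of the tokens found in `embeddings`.

    `embeddings` is a hypothetical dict mapping token -> list of floats.
    Out-of-vocabulary tokens are skipped; an all-zero vector is returned
    when no token is known.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Averaging discards word order, which is one reason the paper also considers the concatenation and word mover's distance variants.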
Text Similarity Analysis Using IR Lists
(2013) Metin S.K.; Karaoglan B.; Kisla T.
Natural language processing can be seen as a signal processing problem when the characters, syllables, words and punctuation marks in a text are considered as signals. In this article, we present a novel approach that detects text similarity in Turkish based on the similarities of the lists of documents retrieved when the texts are given as queries to web search engines. The similarities between the URLs contained in the items of the returned lists are measured using statistical measures such as Euclidean, city-block, Chebyshev, cosine, correlation, Spearman and Hamming distances. For the experiments, a corpus of 150 news items is built by gathering news on 50 different topics from 3 Turkish newspapers published during a certain time slot. News items on the same topic published in different newspapers are considered similar texts. The statistical methods are applied to the resulting news × terms matrix, and for each news item the other items are ranked from most similar to least similar. If at least one of the top two matches an item manually marked as similar, it is counted as a success. Experimental results show that cosine and correlation distances give the best performance, with 84% precision. © 2013 IEEE.
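The ranking and the top-two success criterion described above can be sketched as follows. The binary URL-presence vectors, the function names and the toy identifiers are assumptions for illustration; the abstract does not specify how the matrix entries are weighted, and only the cosine distance is shown here.

```python
import math


def url_vector(urls, vocabulary):
    """Binary presence vector of a retrieved URL list over a shared vocabulary."""
    present = set(urls)
    return [1.0 if u in present else 0.0 for u in vocabulary]


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def rank_similar(results, query_id):
    """Rank all other texts by distance to `query_id`, most similar first.

    `results` is a hypothetical dict: text id -> list of URLs returned by a
    search engine when that text was used as the query.
    """
    vocabulary = sorted({u for urls in results.values() for u in urls})
    query_vec = url_vector(results[query_id], vocabulary)
    others = [k for k in results if k != query_id]
    return sorted(
        others,
        key=lambda k: cosine_distance(query_vec, url_vector(results[k], vocabulary)),
    )


def top2_success(ranking, relevant):
    """True if at least one of the two top-ranked texts was marked as similar."""
    return any(k in relevant for k in ranking[:2])
```

Precision in the sense used above would then be the fraction of news items for which `top2_success` is true.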