Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14365/3595
Full metadata record
DC Field | Value | Language
dc.contributor.author | Metin S.K. | -
dc.contributor.author | Karaoglan B. | -
dc.contributor.author | Kisla T. | -
dc.date.accessioned | 2023-06-16T15:00:53Z | -
dc.date.available | 2023-06-16T15:00:53Z | -
dc.date.issued | 2013 | -
dc.identifier.isbn | 9.78147E+12 | -
dc.identifier.uri | https://doi.org/10.1109/SIU.2013.6531310 | -
dc.identifier.uri | https://hdl.handle.net/20.500.14365/3595 | -
dc.description | 2013 21st Signal Processing and Communications Applications Conference, SIU 2013 -- 24 April 2013 through 26 April 2013 -- Haspolat -- 98109 | en_US
dc.description.abstract | Natural language processing can be seen as a signal processing problem when the characters, syllables, words, and punctuation marks in a text are treated as signals. In this article, we present a novel approach that detects text similarity in Turkish, based on the similarity of the lists of documents retrieved when the texts are submitted as queries to web search engines. The similarities between the URLs contained in the items of the returned lists are measured using statistical measures such as the Euclidean, city-block, Chebyshev, cosine, correlation, Spearman and Hamming distances. For the experiments, a corpus of 150 news articles was built by gathering news on 50 different topics from 3 Turkish newspapers published during a certain time period. News articles on the same topic published in different newspapers are considered similar texts. The statistical methods are applied to the resulting news × terms matrix, and for each news article the other articles are ranked from most to least similar. If at least one of the top two matches an article manually marked as similar, it is counted as a success. Experimental results show that the cosine and correlation distances give the best performance, with 84% precision. © 2013 IEEE. | en_US
dc.language.iso | tr | en_US
dc.relation.ispartof | 2013 21st Signal Processing and Communications Applications Conference, SIU 2013 | en_US
dc.rights | info:eu-repo/semantics/closedAccess | en_US
dc.subject | Signal information | en_US
dc.subject | Similarity methods | en_US
dc.subject | Statistical signal processing | en_US
dc.subject | Web based text similarity | en_US
dc.subject | Correlation distance | en_US
dc.subject | Natural language processing | en_US
dc.subject | Retrieved documents | en_US
dc.subject | Signal processing problems | en_US
dc.subject | Text similarity | en_US
dc.subject | Hamming distance | en_US
dc.subject | Natural language processing systems | en_US
dc.subject | Newsprint | en_US
dc.subject | Search engines | en_US
dc.subject | Statistical methods | en_US
dc.subject | Signal processing | en_US
dc.title | Text similarity analysis using IR lists | en_US
dc.title.alternative | BGG listeleri ile metin benzerlik analizi | en_US
dc.type | Conference Object | en_US
dc.identifier.doi | 10.1109/SIU.2013.6531310 | -
dc.identifier.scopus | 2-s2.0-84880880104 | en_US
dc.authorscopusid | 24471923700 | -
dc.authorscopusid | 24314851200 | -
dc.identifier.wos | WOS:000325005300151 | en_US
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US
dc.identifier.scopusquality | N/A | -
dc.identifier.wosquality | N/A | -
item.grantfulltext | reserved | -
item.openairetype | Conference Object | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.fulltext | With Fulltext | -
item.languageiso639-1 | tr | -
item.cerifentitytype | Publications | -
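Illustrative sketch (not part of the repository record): the evaluation described in the abstract, where each news item is ranked against the others by a distance measure and a "success" is counted when one of the top two matches a manually marked similar item, might look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the toy matrix, and the gold labels are all hypothetical and invented for illustration, and only the cosine and correlation distances are shown.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def correlation_distance(u, v):
    """Cosine distance between the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_distance([a - mean_u for a in u], [b - mean_v for b in v])

def rank_similar(matrix, index, distance):
    """Indices of all other rows, ordered from most to least similar."""
    others = [j for j in range(len(matrix)) if j != index]
    return sorted(others, key=lambda j: distance(matrix[index], matrix[j]))

def top2_success_rate(matrix, gold, distance):
    """Fraction of rows whose top-two ranking contains a gold-similar row."""
    hits = 0
    for i in range(len(matrix)):
        top_two = rank_similar(matrix, i, distance)[:2]
        if any(j in gold[i] for j in top_two):
            hits += 1
    return hits / len(matrix)

# Hypothetical toy data: 4 news items described by 5 binary features.
# Rows 0 and 1 share a topic, as do rows 2 and 3 (invented labels).
toy_matrix = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
]
toy_gold = {0: {1}, 1: {0}, 2: {3}, 3: {2}}

if __name__ == "__main__":
    for name, dist in [("cosine", cosine_distance),
                       ("correlation", correlation_distance)]:
        print(name, top2_success_rate(toy_matrix, toy_gold, dist))
```

The abstract does not specify how rows are weighted or how ties are broken. The sketch keeps the original row order for ties because Python's sort is stable.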
Appears in Collections:Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
Files in This Item:
File | Size | Format
2683.pdf (Restricted Access) | 344.62 kB | Adobe PDF
Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.