Text Similarity Analysis Using IR Lists

dc.contributor.author Metin S.K.
dc.contributor.author Karaoglan B.
dc.contributor.author Kisla T.
dc.date.accessioned 2023-06-16T15:00:53Z
dc.date.available 2023-06-16T15:00:53Z
dc.date.issued 2013
dc.description 2013 21st Signal Processing and Communications Applications Conference, SIU 2013 -- 24 April 2013 through 26 April 2013 -- Haspolat -- 98109 en_US
dc.description.abstract Natural language processing can be seen as a signal processing problem when the characters, syllables, words, and punctuation marks in a text are treated as signals. In this article, we present a novel approach that detects text similarity in Turkish based on the similarity of the lists of documents retrieved when the texts are submitted as queries to web search engines. The similarities between the URLs contained in the items of the returned lists are measured using statistical measures such as the Euclidean, city-block, Chebyshev, cosine, correlation, Spearman, and Hamming distances. For the experiments, a corpus of 150 news articles was built by gathering news on 50 different topics from 3 Turkish newspapers published during a certain time slot. News articles on the same topic published in different newspapers are considered similar texts. The statistical methods are applied to the resulting news × terms matrix, and for each news article the other articles are ranked from most similar to least similar. If at least one of the top two matches an article manually marked as similar, it is counted as a success. Experimental results show that the cosine and correlation distances give the best performance, with 84% precision. © 2013 IEEE. en_US
dc.identifier.doi 10.1109/SIU.2013.6531310
dc.identifier.isbn 9.78E+12
dc.identifier.scopus 2-s2.0-84880880104
dc.identifier.uri https://doi.org/10.1109/SIU.2013.6531310
dc.identifier.uri https://hdl.handle.net/20.500.14365/3595
dc.language.iso tr en_US
dc.relation.ispartof 2013 21st Signal Processing and Communications Applications Conference, SIU 2013 en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Signal information en_US
dc.subject Similarity methods en_US
dc.subject Statistical signal processing en_US
dc.subject Web based text similarity en_US
dc.subject Correlation distance en_US
dc.subject Natural language processing en_US
dc.subject Retrieved documents en_US
dc.subject Signal processing problems en_US
dc.subject Text similarity en_US
dc.subject Hamming distance en_US
dc.subject Natural language processing systems en_US
dc.subject Newsprint en_US
dc.subject Search engines en_US
dc.subject Statistical methods en_US
dc.subject Signal processing en_US
dc.title Text Similarity Analysis Using IR Lists en_US
dc.title.alternative BGG Listeleri ile Metin Benzerlik Analizi en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.scopusid 24471923700
gdc.author.scopusid 24314851200
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.collaboration.industrial false
gdc.description.departmenttemp Metin, S.K., Yazilim Mühendisligi Bölümü, Izmir Ekonomi Üniversitesi, Izmir, Turkey; Karaoglan, B., Uluslararasi Bilgisayar Enstitüsü, Ege Üniversitesi, Izmir, Turkey; Kisla, T., Bilgisayar Ve Ögretim Teknolojileri Egitimi Bölümü, Ege Üniversitesi, Izmir, Turkey en_US
gdc.description.endpage 4
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.scopusquality N/A
gdc.description.startpage 1
gdc.description.wosquality N/A
gdc.identifier.openalex W2162853177
gdc.identifier.wos WOS:000325005300151
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.5349236E-9
gdc.oaire.isgreen false
gdc.oaire.keywords statistical signal processing
gdc.oaire.keywords web based text similarity
gdc.oaire.keywords similarity methods
gdc.oaire.keywords signal information
gdc.oaire.popularity 6.0129673E-10
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields 02 engineering and technology
gdc.openalex.collaboration National
gdc.openalex.fwci 0.4809
gdc.openalex.normalizedpercentile 0.77
gdc.opencitations.count 0
gdc.plumx.mendeley 3
gdc.plumx.scopuscites 0
gdc.scopus.citedcount 0
gdc.virtual.author Kumova Metin, Senem
gdc.wos.citedcount 0
relation.isAuthorOfPublication 81d6fcea-c590-42aa-8443-7459c9eab7fa
relation.isAuthorOfPublication.latestForDiscovery 81d6fcea-c590-42aa-8443-7459c9eab7fa
relation.isOrgUnitOfPublication e9e77e3e-bc94-40a7-9b24-b807b2cd0319
relation.isOrgUnitOfPublication 805c60d5-b806-4645-8214-dd40524c388f
relation.isOrgUnitOfPublication 26a7372c-1a5e-42d9-90b6-a3f7d14cad44
relation.isOrgUnitOfPublication.latestForDiscovery e9e77e3e-bc94-40a7-9b24-b807b2cd0319
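
The evaluation described in the abstract (rank the other news items by distance, count a success when one of the top two is a manually marked match) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the toy matrix, the gold labels, and all function names are hypothetical, and the step that builds the vectors from search engine result lists is not reproduced here. Only the cosine and correlation distances named in the abstract are shown.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros (assumed convention)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def correlation_distance(u, v):
    """1 - Pearson correlation, computed as the cosine distance of mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_distance([a - mean_u for a in u], [b - mean_v for b in v])

def rank_similar(matrix, index, distance):
    """Indices of all other rows, ordered from most similar to least similar."""
    others = [j for j in range(len(matrix)) if j != index]
    return sorted(others, key=lambda j: distance(matrix[index], matrix[j]))

def top2_success_rate(matrix, gold, distance):
    """Fraction of rows whose top-two ranked rows include a manually marked match."""
    hits = 0
    for i in range(len(matrix)):
        top_two = rank_similar(matrix, i, distance)[:2]
        if any(j in gold[i] for j in top_two):
            hits += 1
    return hits / len(matrix)

# Hypothetical toy data: 4 news items over 4 features, two topics (rows 0-1 and rows 2-3).
toy_matrix = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
]
# Hypothetical manual labels: for each row, the set of rows on the same topic.
toy_gold = {0: {1}, 1: {0}, 2: {3}, 3: {2}}

for name, dist in [("cosine", cosine_distance), ("correlation", correlation_distance)]:
    print(name, top2_success_rate(toy_matrix, toy_gold, dist))
```

The distance function is passed in as a parameter so that the other measures listed in the abstract could be swapped in without changing the ranking or scoring code.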

Files

Original bundle

Name: 2683.pdf
Size: 344.62 KB
Format: Adobe Portable Document Format