Text Similarity Analysis Using IR Lists
| dc.contributor.author | Metin S.K. | |
| dc.contributor.author | Karaoglan B. | |
| dc.contributor.author | Kisla T. | |
| dc.date.accessioned | 2023-06-16T15:00:53Z | |
| dc.date.available | 2023-06-16T15:00:53Z | |
| dc.date.issued | 2013 | |
| dc.description | 2013 21st Signal Processing and Communications Applications Conference, SIU 2013 -- 24 April 2013 through 26 April 2013 -- Haspolat -- 98109 | en_US |
| dc.description.abstract | Natural language processing can be seen as a signal processing problem when the characters, syllables, words, and punctuation marks in a text are treated as signals. In this article, we present a novel approach that detects text similarity in Turkish, based on the similarity of the lists of documents retrieved when the texts are submitted as queries to web search engines. The similarities between the URLs contained in the items of the returned lists are measured using statistical measures such as Euclidean, city-block, Chebyshev, cosine, correlation, Spearman, and Hamming distances. For the experiments, a corpus of 150 news articles was built by gathering news on 50 different topics from 3 Turkish newspapers published during the same period. News articles on the same topic published in different newspapers are considered similar texts. The statistical measures are applied to the resulting news × terms matrix, and for each news article the other articles are ranked from most similar to least similar. If at least one of the top two ranked articles matches one of those manually marked as similar, it is counted as a success. Experimental results show that the cosine and correlation distances give the best performance, with 84% precision. © 2013 IEEE. | en_US |
| dc.identifier.doi | 10.1109/SIU.2013.6531310 | |
| dc.identifier.isbn | 9.78E+12 | |
| dc.identifier.scopus | 2-s2.0-84880880104 | |
| dc.identifier.uri | https://doi.org/10.1109/SIU.2013.6531310 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14365/3595 | |
| dc.language.iso | tr | en_US |
| dc.relation.ispartof | 2013 21st Signal Processing and Communications Applications Conference, SIU 2013 | en_US |
| dc.rights | info:eu-repo/semantics/closedAccess | en_US |
| dc.subject | Signal information | en_US |
| dc.subject | Similarity methods | en_US |
| dc.subject | Statistical signal processing | en_US |
| dc.subject | Web based text similarity | en_US |
| dc.subject | Correlation distance | en_US |
| dc.subject | Natural language processing | en_US |
| dc.subject | Retrieved documents | en_US |
| dc.subject | Signal processing problems | en_US |
| dc.subject | Text similarity | en_US |
| dc.subject | Hamming distance | en_US |
| dc.subject | Natural language processing systems | en_US |
| dc.subject | Newsprint | en_US |
| dc.subject | Search engines | en_US |
| dc.subject | Statistical methods | en_US |
| dc.subject | Signal processing | en_US |
| dc.title | Text Similarity Analysis Using IR Lists | en_US |
| dc.title.alternative | BGG Listeleri ile Metin Benzerlik Analizi | en_US |
| dc.type | Conference Object | en_US |
| dspace.entity.type | Publication | |
| gdc.author.scopusid | 24471923700 | |
| gdc.author.scopusid | 24314851200 | |
| gdc.bip.impulseclass | C5 | |
| gdc.bip.influenceclass | C5 | |
| gdc.bip.popularityclass | C5 | |
| gdc.coar.access | metadata only access | |
| gdc.coar.type | text::conference output | |
| gdc.collaboration.industrial | false | |
| gdc.description.departmenttemp | Metin, S.K., Yazilim Mühendisligi Bölümü, IZmir Ekonomi Üniversitesi, Izmir, Turkey; Karaoglan, B., Uluslararasi Bilgisayar Enstitüsü, Ege Üniversitesi, Izmir, Turkey; Kisla, T., Bilgisayar Ve Ögretim Teknolojileri Egitimi Bölümü, Ege Üniversitesi, Izmir, Turkey | en_US |
| gdc.description.endpage | 4 | |
| gdc.description.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
| gdc.description.scopusquality | N/A | |
| gdc.description.startpage | 1 | |
| gdc.description.wosquality | N/A | |
| gdc.identifier.openalex | W2162853177 | |
| gdc.identifier.wos | WOS:000325005300151 | |
| gdc.index.type | WoS | |
| gdc.index.type | Scopus | |
| gdc.oaire.diamondjournal | false | |
| gdc.oaire.impulse | 0.0 | |
| gdc.oaire.influence | 2.5349236E-9 | |
| gdc.oaire.isgreen | false | |
| gdc.oaire.keywords | statistical signal processing | |
| gdc.oaire.keywords | web based text similarity | |
| gdc.oaire.keywords | similarity methods | |
| gdc.oaire.keywords | signal information | |
| gdc.oaire.popularity | 6.0129673E-10 | |
| gdc.oaire.publicfunded | false | |
| gdc.oaire.sciencefields | 0202 electrical engineering, electronic engineering, information engineering | |
| gdc.oaire.sciencefields | 02 engineering and technology | |
| gdc.openalex.collaboration | National | |
| gdc.openalex.fwci | 0.4809 | |
| gdc.openalex.normalizedpercentile | 0.77 | |
| gdc.opencitations.count | 0 | |
| gdc.plumx.mendeley | 3 | |
| gdc.plumx.scopuscites | 0 | |
| gdc.scopus.citedcount | 0 | |
| gdc.virtual.author | Kumova Metin, Senem | |
| gdc.wos.citedcount | 0 | |
| relation.isAuthorOfPublication | 81d6fcea-c590-42aa-8443-7459c9eab7fa | |
| relation.isAuthorOfPublication.latestForDiscovery | 81d6fcea-c590-42aa-8443-7459c9eab7fa | |
| relation.isOrgUnitOfPublication | e9e77e3e-bc94-40a7-9b24-b807b2cd0319 | |
| relation.isOrgUnitOfPublication | 805c60d5-b806-4645-8214-dd40524c388f | |
| relation.isOrgUnitOfPublication | 26a7372c-1a5e-42d9-90b6-a3f7d14cad44 | |
| relation.isOrgUnitOfPublication.latestForDiscovery | e9e77e3e-bc94-40a7-9b24-b807b2cd0319 |
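
The ranking and success criterion described in the abstract can be illustrated with a minimal sketch. It assumes each news item is already represented as a numeric feature vector (one row of a matrix) and uses cosine distance only. The function names, the toy matrix, and the topic labels are invented for illustration; they are not the authors' data or implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def rank_neighbours(matrix, index):
    """Indices of all other rows, ordered from most to least similar."""
    others = [j for j in range(len(matrix)) if j != index]
    return sorted(others, key=lambda j: cosine_distance(matrix[index], matrix[j]))


def top2_success_rate(matrix, topics):
    """Fraction of rows whose two nearest neighbours include a same-topic row."""
    hits = 0
    for i in range(len(matrix)):
        top_two = rank_neighbours(matrix, i)[:2]
        if any(topics[j] == topics[i] for j in top_two):
            hits += 1
    return hits / len(matrix)


# Toy data, invented for illustration: 4 items, 2 topics, 5 features.
toy_matrix = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
]
toy_topics = ["a", "a", "b", "b"]
rate = top2_success_rate(toy_matrix, toy_topics)
```

Swapping `cosine_distance` for another measure named in the abstract (for example city-block or Chebyshev) would only require changing the key function used for sorting.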
