Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.14365/3369
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lang N. | - |
dc.contributor.author | Zincir I. | - |
dc.contributor.author | Zincir-Heywood N. | - |
dc.date.accessioned | 2023-06-16T14:57:57Z | - |
dc.date.available | 2023-06-16T14:57:57Z | - |
dc.date.issued | 2021 | - |
dc.identifier.isbn | 9.78303E+12 | - |
dc.identifier.issn | 2194-5357 | - |
dc.identifier.uri | https://doi.org/10.1007/978-3-030-63128-4_52 | - |
dc.identifier.uri | https://hdl.handle.net/20.500.14365/3369 | - |
dc.description | Future Technologies Conference, FTC 2020 -- 5 November 2020 through 6 November 2020 -- 251149 | en_US |
dc.description.abstract | In many real-world applications, a large number of words can result in noisy and redundant information, which can degrade the general performance of text classification tasks. Feature selection techniques with the purpose of eliminating uninformative words have been actively studied. In several information-theoretic approaches, such features are conventionally obtained by maximizing relevance to the class while the redundancy among the features used is minimized. This is an NP-hard problem and remains a challenge. In this work, we propose an alternative feature selection strategy on binary representation data, with the purpose of providing a theoretical lower bound for finding a near optimal solution based on the Maximum Relevance-Minimum Redundancy criterion. In doing so, the proposed strategy can achieve a theoretical approximation ratio of 1/2 by a naive greedy search. The proposed strategy is validated by empirical experiments on five publicly available datasets, namely, Cora, Citeseer, WebKB, SMS Spam and Spambase. Its effectiveness is shown for binary text classification tasks when compared with well-known filter feature selection methods and mutual information-based methods. © 2021, Springer Nature Switzerland AG. | en_US |
dc.description.sponsorship | Natural Sciences and Engineering Research Council of Canada, NSERC | en_US |
dc.description.sponsorship | This research is partly supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). This research is conducted as part of the Dalhousie NIMS Lab at: https://projects.cs.dal.ca/projectx/. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Springer Science and Business Media Deutschland GmbH | en_US |
dc.relation.ispartof | Advances in Intelligent Systems and Computing | en_US |
dc.rights | info:eu-repo/semantics/closedAccess | en_US |
dc.subject | Binary representation | en_US |
dc.subject | Feature selection | en_US |
dc.subject | Text classification | en_US |
dc.subject | Classification (of information) | en_US |
dc.subject | Information theory | en_US |
dc.subject | NP-hard | en_US |
dc.subject | Redundancy | en_US |
dc.subject | Text processing | en_US |
dc.subject | Binary representations | en_US |
dc.subject | Empirical experiments | en_US |
dc.subject | Feature selection methods | en_US |
dc.subject | Information-theoretic approach | en_US |
dc.subject | Maximum relevance minimum redundancies | en_US |
dc.subject | Near-optimal solutions | en_US |
dc.subject | Selection techniques | en_US |
dc.subject | Theoretical approximations | en_US |
dc.subject | Feature extraction | en_US |
dc.title | Binary Text Representation for Feature Selection | en_US |
dc.type | Conference Object | en_US |
dc.identifier.doi | 10.1007/978-3-030-63128-4_52 | - |
dc.identifier.scopus | 2-s2.0-85096500961 | en_US |
dc.authorscopusid | 57220004994 | - |
dc.authorscopusid | 57105333100 | - |
dc.identifier.volume | 1288 | en_US |
dc.identifier.startpage | 681 | en_US |
dc.identifier.endpage | 692 | en_US |
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
dc.identifier.scopusquality | N/A | - |
dc.identifier.wosquality | N/A | - |
item.openairetype | Conference Object | - |
item.cerifentitytype | Publications | - |
item.grantfulltext | reserved | - |
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
item.fulltext | With Fulltext | - |
item.languageiso639-1 | en | - |
Appears in Collections: | Scopus Indexed Publications Collection |
Files in This Item:
File | Size | Format | Access |
---|---|---|---|
3369.pdf | 606.24 kB | Adobe PDF | Restricted Access |
Scopus citations: 1 (checked on Nov 27, 2024)
Page view(s): 48 (checked on Nov 25, 2024)
Download(s): 6 (checked on Nov 25, 2024)
Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.