Binary Text Representation for Feature Selection

dc.contributor.author Lang N.
dc.contributor.author Zincir I.
dc.contributor.author Zincir-Heywood N.
dc.date.accessioned 2023-06-16T14:57:57Z
dc.date.available 2023-06-16T14:57:57Z
dc.date.issued 2021
dc.description Future Technologies Conference, FTC 2020 -- 5 November 2020 through 6 November 2020 -- 251149 en_US
dc.description.abstract In many real-world applications, a high number of words could result in noisy and redundant information, which could degrade the general performance of text classification tasks. Feature selection techniques with the purpose of eliminating uninformative words have been actively studied. In several information-theoretic approaches, such features are conventionally obtained by maximizing relevance to the class while the redundancy among the features used is minimized. This is an NP-hard problem and remains a challenge. In this work, we propose an alternative feature selection strategy on binary representation data, with the purpose of providing a theoretical lower bound for finding a near-optimal solution based on the Maximum Relevance-Minimum Redundancy criterion. In doing so, the proposed strategy can achieve a theoretical approximation ratio of 1/2 by a naive greedy search. The proposed strategy is validated by empirical experiments on five publicly available datasets, namely, Cora, Citeseer, WebKB, SMS Spam and Spambase. Its effectiveness is shown for binary text classification tasks when compared with well-known filter feature selection methods and mutual information-based methods. © 2021, Springer Nature Switzerland AG. en_US
dc.description.sponsorship Natural Sciences and Engineering Research Council of Canada, NSERC en_US
dc.description.sponsorship This research is partly supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). This research is conducted as part of the Dalhousie NIMS Lab at: https://projects.cs.dal.ca/projectx/. en_US
dc.identifier.doi 10.1007/978-3-030-63128-4_52
dc.identifier.isbn 9.78E+12
dc.identifier.issn 2194-5357
dc.identifier.scopus 2-s2.0-85096500961
dc.identifier.uri https://doi.org/10.1007/978-3-030-63128-4_52
dc.identifier.uri https://hdl.handle.net/20.500.14365/3369
dc.language.iso en en_US
dc.publisher Springer Science and Business Media Deutschland GmbH en_US
dc.relation.ispartof Advances in Intelligent Systems and Computing en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Binary representation en_US
dc.subject Feature selection en_US
dc.subject Text classification en_US
dc.subject Classification (of information) en_US
dc.subject Information theory en_US
dc.subject NP-hard en_US
dc.subject Redundancy en_US
dc.subject Text processing en_US
dc.subject Binary representations en_US
dc.subject Empirical experiments en_US
dc.subject Feature selection methods en_US
dc.subject Information-theoretic approach en_US
dc.subject Maximum relevance minimum redundancies en_US
dc.subject Near-optimal solutions en_US
dc.subject Selection techniques en_US
dc.subject Theoretical approximations en_US
dc.subject Feature extraction en_US
dc.title Binary Text Representation for Feature Selection en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.scopusid 57220004994
gdc.author.scopusid 57105333100
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.collaboration.industrial false
gdc.description.departmenttemp Lang, N., Dalhousie University, Halifax, NS, Canada; Zincir, I., Izmir University of Economics, Izmir, Turkey; Zincir-Heywood, N., Dalhousie University, Halifax, NS, Canada en_US
gdc.description.endpage 692 en_US
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.scopusquality N/A
gdc.description.startpage 681 en_US
gdc.description.volume 1288 en_US
gdc.description.wosquality N/A
gdc.identifier.openalex W3095042565
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.4895952E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 1.3503004E-9
gdc.oaire.publicfunded false
gdc.openalex.collaboration International
gdc.openalex.fwci 0.5495
gdc.openalex.normalizedpercentile 0.7
gdc.opencitations.count 0
gdc.plumx.mendeley 1
gdc.plumx.scopuscites 1
gdc.scopus.citedcount 1
relation.isOrgUnitOfPublication e9e77e3e-bc94-40a7-9b24-b807b2cd0319
relation.isOrgUnitOfPublication.latestForDiscovery e9e77e3e-bc94-40a7-9b24-b807b2cd0319

Files

Original bundle

Name: 3369.pdf
Size: 606.24 KB
Format: Adobe Portable Document Format