Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14365/3369
Full metadata record
DC Field | Value | Language
dc.contributor.author | Lang N. | -
dc.contributor.author | Zincir I. | -
dc.contributor.author | Zincir-Heywood N. | -
dc.date.accessioned | 2023-06-16T14:57:57Z | -
dc.date.available | 2023-06-16T14:57:57Z | -
dc.date.issued | 2021 | -
dc.identifier.isbn | 9.78303E+12 | -
dc.identifier.issn | 2194-5357 | -
dc.identifier.uri | https://doi.org/10.1007/978-3-030-63128-4_52 | -
dc.identifier.uri | https://hdl.handle.net/20.500.14365/3369 | -
dc.description | Future Technologies Conference, FTC 2020 -- 5 November 2020 through 6 November 2020 -- 251149 | en_US
dc.description.abstract | In many real-world applications, a large number of words can introduce noisy and redundant information that degrades the overall performance of text classification tasks. Feature selection techniques that aim to eliminate uninformative words have therefore been actively studied. In several information-theoretic approaches, such features are conventionally obtained by maximizing relevance to the class while minimizing redundancy among the selected features. This is an NP-hard problem and remains a challenge. In this work, we propose an alternative feature selection strategy on binary representation data, with the purpose of providing a theoretical lower bound for finding a near-optimal solution based on the Maximum Relevance-Minimum Redundancy criterion. In doing so, the proposed strategy can achieve a theoretical approximation ratio of 1/2 with a naive greedy search. The proposed strategy is validated by empirical experiments on five publicly available datasets, namely Cora, Citeseer, WebKB, SMS Spam and Spambase. Its effectiveness is shown on binary text classification tasks when compared with well-known filter feature selection methods and mutual information-based methods. © 2021, Springer Nature Switzerland AG. | en_US
dc.description.sponsorship | Natural Sciences and Engineering Research Council of Canada, NSERC | en_US
dc.description.sponsorship | Acknowledgment. This research is partly supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). This research is conducted as part of the Dalhousie NIMS Lab at: https://projects.cs.dal.ca/projectx/. | en_US
dc.language.iso | en | en_US
dc.publisher | Springer Science and Business Media Deutschland GmbH | en_US
dc.relation.ispartof | Advances in Intelligent Systems and Computing | en_US
dc.rights | info:eu-repo/semantics/closedAccess | en_US
dc.subject | Binary representation | en_US
dc.subject | Feature selection | en_US
dc.subject | Text classification | en_US
dc.subject | Classification (of information) | en_US
dc.subject | Information theory | en_US
dc.subject | NP-hard | en_US
dc.subject | Redundancy | en_US
dc.subject | Text processing | en_US
dc.subject | Binary representations | en_US
dc.subject | Empirical experiments | en_US
dc.subject | Feature selection methods | en_US
dc.subject | Information-theoretic approach | en_US
dc.subject | Maximum relevance minimum redundancies | en_US
dc.subject | Near-optimal solutions | en_US
dc.subject | Selection techniques | en_US
dc.subject | Theoretical approximations | en_US
dc.subject | Feature extraction | en_US
dc.title | Binary Text Representation for Feature Selection | en_US
dc.type | Conference Object | en_US
dc.identifier.doi | 10.1007/978-3-030-63128-4_52 | -
dc.identifier.scopus | 2-s2.0-85096500961 | en_US
dc.authorscopusid | 57220004994 | -
dc.authorscopusid | 57105333100 | -
dc.identifier.volume | 1288 | en_US
dc.identifier.startpage | 681 | en_US
dc.identifier.endpage | 692 | en_US
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US
dc.identifier.scopusquality | N/A | -
dc.identifier.wosquality | N/A | -
item.grantfulltext | reserved | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.cerifentitytype | Publications | -
item.openairetype | Conference Object | -
item.fulltext | With Fulltext | -
item.languageiso639-1 | en | -
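The abstract refers to greedy search under the Maximum Relevance-Minimum Redundancy (mRMR) criterion on binary word features. As a rough illustration only, the sketch below implements the generic greedy mRMR baseline in its difference form: each candidate word is scored by its empirical mutual information with the class minus its mean mutual information with the words already selected. It is not the authors' proposed strategy (which is described in the restricted full text), no approximation guarantee is claimed for it, and the helper names `mutual_information` and `greedy_mrmr` as well as the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), rewritten in terms of raw counts
        mi += (count / n) * math.log(count * n / (px[a] * py[b]))
    return mi


def greedy_mrmr(rows, labels, k):
    """Hypothetical helper: pick up to k feature indices with the greedy mRMR difference score.

    rows   -- documents as lists of 0/1 word-presence indicators (binary representation)
    labels -- one class label per document
    """
    columns = [list(col) for col in zip(*rows)]
    relevance = [mutual_information(col, labels) for col in columns]
    selected, remaining = [], set(range(len(columns)))
    while remaining and len(selected) < k:
        best, best_score = None, -math.inf
        for j in sorted(remaining):  # sorted, with strict '>', so ties go to the lowest index
            redundancy = 0.0
            if selected:
                # mean mutual information with the features chosen so far
                redundancy = sum(mutual_information(columns[j], columns[s])
                                 for s in selected) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): feature 1 duplicates feature 0, and feature 2 marks
# the one positive document that feature 0 misses.
X = [
    [1, 1, 0], [1, 1, 0], [1, 1, 0], [0, 0, 1],
    [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(greedy_mrmr(X, y, 2))  # [0, 2]
```

Feature 1 is as relevant as feature 0, but as an exact copy its redundancy penalty outweighs that relevance, so the second greedy step prefers feature 2, which together with feature 0 determines the label. Each round recomputes redundancy against every selected word, which is the part that becomes costly on large vocabularies.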
Appears in Collections:Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
Files in This Item:
File | Size | Format | Access
3369.pdf | 606.24 kB | Adobe PDF | Restricted




Scopus citations: 1 (checked on Oct 2, 2024)
Page views: 46 (checked on Sep 30, 2024)
Downloads: 6 (checked on Sep 30, 2024)



Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.