Binary Text Representation for Feature Selection

Date

2021

Publisher

Springer Science and Business Media Deutschland GmbH

Open Access Color

Green Open Access

No

Publicly Funded

No

Abstract

In many real-world applications, a large number of words can introduce noisy and redundant information, which can degrade the overall performance of text classification tasks. Feature selection techniques aimed at eliminating uninformative words have been actively studied. In several information-theoretic approaches, such features are conventionally obtained by maximizing relevance to the class while minimizing redundancy among the selected features. This is an NP-hard problem and remains a challenge. In this work, we propose an alternative feature selection strategy on binary representation data, with the purpose of providing a theoretical lower bound for finding a near-optimal solution based on the Maximum Relevance-Minimum Redundancy criterion. In doing so, the proposed strategy can achieve a theoretical approximation ratio of 1/2 by a naive greedy search. The proposed strategy is validated by empirical experiments on five publicly available datasets, namely Cora, Citeseer, WebKB, SMS Spam and Spambase. Its effectiveness is shown for binary text classification tasks when compared with well-known filter feature selection methods and mutual information-based methods. © 2021, Springer Nature Switzerland AG.
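As an illustration of the Maximum Relevance-Minimum Redundancy criterion named in the abstract, the following is a minimal Python sketch of a conventional greedy mRMR search over binary (0/1) features. It assumes the common difference form of the score, I(x_j; y) minus the mean of I(x_j; x_s) over the already-selected features s, with mutual information estimated from empirical frequencies. This is not the authors' proposed binary-representation strategy and makes no claim to the approximation guarantee described in the abstract; the function names mutual_information and greedy_mrmr are hypothetical.

```python
import numpy as np


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete 1-D arrays."""
    x = np.asarray(x)
    y = np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:  # zero-probability cells contribute nothing
                py = np.mean(y == yv)
                mi += pxy * np.log(pxy / (px * py))
    return mi


def greedy_mrmr(X, y, k):
    """Greedily pick up to k column indices of a binary matrix X.

    Assumed score (difference form of mRMR, a hypothetical choice here):
    relevance I(x_j; y) minus mean redundancy I(x_j; x_s) over selected s.
    """
    X = np.asarray(X)
    n_features = X.shape[1]
    # Relevance of every feature to the class label, computed once.
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    selected, remaining = [], list(range(n_features))
    while len(selected) < min(k, n_features):
        best_j, best_score = None, -np.inf
        for j in remaining:
            redundancy = (
                np.mean([mutual_information(X[:, j], X[:, s]) for s in selected])
                if selected
                else 0.0
            )
            score = relevance[j] - redundancy
            if score > best_score:  # ties keep the lower-indexed feature
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected


if __name__ == "__main__":
    # Toy data: a and b are independent bits, and the label is y = a AND b.
    a = np.array([0, 0, 1, 1, 0, 0, 1, 1])
    b = np.array([0, 1, 0, 1, 0, 1, 0, 1])
    y = a & b
    X = np.column_stack([a, a, b])  # column 1 duplicates column 0
    print(greedy_mrmr(X, y, k=2))  # prints [0, 2]
```

In this toy setup all three columns are equally relevant to y, but the duplicated column carries no new information once column 0 is chosen, so the redundancy term steers the second pick toward the independent feature b.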

Description

Future Technologies Conference, FTC 2020 -- 5 November 2020 through 6 November 2020 -- 251149

Keywords

Binary representation, Feature selection, Text classification, Classification (of information), Information theory, NP-hard, Redundancy, Text processing, Binary representations, Empirical experiments, Feature selection methods, Information-theoretic approach, Maximum relevance minimum redundancies, Near-optimal solutions, Selection techniques, Theoretical approximations, Feature extraction

Source

Advances in Intelligent Systems and Computing

Volume

1288

Start Page

681

End Page

692

Citations

Scopus: 1 (checked on Mar 16, 2026)

Captures

Mendeley Readers: 1

Page Views

1 (checked on Mar 16, 2026)

OpenAlex FWCI
0.5495