Model-Based Feature Selection Using Structural Equation Modeling for Enhanced Classification Performance in High-Dimensional Datasets

Date

2025

Authors

Albayrak, Muammer
Turhan, Kemal

Publisher

Gazi University

Open Access Color

GOLD

Green Open Access

No

Publicly Funded

No

Abstract

Feature selection is increasingly important in machine learning and data mining. For high-dimensional datasets in particular, irrelevant and unnecessary features must be filtered out to mitigate overfitting and the problems caused by high dimensionality. We hypothesized that effective feature selection can be achieved with a model-based approach using Structural Equation Modeling (SEM). The dataset consists of 2969 samples and 117 features. First, a measurement model was built and tested with confirmatory factor analysis (CFA), and the number of features was reduced to 58 by removing statistically insignificant features. In the SEM analysis, feature subsets of 55, 52, 41, and 35 features were then obtained by removing variables whose standardized regression coefficient (SRC) fell below predetermined threshold values. The resulting feature subsets were evaluated with a multilayer perceptron (MLP) to examine their effect on classification performance. Results were compared with random forest feature importance as a baseline method. SEM and random forest generally performed very similarly. Feature subsets created with random forest produced better results in two-class classification, whereas feature subsets created with the proposed SEM-based method performed better in three-class and five-class classification. These results show that effective feature selection can be achieved with the proposed model-based approach using SEM. By incorporating expert knowledge into the modeling process, this approach makes it possible to obtain feature subsets that form a model that is statistically significant and consistent with domain knowledge.
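The SRC thresholding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, coefficient values, and thresholds below are hypothetical, and comparing the absolute value of the coefficient against the threshold is an assumption, since the abstract does not state how negative coefficients were handled.

```python
from typing import Dict, List


def select_by_threshold(src: Dict[str, float], threshold: float) -> List[str]:
    """Keep features whose absolute standardized coefficient meets the threshold."""
    # Assumption: the magnitude of the coefficient is what matters, not its sign.
    return [name for name, coef in src.items() if abs(coef) >= threshold]


def nested_subsets(src: Dict[str, float], thresholds: List[float]) -> Dict[float, List[str]]:
    """Build one feature subset per threshold, ordered from loosest to strictest."""
    return {t: select_by_threshold(src, t) for t in sorted(thresholds)}


# Hypothetical coefficients for illustration only; not values from the paper.
example_src = {
    "f1": 0.62,
    "f2": -0.41,
    "f3": 0.08,
    "f4": 0.27,
    "f5": -0.15,
}

# Hypothetical thresholds; the abstract does not state the ones actually used.
example_thresholds = [0.10, 0.20, 0.30]

subsets = nested_subsets(example_src, example_thresholds)
for t, features in subsets.items():
    print(f"|SRC| >= {t:.2f}: {len(features)} features -> {features}")
```

Because every threshold is applied to the same set of coefficients, each stricter subset is contained in the looser ones. Whether the paper re-estimated the model after each removal is not stated in the abstract; this sketch does not.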

Keywords

Artificial Neural Networks, Feature Importance, Feature Selection, Structural Equation Modeling

WoS Q

Q3

Scopus Q

Q3

Source

Gazi University Journal of Science

Volume

38

Issue

3

Start Page

1247

End Page

1260