Leveraging Ensemble Model and Optimized Feature Selection to Boost Prediction Accuracy in Educational Data Mining


Research field: Databases



Venue: SN Computer Science


Abstract

Educational data mining (EDM) has become an active field for data mining and machine learning researchers. It focuses on identifying the factors that influence students' academic performance and on predicting the likelihood that students will drop out. Feature selection methods are used to uncover these influential factors, while various machine learning models are used to predict which students are at risk of underperforming. Filter-based feature selection methods are common in EDM because they are efficient and can rank the features that affect academic success. However, because they are independent of the classifier and rely on a fixed threshold or a predefined feature count, filter-based methods can sometimes degrade model performance. To address this, the present study introduces an optimized chi-square-based feature selection technique that dynamically selects the optimal features for each learning algorithm, so that model performance is not compromised. Five classifiers, k-Nearest Neighbour (k-NN), Decision Tree (DT), Naïve Bayes (NB), Support Vector Machine (SVM), and Logistic Regression (LR), were evaluated under three configurations: no feature selection, traditional chi-square feature selection, and the proposed optimized chi-square-based feature selection. The evaluations were conducted on two distinct student datasets, one from secondary schools (DS1) and one from engineering institutions (DS2). The results showed that the optimized chi-square method consistently improved prediction accuracy across all classifiers. In addition, a bagging-based ensemble classifier, constructed from the best-performing individual classifier, further enhanced predictive performance. The highest accuracies achieved were 94.62% for DS1 and 96.36% for DS2, outperforming traditional feature selection and ensemble methods. This study presents a scalable, reliable, and stable approach to student performance prediction that integrates optimized feature selection with ensemble learning.
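The abstract does not spell out how the optimized selection or the ensemble is implemented, so the following is only a minimal sketch of one plausible reading: rank categorical features by their chi-square score, then, for a given classifier, keep the number of top-ranked features that maximizes accuracy on a held-out validation split, and finally wrap that classifier in a bagging vote. Everything here is an assumption made for illustration: the function names, the toy 1-nearest-neighbour classifier standing in for the paper's five classifiers, the single validation split, and the synthetic data (which is not DS1 or DS2).

```python
from collections import Counter
import random


def chi_square_score(column, labels):
    """Pearson chi-square statistic between one categorical feature and the class."""
    n = len(labels)
    observed = Counter(zip(column, labels))
    value_totals, class_totals = Counter(column), Counter(labels)
    score = 0.0
    for value, value_count in value_totals.items():
        for cls, class_count in class_totals.items():
            expected = value_count * class_count / n
            score += (observed[(value, cls)] - expected) ** 2 / expected
    return score


def rank_features(X, y):
    """Feature indices ordered from highest to lowest chi-square score."""
    scores = [chi_square_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def one_nn_predict(train_X, train_y, row):
    """Toy stand-in classifier: 1-nearest neighbour under Hamming distance."""
    nearest = min(range(len(train_X)),
                  key=lambda i: sum(a != b for a, b in zip(train_X[i], row)))
    return train_y[nearest]


def project(rows, features):
    """Keep only the listed feature columns, in the given order."""
    return [[row[j] for j in features] for row in rows]


def validation_accuracy(X, y, features, predict, split):
    """Train on rows [:split], score on rows [split:], using only `features`."""
    train_X, val_X = project(X[:split], features), project(X[split:], features)
    hits = sum(predict(train_X, y[:split], row) == label
               for row, label in zip(val_X, y[split:]))
    return hits / len(val_X)


def select_features(X, y, predict, split):
    """Classifier-specific search: smallest top-k prefix with the best validation accuracy."""
    ranked = rank_features(X[:split], y[:split])  # rank on the training part only
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked) + 1):
        acc = validation_accuracy(X, y, ranked[:k], predict, split)
        if acc > best_acc:  # strict comparison keeps the smaller k on ties
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc


def bagging_predict(train_X, train_y, row, predict, n_estimators=11, seed=0):
    """Majority vote of the same base classifier over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(train_X)
    votes = Counter()
    for _ in range(n_estimators):
        sample = [rng.randrange(n) for _ in range(n)]
        votes[predict([train_X[i] for i in sample],
                      [train_y[i] for i in sample], row)] += 1
    return votes.most_common(1)[0][0]


# Synthetic data: feature 0 copies the label, the other three are unrelated to it.
y = [i % 2 for i in range(80)]
X = [[y[i], (i // 2) % 2, (i // 4) % 2, (i // 8) % 3] for i in range(80)]
selected, accuracy = select_features(X, y, one_nn_predict, split=60)
```

In this sketch the feature count is chosen per classifier rather than fixed in advance, which is the contrast the abstract draws with traditional filter-based selection. A more careful version would use cross-validation instead of a single split and would keep a separate test set for the final accuracy estimate.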


Authors

Swati Verma
Department of Computer Science & Engineering, Bipin Tripathi Kumaon Institute of Technology, Dwarahat, Uttarakhand, India

Kuldeep Kholiya
Department of Applied Science, Bipin Tripathi Kumaon Institute of Technology, Dwarahat, Uttarakhand, India

Kanchan Bala
Department of Computer Science & Engineering, Gaya College of Engineering, DSTTE, Gaya, Bihar, India

Paper information

Publication year: 2025
Citations: 0
Country of publication: India
Site: Springer