Comparative Study of Filter, Wrapper, and Hybrid Feature Selection Using Tree-Based Classifiers for Software Defect Prediction
Software defect prediction (SDP) is essential for improving software reliability, as it enables defect-prone modules to be identified before release. SDP datasets commonly contain redundant or non-contributory metrics, underscoring the need for feature selection to derive a more informative subset. To address this problem, the present study investigates and compares the effectiveness of three feature-selection strategies, the filter method SelectKBest (SKB), the wrapper method Recursive Feature Elimination (RFE), and the hybrid SKB+RFE, in enhancing the performance of tree-based classifiers on the NASA Metrics Data Program (MDP) datasets. Three classification algorithms are employed, namely Random Forest (RF), Extra Trees (ET), and Bagging with a Decision Tree base learner, with the Area Under the Curve (AUC) serving as the primary measure of model performance. Experimental results show that the RFE and Extra Trees combination yields the best performance, with an average AUC of 0.7855, followed by the SKB+RFE+ET configuration at 0.7809 and SKB+ET at 0.7776. These findings indicate that an iterative wrapper-based approach such as RFE identifies more relevant and effective feature subsets than the filter or hybrid strategies, with RFE+Extra Trees delivering the strongest overall predictive performance and wrapper-based methods exhibiting greater stability across heterogeneous datasets. Because the experiments apply no hyperparameter tuning and handle class imbalance solely through class weighting rather than explicit resampling, the results isolate the influence of feature selection on predictive performance. Overall, the study confirms that RFE combined with Extra Trees offers the strongest predictive performance on the NASA MDP datasets and provides a foundation for developing more adaptive and robust models.
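As a concrete illustration of the compared configurations, the sketch below assembles the filter (SKB), wrapper (RFE), and hybrid (SKB+RFE) pipelines around an Extra Trees classifier with class weighting and scores them by cross-validated AUC. This is a minimal sketch using scikit-learn and synthetic placeholder data; the mutual-information score function, the number of selected features, and the 5-fold stratified cross-validation are illustrative assumptions rather than the study's reported settings.

```python
# Minimal sketch of the three compared feature-selection pipelines.
# Assumptions (not taken from the paper): synthetic data, mutual-information
# scoring for SelectKBest, k values, and 5-fold stratified cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, RFE, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Placeholder data standing in for a NASA MDP module-metrics table.
X, y = make_classification(n_samples=500, n_features=40, n_informative=10,
                           weights=[0.85, 0.15], random_state=42)

# Class weighting (no resampling), as described in the abstract.
clf = ExtraTreesClassifier(n_estimators=100, class_weight="balanced",
                           random_state=42)

pipelines = {
    # Filter: score features independently, keep the top k.
    "SKB+ET": Pipeline([
        ("skb", SelectKBest(score_func=mutual_info_classif, k=10)),
        ("et", clf),
    ]),
    # Wrapper: recursively eliminate features using the estimator itself.
    "RFE+ET": Pipeline([
        ("rfe", RFE(estimator=clf, n_features_to_select=10)),
        ("et", clf),
    ]),
    # Hybrid: filter down to a coarse subset, then refine with the wrapper.
    "SKB+RFE+ET": Pipeline([
        ("skb", SelectKBest(score_func=mutual_info_classif, k=20)),
        ("rfe", RFE(estimator=clf, n_features_to_select=10)),
        ("et", clf),
    ]),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, pipe in pipelines.items():
    auc = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.4f}")
```

The same comparison extends to Random Forest and Bagging by swapping the final estimator; only the feature-selection stage differs between the filter, wrapper, and hybrid configurations.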
Copyright (c) 2025 Rahmayanti Rahmayanti, Rudy Herteno, Setyo Wahyu Saputro, Triando Hamonangan Saragih, Friska Abadi (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.