Enhancing Software Defect Prediction: HHO-Based Wrapper Feature Selection with Ensemble Methods
The growing complexity of data across domains highlights the need for classification models that can cope with class imbalance and feature redundancy. The NASA MDP datasets pose both challenges: their characteristics vary widely and their classes are highly imbalanced, which can significantly affect model accuracy. This study proposes a classification framework that integrates preprocessing, optimization-based feature selection, and ensemble learning to improve predictive performance. Preprocessing used z-score standardization and robust scaling to normalize the data while reducing the impact of outliers. Class imbalance was addressed with the ADASYN technique. Feature selection was performed using Binary Harris Hawk Optimization (BHHO), with K-Nearest Neighbor (KNN) as the evaluator that scores candidate feature subsets. Random Forest (RF), Support Vector Machine (SVM), and Stacking classifiers were evaluated using accuracy, AUC, precision, recall, and F1-measure. Experimental results indicated that the Stacking model achieved superior performance on several datasets, with the MC1 dataset yielding an accuracy of 0.998 and an AUC of 1.000. However, statistical significance testing revealed that not all observed improvements were statistically significant; for example, Stacking significantly outperformed SVM but did not differ significantly from RF in terms of AUC. This underlines the importance of aligning model choice with dataset characteristics. In conclusion, combining these preprocessing steps with metaheuristic feature selection contributes positively to software defect prediction. Future research should consider more diverse datasets, alternative optimization techniques, and explainable AI to further improve model reliability and interpretability.
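To make the wrapper idea in the abstract concrete, the following is a minimal sketch of how a KNN evaluator can score binary feature masks inside a population-based search. It is not the authors' implementation and does not reproduce the Harris Hawk update rules: the search loop is a deliberately simplified stand-in, and the weighted objective (`alpha`), the sigmoid transfer function, the toy data, and all function names are assumptions introduced for illustration. Scaling, ADASYN, and the ensemble classifiers are left out.

```python
import math
import random


def knn_accuracy(X_train, y_train, X_test, y_test, mask, k=5):
    """Accuracy of a k-NN classifier restricted to features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for x, label in zip(X_test, y_test):
        # Squared Euclidean distance over the selected columns only.
        dists = sorted(
            (sum((x[j] - t[j]) ** 2 for j in cols), lbl)
            for t, lbl in zip(X_train, y_train)
        )
        votes = [lbl for _, lbl in dists[:k]]
        pred = max(set(votes), key=votes.count)
        correct += pred == label
    return correct / len(y_test)


def fitness(mask, data, alpha=0.99):
    """Assumed wrapper objective (lower is better): error plus subset size."""
    X_tr, y_tr, X_te, y_te = data
    error = 1.0 - knn_accuracy(X_tr, y_tr, X_te, y_te, mask)
    return alpha * error + (1 - alpha) * sum(mask) / len(mask)


def binarize(position, rng):
    """S-shaped transfer: map each continuous coordinate to a 0/1 bit."""
    return [1 if rng.random() < 1 / (1 + math.exp(-p)) else 0 for p in position]


def simple_binary_search(data, n_features, n_agents=8, n_iter=20, seed=0):
    """Simplified population search; NOT the full HHO exploration/besiege rules."""
    rng = random.Random(seed)
    positions = [
        [rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_agents)
    ]
    best_mask, best_fit, best_pos = None, float("inf"), positions[0]
    for _ in range(n_iter):
        for pos in positions:
            mask = binarize(pos, rng)
            f = fitness(mask, data)
            if f < best_fit:
                best_mask, best_fit, best_pos = mask, f, list(pos)
        # Pull every agent toward the best position, with random perturbation.
        positions = [
            [p + rng.random() * (b - p) + rng.gauss(0, 0.5)
             for p, b in zip(pos, best_pos)]
            for pos in positions
        ]
    return best_mask, best_fit


def make_toy_data(n=80, seed=1):
    """Hypothetical data: 2 informative features followed by 4 noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [3 * label + rng.gauss(0, 0.5), -3 * label + rng.gauss(0, 0.5)]
        X.append(informative + [rng.gauss(0, 3) for _ in range(4)])
        y.append(label)
    half = n // 2
    return X[:half], y[:half], X[half:], y[half:]


data = make_toy_data()
best_mask, best_fit = simple_binary_search(data, n_features=6)
print("selected mask:", best_mask, "fitness:", round(best_fit, 4))
```

The objective rewards low KNN error first and a small feature subset second, so two masks with equal error are ranked by how many features they keep. In a fuller pipeline the evaluator would typically be cross-validated on the training split only, so that the selected features are not tuned to the test data.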
Copyright (c) 2025 Achmad Fauzan Luthfi, Rudy Herteno, Friska Abadi, Radityo Adi Nugroho, Muhammad Itqan Mazdadi, Vijay Anant Athavale (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0), which allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges as well as earlier and greater citation of published work (see The Effect of Open Access).





