Comparison Between K-Fold Cross Validation And Percentage Split In Decision Tree Algorithms For Anemia Classification

Keywords: Anemia, decision tree, K-Fold Cross Validation, Percentage Split, C4.5 Algorithm

December 26, 2025
February 14, 2026
February 27, 2026

Anemia is a significant global health challenge characterized by a pathological deficit in hemoglobin concentration, often leading to physiological instability. Accurate clinical diagnosis typically relies on complete blood count (CBC) tests, which provide the hematological parameters needed for classification. Although machine learning models have proven highly effective at diagnosing anemia, existing research often relies on a single static data partition, which can overlook evaluation reliability and performance stability. This study addresses that gap by shifting the focus from architectural benchmarking to validation robustness, evaluating the performance of the C4.5 algorithm under different data-splitting techniques. The dataset comprises 1,281 clinical records with 14 numerical features and 9 anemia-type labels. To assess stability, two partitioning strategies were implemented: a static Percentage Split (with train:test ratios ranging from 60:40 to 90:10) and iterative K-Fold Cross Validation (with K values of 3, 5, 7, 10, and 15). Experimental results show that the C4.5 algorithm reached its peak performance with the 90:10 Percentage Split, with an average accuracy of 99.46%, precision of 98.32%, and recall of 99.28%. By comparison, K-Fold Cross Validation with K=10 yielded a slightly lower but more stable accuracy of 99.19%, with a significantly reduced standard deviation (±0.09), highlighting its reliability for clinical applications. While the high-ratio percentage split maximizes training exposure and predictive potential, the K-Fold method provides a more objective, generalizable benchmark because every record in the dataset is used for testing exactly once. The study further identifies challenges in classifying minority classes, such as Leukemia with thrombocytopenia, due to inherent data scarcity. Ultimately, this research confirms that the C4.5 algorithm, when paired with an optimal partitioning protocol, remains a robust and highly interpretable solution for clinical anemia screening, outperforming several complex modern architectures.
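The two partitioning strategies compared in the abstract can be illustrated with a minimal sketch using only the Python standard library. It shows how the record indices are divided, not how the classifier is trained: the function names (`percentage_split`, `k_fold_splits`, `summarize`), the fixed seed, and the plain unstratified shuffling are assumptions made for illustration and are not taken from the study, which may have used a different tool or a stratified procedure.

```python
import random
import statistics


def percentage_split(n_samples, train_ratio, seed=0):
    """Shuffle the indices once and cut them into one train/test pair."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_ratio)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_samples, k, seed=0):
    """Yield k (train, test) pairs; each index is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def summarize(scores):
    """Mean and sample standard deviation of per-run scores."""
    return statistics.mean(scores), statistics.stdev(scores)


N_RECORDS = 1281  # dataset size stated in the abstract

# Static 90:10 split: a single held-out test set.
train_idx, test_idx = percentage_split(N_RECORDS, 0.90)

# 10-fold cross validation: ten held-out sets that together cover all records.
folds = list(k_fold_splits(N_RECORDS, 10))
```

In this sketch a classifier would be trained and scored once per `(train, test)` pair, and `summarize` would then be applied to the resulting list of accuracies to obtain a mean and a spread comparable in form to the figures reported above. The percentage split evaluates on only about a tenth of the records, whereas the K-Fold loop evaluates on all of them, which is the basis for the stability argument in the abstract.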

How to Cite

Rahmawati, N. P., Irwan Budiman, Muhammad Itqan Mazdadi, Andi Farmadi, & Friska Abadi. (2026). Comparison Between K-Fold Cross Validation And Percentage Split In Decision Tree Algorithms For Anemia Classification. Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics, 8(1), 105-114. https://doi.org/10.35882/ijeeemi.v8i1.315
