Comparison Between K-Fold Cross Validation And Percentage Split In Decision Tree Algorithms For Anemia Classification

Nanda Putri Rahmawati; Irwan Budiman; Muhammad Itqan Mazdadi; Andi Farmadi; Friska Abadi

doi:10.35882/ijeeemi.v8i1.315

Authors

Nanda Putri Rahmawati (nandaputri15406@gmail.com), Department of Computer Science, Faculty of Mathematics and Natural Science, Lambung Mangkurat University, Banjarbaru, Indonesia. https://orcid.org/0009-0005-6855-0900
Irwan Budiman, Department of Computer Science, Faculty of Mathematics and Natural Science, Lambung Mangkurat University, Banjarbaru, Indonesia. https://orcid.org/0000-0002-0514-7429
Muhammad Itqan Mazdadi, Department of Computer Science, Faculty of Mathematics and Natural Science, Lambung Mangkurat University, Banjarbaru, Indonesia. https://orcid.org/0000-0002-8710-4616
Andi Farmadi, Department of Computer Science, Faculty of Mathematics and Natural Science, Lambung Mangkurat University, Banjarbaru, Indonesia. https://orcid.org/0009-0009-0926-8082
Friska Abadi, Department of Computer Science, Faculty of Mathematics and Natural Science, Lambung Mangkurat University, Banjarbaru, Indonesia. https://orcid.org/0000-0002-9449-8000

Vol. 8 No. 1 (2026): February

Medical Engineering

Submitted December 26, 2025

Accepted February 14, 2026

Published February 27, 2026


Abstract

Anemia is a significant global health challenge characterized by a pathological deficit in hemoglobin concentration, often leading to physiological instability. Accurate clinical diagnosis typically relies on complete blood count (CBC) tests, which provide critical hematological parameters for classification. While machine learning models have demonstrated high efficacy in diagnosing anemia, existing research often relies on static data partitioning strategies that may overlook evaluation reliability and performance stability. This study addresses this gap by shifting the focus from architectural benchmarking to validation robustness, specifically evaluating the performance of the C4.5 algorithm under different data-splitting techniques. The research uses a dataset of 1,281 clinical records with 14 numerical features and 9 anemia-type labels. To assess stability, two partitioning strategies were implemented: a static Percentage Split (with train:test ratios ranging from 60:40 to 90:10) and iterative K-Fold Cross Validation (with K values of 3, 5, 7, 10, and 15). Experimental results show that the C4.5 algorithm reached its peak performance with the 90:10 Percentage Split, with an average accuracy of 99.46%, precision of 98.32%, and recall of 99.28%. In comparison, the K-Fold (K=10) approach yielded a slightly lower but more stable accuracy of 99.19% with a significantly reduced standard deviation (±0.09), highlighting its reliability for clinical applications. While the high-ratio percentage split maximizes training exposure and predictive potential, the K-Fold method provides a more objective, generalizable benchmark because every record is used for testing exactly once, so the estimate accounts for the entire data distribution. The study further identifies challenges in classifying minority classes, such as Leukemia with thrombocytopenia, due to inherent data scarcity. Ultimately, this research confirms that the C4.5 algorithm, when paired with an optimal partitioning protocol, remains a robust and highly interpretable solution for clinical anemia screening, outperforming several complex modern architectures.
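The two partitioning strategies compared in the abstract can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' pipeline: the function names (`percentage_split`, `k_fold_splits`, `majority_baseline_accuracy`) are hypothetical, the labels are synthetic, and a majority-class baseline stands in for the C4.5 classifier, so the printed accuracies bear no relation to the reported results.

```python
import random
import statistics
from collections import Counter


def percentage_split(n_samples, train_ratio, seed=0):
    """Shuffle the indices once and cut them into a single train/test pair."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * train_ratio))
    return indices[:cut], indices[cut:]


def k_fold_splits(n_samples, k, seed=0):
    """Yield k (train, test) pairs; each sample lands in a test fold exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train_fold = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_fold, test_fold


def majority_baseline_accuracy(labels, train_idx, test_idx):
    """Placeholder classifier (assumption): always predict the most common training label."""
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


# Synthetic labels sized like the study's dataset (1,281 records, 9 classes).
rng = random.Random(42)
labels = [rng.randrange(9) for _ in range(1281)]

# Percentage Split (90:10): one train/test pair, hence one accuracy value.
train, test = percentage_split(len(labels), 0.90)
split_accuracy = majority_baseline_accuracy(labels, train, test)

# K-Fold (K=10): ten accuracy values, summarized as mean and standard deviation.
fold_accuracies = [
    majority_baseline_accuracy(labels, tr, te)
    for tr, te in k_fold_splits(len(labels), 10)
]
mean_accuracy = statistics.mean(fold_accuracies)
std_accuracy = statistics.stdev(fold_accuracies)

print(f"90:10 split accuracy: {split_accuracy:.4f}")
print(f"10-fold accuracy: {mean_accuracy:.4f} +/- {std_accuracy:.4f}")
```

A single split yields one accuracy figure that depends on which records happen to fall in the test set, while K-Fold yields K figures whose mean and standard deviation can be reported together. That spread is what makes a stability comparison like the one described above possible.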

Irwan Budiman, Department of Computer Science, Faculty of Mathematics and Natural Science, Lambung Mangkurat University, Banjarbaru, Indonesia

He is a lecturer at Lambung Mangkurat University. He earned his bachelor's degree in Informatics Engineering from Islam Indonesia University, Yogyakarta. Subsequently, he completed his master's studies in Information Systems at Diponegoro University, Semarang.

Muhammad Itqan Mazdadi, Department of Computer Science, Faculty of Mathematics and Natural Science, Lambung Mangkurat University, Banjarbaru, Indonesia

Muhammad Itqan Mazdadi is a lecturer in the Department of Computer Science, Lambung Mangkurat University. His research interests center on data science and computer networking. Before becoming a lecturer, he completed his undergraduate program in the Computer Science Department at Lambung Mangkurat University in 2013. He then completed his master's degree in the Department of Informatics at Islamic Indonesia University, Yogyakarta.

Andi Farmadi, Department of Computer Science, Faculty of Mathematics and Natural Science, Lambung Mangkurat University, Banjarbaru, Indonesia

Andi Farmadi is a senior lecturer in the Computer Science program at Lambung Mangkurat University. He has been teaching since 2008 and has been the Head of the Data Science Lab since 2018. He completed his undergraduate studies at Hasanuddin University and his graduate studies at Bandung Institute of Technology.

Friska Abadi, Department of Computer Science, Faculty of Mathematics and Natural Science, Lambung Mangkurat University, Banjarbaru, Indonesia

Friska Abadi finished his bachelor's degree in Computer Science at Lambung Mangkurat University in 2011. In 2016, he obtained his master's degree from the Department of Informatics at STMIK Amikom, Yogyakarta. He then joined Lambung Mangkurat University as a lecturer in Computer Science, where he teaches programming.


How to Cite

Rahmawati, N. P., Budiman, I., Mazdadi, M. I., Farmadi, A., & Abadi, F. (2026). Comparison Between K-Fold Cross Validation And Percentage Split In Decision Tree Algorithms For Anemia Classification. Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics, 8(1), 105-114. https://doi.org/10.35882/ijeeemi.v8i1.315
