Effectiveness of SMOTE in Enhancing Adult Autism Spectrum Disorder Diagnosis Predictive Performance With Missforest Imputation And Random Forest
Autism Spectrum Disorder (ASD), first described by Leo Kanner in 1943, is a complex developmental condition marked by social, emotional, and behavioral challenges, often including speech delays and difficulties in interpersonal interaction. Despite advances in diagnostic criteria, accurately diagnosing ASD in adults remains difficult because comprehensive datasets are scarce and existing methods have inherent constraints. The Autism Screening Adult dataset used in this study exemplifies these issues: it contains missing values and a marked class imbalance, both of which can degrade model performance. To address these challenges, we propose a framework that combines Random Forest classification with MissForest imputation and the Synthetic Minority Over-sampling Technique (SMOTE). MissForest imputes missing data with an iterative random forest procedure that preserves the underlying structure of the data without relying on strict parametric assumptions. SMOTE generates synthetic samples for the minority class, balancing the dataset and reducing prediction bias. Under 10-fold cross-validation, applying SMOTE improved overall accuracy from 70.17% to 79.32% and AUC-ROC from 47.13% to 85.84%, indicating a substantially better ability to distinguish positive from negative cases. These results underscore the importance of addressing class imbalance and missing values in predictive modeling for ASD. The study provides a foundation for developing more reliable diagnostic tools for adult ASD; future research may refine feature selection and incorporate additional data sources to improve performance further.
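The SMOTE step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a pure-Python illustration of the interpolation idea only, and the function name `smote_oversample`, the parameters `k` and `seed`, and the toy data are all assumptions made for the example. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance).

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 3 minority samples against 8 majority samples.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
majority_count = 8
new_points = smote_oversample(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority))
```

In a cross-validated setup, oversampling of this kind is usually applied to the training folds only, so that synthetic points do not leak into the data used for evaluation; whether the study did so is not stated in the abstract.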
Copyright (c) 2025 Muhammad Hafizh Musyaffa, Triando Hamonangan Saragih, Dodon Turianto Nugrahadi, Dwi Kartini, Andi Farmadi (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).





