Comparative Analysis of Logistic Regression, SVM, Xgboost, and Random Forest Algorithms for Diabetes Classification
DOI: https://doi.org/10.32493/jtsi.v7i1.38258
Abstract
Diabetes is a disease that can affect anyone and is caused by excess sugar in the body. Early identification is therefore needed so that preventive measures can be taken as soon as possible. This research classifies diabetes using four algorithms: Logistic Regression, Support Vector Classification, XGBoost, and Random Forest. The dataset consists of 768 records, of which 500 are non-diabetic and 268 are diabetic. In testing, Random Forest obtained an accuracy of 79.22%, Support Vector Classification 76.62%, XGBoost 79.22%, and Logistic Regression 80.52%. Logistic Regression gave the best result, with a precision of 79.00%, recall of 77.00%, and F1-Score of 78.00%.
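The metrics reported in the abstract (accuracy, precision, recall, F1-Score) can all be derived from a confusion matrix. The following is a minimal sketch in Python, not the authors' code. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration. The test-set size of 154 is an assumption: it is not stated in the abstract, but the three reported accuracies are consistent with 122, 118, and 124 correct predictions out of 154.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100 * correct / total, 2)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive class.

    tp/fp/fn/tn are the confusion-matrix counts; a zero denominator
    yields 0.0 instead of raising an error.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# ASSUMPTION: a 154-record test set (about 20% of 768), inferred from the
# reported percentages rather than stated in the source.
TEST_SIZE = 154
reported = {
    "Random Forest": accuracy_percent(122, TEST_SIZE),
    "Support Vector Classification": accuracy_percent(118, TEST_SIZE),
    "XGBoost": accuracy_percent(122, TEST_SIZE),
    "Logistic Regression": accuracy_percent(124, TEST_SIZE),
}

# HYPOTHETICAL confusion matrix (not taken from the paper), with the
# diabetic class treated as positive.
metrics = classification_metrics(tp=36, fp=12, fn=18, tn=88)

if __name__ == "__main__":
    for name, acc in reported.items():
        print(f"{name}: {acc}%")
    print(metrics)
```

Because the classes are imbalanced (500 versus 268), accuracy alone can hide weak performance on the diabetic class, which is one reason to report precision, recall, and F1-Score alongside it.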
Copyright (c) 2024 Rahmat Hidayat, Deni Mahdiana, Anggun Fergina
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.