Comparative Analysis of Logistic Regression, SVM, Xgboost, and Random Forest Algorithms for Diabetes Classification

Authors

  • Rahmat Hidayat, Budi Luhur University
  • Deni Mahdiana, Budi Luhur University
  • Anggun Fergina, Nusa Putra University

DOI:

https://doi.org/10.32493/jtsi.v7i1.38258

Abstract

Diabetes is a disease that can affect anyone and occurs when there is excess sugar in the body. Early detection is therefore necessary so that preventive measures can be taken as soon as possible. This study classifies diabetes cases using four algorithms: Random Forest, Support Vector Classification, XGBoost, and Logistic Regression. The dataset consists of 768 records, of which 500 are non-diabetic and 268 are diabetic. On the test data, Random Forest achieved an accuracy of 79.22%, Support Vector Classification 76.62%, XGBoost 79.22%, and Logistic Regression 80.52%. Logistic Regression gave the best result, with a precision of 79.00%, a recall of 77.00%, and an F1-score of 78.00%.
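
The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions. The data are synthetic, with only the 500/268 class split taken from the abstract, and the eight features are hypothetical. The 20% hold-out is inferred from the reported accuracies, which are consistent with a 154-sample test set (for example 124/154 ≈ 80.52%). scikit-learn's GradientBoostingClassifier stands in for XGBoost so that the sketch needs only one library. Macro averaging for precision, recall, and F1 is also an assumption. The scores it prints will not match the paper's.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 500 negative and 268 positive records, as in the
# abstract. The 8 features and their distributions are hypothetical.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(500, 8)),  # non-diabetic class
    rng.normal(0.7, 1.0, size=(268, 8)),  # diabetic class
])
y = np.array([0] * 500 + [1] * 268)

# Assumed 80/20 stratified split; 20% of 768 rows gives 154 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Default-ish hyperparameters; the abstract does not state the ones used.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Support Vector Classification": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    # Stand-in for XGBoost (assumption), to avoid an extra dependency.
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, average="macro"),
        "recall": recall_score(y_test, pred, average="macro"),
        "f1": f1_score(y_test, pred, average="macro"),
    }
    print(name, {k: round(v, 4) for k, v in results[name].items()})
```

To run the same loop on real data, replace X and y with the feature matrix and labels of the actual dataset. If the xgboost package is installed, its XGBClassifier could be swapped in for the stand-in model.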

Published

2024-01-30

How to Cite

Hidayat, R., Mahdiana, D., & Fergina, A. (2024). Comparative Analysis of Logistic Regression, SVM, Xgboost, and Random Forest Algorithms for Diabetes Classification. Jurnal Teknologi Sistem Informasi Dan Aplikasi, 7(1), 281–291. https://doi.org/10.32493/jtsi.v7i1.38258