Comparative Analysis of Logistic Regression, SVM, XGBoost, and Random Forest Algorithms for Diabetes Classification


  • Rahmat Hidayat, Budi Luhur University
  • Deni Mahdiana, Budi Luhur University
  • Anggun Fergina, Nusa Putra University



Diabetes is a disease that can affect anyone and occurs when there is excessive sugar in the human body. Early detection is therefore necessary so that preventive measures can be taken as early as possible. This research classifies diabetes using four algorithms: Logistic Regression, Support Vector Classification, XGBoost, and Random Forest. The dataset consists of 768 records, of which 500 are non-diabetic and 268 are diabetic. In testing, Random Forest obtained an accuracy of 79.22%, Support Vector Classification 76.62%, XGBoost 79.22%, and Logistic Regression 80.52%. The best classification result was obtained with the Logistic Regression algorithm, with a precision of 79.00%, a recall of 77.00%, and an F1-score of 78.00%.
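For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1-score are computed from a binary confusion matrix. It is not the authors' code. The function name `classification_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration; only the class distribution (500 non-diabetic, 268 diabetic) comes from the abstract.

```python
# Illustrative sketch only: the confusion-matrix counts used below are
# hypothetical and are NOT taken from the paper.

def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for a binary classifier.

    tp/fp/fn/tn are the counts of true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Class distribution reported in the abstract.
n_non_diabetic, n_diabetic = 500, 268
# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = n_non_diabetic / (n_non_diabetic + n_diabetic)

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=40)

print(f"majority-class baseline accuracy: {majority_baseline:.2%}")
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

The majority-class baseline is a useful reference point when reading the accuracies above: on the full dataset, a model that always predicts "non-diabetic" would already be right for 500 of 768 records, so accuracy alone can look better than it is on imbalanced data. This is one reason to report precision, recall, and F1-score as well. The class ratio in the held-out test split may differ from that of the full dataset.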


