Majors Determination for High School Students Using the Naïve Bayes Algorithm, C4.5 and the K-Nearest Neighbor Algorithm (Case Study: SMA 1 Barunawati Jakarta)


  • Yudisti Prayigo Permana
  • Taswanda Taryo
  • Makhsun Makhsun


Education plays a central role in forming a good mindset and in helping students develop their potential, so that they become better individuals and the knowledge they gain is useful to many people. Assigning students to majors is a key step in identifying their interests and talents and in making learning easier for them. Majors must be assigned carefully and from several perspectives, because a wrong assignment affects students' academic scores. The majoring process weighs several criteria: students' academic scores from academic tests are compared with the results of psychological tests and of questionnaires about the majors they are interested in, so it takes a long time to obtain a result. Calculating each criterion is an obstacle for the school, because no existing system classifies majors accurately enough for the results to match students' abilities and interests. This study aims to identify which of three algorithms, namely Naïve Bayes, C4.5 and K-Nearest Neighbor, gives the best classification of majors, in order to make learning more interesting and active because what students learn is in accordance with their interests and talents. Naïve Bayes is a probability-based classification method that makes predictions under the assumption that the attributes are independent of one another given the class. The C4.5 algorithm classifies data that has both numeric and categorical attributes, and the K-Nearest Neighbor algorithm works on the assumption that a data point has the same class or category as the data points surrounding it.
In tests carried out on 214 data records, the Naïve Bayes algorithm achieved a higher accuracy than the C4.5 and K-Nearest Neighbor algorithms, reaching 98.13%. Comparisons were made using random data alongside real data in sets of 50, 100, 214, 300, 400 and 428 records, and it can be concluded that the Naïve Bayes algorithm is suitable for this case because it has the highest accuracy and remains stable regardless of the amount of data tested.
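As a rough illustration of how two of the compared classifiers make a prediction, the following is a minimal sketch and not the study's implementation. It assumes three numeric score attributes and two hypothetical majors ("Science" and "Social"); the records in `TRAIN` are invented for the example, the Naïve Bayes variant shown is the Gaussian one, and C4.5 is omitted because a decision-tree learner does not fit in a short sketch.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy records: (math, science, social studies scores) -> major.
# These values are invented for illustration and are not the study's data.
TRAIN = [
    ((85, 88, 60), "Science"),
    ((90, 82, 65), "Science"),
    ((78, 91, 58), "Science"),
    ((60, 62, 88), "Social"),
    ((65, 58, 92), "Social"),
    ((58, 66, 85), "Social"),
]


def fit_gaussian_nb(rows):
    """Estimate a prior and per-attribute (mean, variance) for each class."""
    by_class = defaultdict(list)
    for features, label in rows:
        by_class[label].append(features)
    model = {}
    for label, items in by_class.items():
        stats = []
        for column in zip(*items):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(items) / len(rows), stats)
    return model


def predict_nb(model, features):
    """Pick the class with the highest log posterior, treating the
    attributes as independent given the class."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(features, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def predict_knn(rows, features, k=3):
    """Majority vote among the k training records closest in Euclidean distance."""
    nearest = sorted(rows, key=lambda row: math.dist(row[0], features))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(predict, rows):
    """Fraction of records whose predicted label matches the true label."""
    correct = sum(1 for features, label in rows if predict(features) == label)
    return correct / len(rows)
```

Calling `accuracy(lambda f: predict_nb(fit_gaussian_nb(TRAIN), f), held_out_rows)` on a held-out set (here a hypothetical `held_out_rows`) gives the kind of accuracy figure the abstract reports; repeating this for sets of different sizes mirrors the comparison across data amounts.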

Keywords: Data Mining, Classification, Major, Naïve Bayes, C4.5, K-Nearest Neighbor


