Klasifikasi Berita Bahasa Indonesia Dengan Menggunakan Metode K-Nearest Neighbor Dan Naive Bayes (Indonesian News Classification Using the K-Nearest Neighbor and Naive Bayes Methods)

Authors

  • Komariah Kukum Manieh Nuryasin, Master's Program in Informatics Engineering, Universitas Pamulang
  • Taswanda Taryo, Master's Program in Informatics Engineering, Universitas Pamulang
  • Sudarno, Master's Program in Informatics Engineering, Universitas Pamulang

Keywords:

News Classification, K-Nearest Neighbor, Naïve Bayes, Text Mining, TF-IDF

Abstract

As information technology develops rapidly, a news classification system is needed to manage the growing volume of information. This study develops a classification system for Indonesian-language news with five main categories: Politics, Economy, Health, Security, and Poverty. Two algorithms are used: K-Nearest Neighbor (KNN) and Naïve Bayes. The dataset consists of 2,000 news items obtained from Kaggle, and preprocessing includes cleaning, tokenizing, normalization, and TF-IDF weighting. Evaluation was carried out under three train-test split scenarios: 70%-30%, 80%-20%, and 90%-10%. KNN achieved its highest accuracy, 89%, in the 80%-20% and 90%-10% scenarios, while Naïve Bayes achieved its best accuracy, 78.66%, in the 70%-30% scenario. KNN proved more reliable for data with a balanced category distribution, while Naïve Bayes required further adjustment, especially for underrepresented categories. This research contributes to the development of automatic news classification systems, which can be implemented to improve the user experience of accessing information.
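
The abstract names the pipeline stages but not their implementation, so the following is only a minimal sketch of how TF-IDF weighting, KNN, and Naïve Bayes fit together. The toy English sentences, the function names, the smoothed IDF formula, cosine similarity as the KNN distance, k=3, and Laplace smoothing in Naïve Bayes are all assumptions made for illustration; none of them are taken from the study.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and keep alphabetic tokens (a stand-in for cleaning/tokenizing)."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

def fit_idf(docs):
    """Smoothed inverse document frequency over tokenized training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(doc, idf):
    """L2-normalized sparse TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def cosine(a, b):
    """Dot product of two already-normalized sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def nb_fit(docs, labels):
    """Collect class priors and per-class term counts (multinomial Naive Bayes)."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    vocab = {t for counts in term_counts.values() for t in counts}
    return class_docs, term_counts, vocab

def nb_predict(doc, model):
    """Pick the class with the highest log-posterior, using Laplace smoothing."""
    class_docs, term_counts, vocab = model
    total = sum(class_docs.values())
    best, best_score = None, -math.inf
    for label, n in class_docs.items():
        denom = sum(term_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for t in doc:
            if t in vocab:
                score += math.log((term_counts[label][t] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical toy corpus, not the Kaggle dataset used in the study.
train = [
    ("parliament passes election law", "Politics"),
    ("president meets party leaders before election", "Politics"),
    ("central bank raises interest rate", "Economy"),
    ("inflation pushes bank to cut rate", "Economy"),
    ("hospital reports new vaccine for fever", "Health"),
    ("doctors warn about dengue fever outbreak", "Health"),
]
train_docs = [tokenize(text) for text, _ in train]
train_labels = [label for _, label in train]
idf = fit_idf(train_docs)
train_vecs = [tfidf(doc, idf) for doc in train_docs]
nb_model = nb_fit(train_docs, train_labels)

def classify(text):
    """Return the (KNN, Naive Bayes) predictions for one news text."""
    doc = tokenize(text)
    return (knn_predict(tfidf(doc, idf), train_vecs, train_labels),
            nb_predict(doc, nb_model))

if __name__ == "__main__":
    for text in ["election party debate", "bank rate decision", "fever hospital"]:
        print(text, "->", classify(text))
```

On real data the same steps would be applied to each train-test split in turn, with accuracy computed on the held-out portion. A library such as scikit-learn would normally replace these hand-written helpers.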

Published

2025-07-31