Sentiment Analysis of Sundanese-Language Tweets Using a Naive Bayes Classifier with Chi Squared Statistic Feature Selection

Yono Cahyono, Saprudin Saprudin


Social media use in Indonesia is growing rapidly. Indonesia has many regional languages, one of which is Sundanese; some people, especially those living in West Java, use it to express comments, opinions, suggestions, criticism, and more on social media. This information can serve as valuable data for individuals or organizations in decision making, but the sheer volume of data makes it impossible for humans to read and analyze manually. Sentiment analysis is the process of classifying, analyzing, understanding, and evaluating opinions, emotions, and attitudes towards a particular entity, such as an individual, organization, product or service, topic, or event, in order to obtain information. The purpose of this research is to show that the Naïve Bayes Classifier (NBC) algorithm and the Chi Squared Statistic feature selection method can be used for sentiment analysis of Sundanese-language tweets on Twitter, classifying them into positive, negative, and neutral categories. The test results show that Chi Squared Statistic feature selection can reduce irrelevant features in the Naïve Bayes classification process on Sundanese-language tweets, with an accuracy of 78.48%.
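The two steps named in the abstract, chi-squared feature selection followed by Naïve Bayes classification, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`chi2_scores`, `select_top_k`, `train_nb`, `predict`), the use of a multinomial model with add-one smoothing, the choice of the maximum per-class score for each term, and the six toy documents are all assumptions made for illustration. The toy tokens are not data from the study.

```python
import math
from collections import Counter, defaultdict

def chi2_scores(docs, labels):
    """Score each term by its largest chi-squared value over all classes,
    using a 2x2 term-presence / class-membership table per (term, class)."""
    n = len(docs)
    scores = {}
    for term in {t for doc in docs for t in doc}:
        best = 0.0
        for cls in set(labels):
            pairs = list(zip(docs, labels))
            a = sum(term in doc and y == cls for doc, y in pairs)      # term, class
            b = sum(term in doc and y != cls for doc, y in pairs)      # term, other class
            c = sum(term not in doc and y == cls for doc, y in pairs)  # no term, class
            d = sum(term not in doc and y != cls for doc, y in pairs)  # no term, other class
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom > 0:
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores

def select_top_k(scores, k):
    """Keep the k highest-scoring terms (assumed selection rule)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with add-one smoothing, restricted to vocab."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        term_counts[y].update(t for t in doc if t in vocab)
    model = {}
    for cls, n_docs in class_docs.items():
        total = sum(term_counts[cls].values())
        log_prior = math.log(n_docs / len(docs))
        log_lik = {t: math.log((term_counts[cls][t] + 1) / (total + len(vocab)))
                   for t in vocab}
        model[cls] = (log_prior, log_lik)
    return model

def predict(model, doc):
    """Return the class with the highest log posterior; unknown terms are ignored."""
    def log_posterior(cls):
        log_prior, log_lik = model[cls]
        return log_prior + sum(log_lik[t] for t in doc if t in log_lik)
    return max(model, key=log_posterior)

# Hypothetical toy corpus: each document is a list of tokens.
docs = [["alus", "pisan"], ["alus", "tempat"],
        ["goreng", "pisan"], ["goreng", "tempat"],
        ["biasa", "pisan"], ["biasa", "tempat"]]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

scores = chi2_scores(docs, labels)
selected = select_top_k(scores, 3)
model = train_nb(docs, labels, selected)
print(sorted(selected))
print(predict(model, ["alus", "pisan"]))
```

In this toy corpus the tokens that occur equally often in every class receive a chi-squared score of zero and are dropped, which is the sense in which feature selection removes features that carry no class information before the classifier is trained.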


Sentiment Analysis; Sundanese; Twitter; Naïve Bayes Classifier (NBC); Chi Squared Statistic






Copyright (c) 2019 Yono Cahyono

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Jurnal Informatika Universitas Pamulang (ISSN: 2541-1004 e-ISSN: 2622-4615)
