Sentiment Analysis of Sundanese-Language Tweets Using the Naive Bayes Classifier with Chi Squared Statistic Feature Selection

Authors

  • Yono Cahyono, Universitas Pamulang
  • Saprudin Saprudin, Universitas Pamulang

DOI:

https://doi.org/10.32493/informatika.v4i3.3186

Keywords:

Sentiment Analysis, Sundanese, Twitter, Naïve Bayes Classifier (NBC), Chi Squared Statistic

Abstract

Social media use in Indonesia is growing rapidly. Indonesia has many regional languages, one of which is Sundanese, and some people, especially those living in West Java, use Sundanese to express comments, opinions, suggestions, criticisms, and other views on social media. This information can serve as valuable data for individuals or organizations in decision making, but the sheer volume of data makes it impractical for humans to read and analyze manually. Sentiment analysis is the process of analyzing, understanding, evaluating, and classifying opinions, emotions, and attitudes towards a particular entity, such as an individual, organization, product or service, topic, or event, in order to obtain information. The purpose of this research is to apply the Naïve Bayes Classifier (NBC) algorithm with Chi Squared Statistic feature selection to the sentiment analysis of Sundanese-language tweets on Twitter, classifying them into positive, negative, and neutral categories. The test results show that Chi Squared Statistic feature selection can reduce irrelevant features in the NBC classification process on Sundanese-language tweets, with an accuracy of 78.48%.
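
The pipeline described in the abstract (chi-squared feature selection followed by Naïve Bayes classification into three sentiment classes) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the function names (`chi2_score`, `select_features`, `train_nb`, `predict`), the use of a multinomial model with add-one smoothing, and the choice of keeping the top `k` terms are all assumptions made for illustration. The chi-squared score uses the standard 2x2 term-presence versus class contingency table.

```python
import math
from collections import Counter, defaultdict

# Toy, made-up corpus (NOT the paper's data): tokenized texts with labels.
DOCS = [
    (["good", "nice", "the"], "positive"),
    (["good", "great", "the"], "positive"),
    (["bad", "awful", "the"], "negative"),
    (["bad", "poor", "the"], "negative"),
    (["okay", "plain", "the"], "neutral"),
    (["okay", "so", "the"], "neutral"),
]

def chi2_score(term, label, docs):
    """Chi-squared statistic of the 2x2 table: term presence vs. class membership."""
    n = len(docs)
    a = sum(1 for toks, y in docs if term in toks and y == label)      # term, class
    b = sum(1 for toks, y in docs if term in toks and y != label)      # term, other classes
    c = sum(1 for toks, y in docs if term not in toks and y == label)  # no term, class
    d = n - a - b - c                                                  # no term, other classes
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def select_features(docs, k):
    """Keep the k terms with the highest chi-squared score over any class (assumed criterion)."""
    labels = {y for _, y in docs}
    terms = {t for toks, _ in docs for t in toks}
    score = {t: max(chi2_score(t, y, docs) for y in labels) for t in terms}
    ranked = sorted(terms, key=lambda t: (-score[t], t))  # ties broken alphabetically
    return set(ranked[:k])

def train_nb(docs, vocab):
    """Count documents per class and selected-term occurrences per class."""
    class_docs = Counter(y for _, y in docs)
    term_counts = defaultdict(Counter)
    for toks, y in docs:
        term_counts[y].update(t for t in toks if t in vocab)
    return class_docs, term_counts

def predict(tokens, model, vocab):
    """Return the class with the highest log posterior (multinomial NB, add-one smoothing)."""
    class_docs, term_counts = model
    n = sum(class_docs.values())
    best, best_lp = None, -math.inf
    for y in sorted(class_docs):
        total = sum(term_counts[y].values())
        lp = math.log(class_docs[y] / n)  # log prior
        for t in tokens:
            if t in vocab:  # terms dropped by feature selection are ignored
                lp += math.log((term_counts[y][t] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = y, lp
    return best

vocab = select_features(DOCS, k=3)
model = train_nb(DOCS, vocab)
print(sorted(vocab))
print(predict(["good", "the"], model, vocab))
```

In this toy setting the term "the" occurs in every document, so its chi-squared score is zero for every class and it is discarded before training, which is the kind of irrelevant feature the abstract refers to.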

Published

2019-09-30