Sentiment Analysis of the 2024 Presidential Candidates Using SMOTE and Long Short-Term Memory

Christian Sri Kusuma Aditya; Galih Wasis Wicaksono; Hilman Abi Sarwan Heryawan

doi:10.32493/informatika.v8i2.32210

Authors

Christian Sri Kusuma Aditya Universitas Muhammadiyah Malang http://orcid.org/0000-0001-8736-3397
Galih Wasis Wicaksono Universitas Muhammadiyah Malang
Hilman Abi Sarwan Heryawan Universitas Muhammadiyah Malang

Keywords:

Sentiment, Twitter, SMOTE, LSTM, Word2vec, Presidential, 2024

Abstract

Elections are a crucial component of the political process, and many political figures take part in them. Because electability is a key concern, various steps are taken to make candidates in general elections more electable, and the media, including online news media, has become one of the main strategies for raising it. Sentiment analysis of reader comments can provide an evaluation of political figures. However, because such comments are unstructured, particularly when written in Indonesian, the sentiment of individual comments in online news media is difficult to interpret. This research analyzes public sentiment towards the 2024 presidential candidates as expressed on the Twitter social network. The analysis proceeds in several stages: data collection, data preprocessing, balancing the class distribution of the dataset, and sentiment classification using the LSTM method with word2vec feature representation. Because the amount of data was limited, SMOTE was combined with LSTM, and the resulting model performs fairly well, with an average accuracy of 89.42% and a loss of 0.24. Since the ideal outcome is high accuracy with minimal loss, these results indicate that the LSTM model makes only minor errors on a subset of the data.
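
The abstract lists the pipeline stages but not how they were implemented. The following is a minimal, dependency-free Python sketch of those stages under stated assumptions, not the authors' code: a simple tweet cleaner (the paper's exact preprocessing steps are not given here), a word2vec-style sequence representation built from a small hypothetical vector table instead of a trained model, classic SMOTE interpolation between a minority sample and one of its k nearest minority neighbours, and the accuracy and binary cross-entropy loss used to judge a classifier. It assumes binary sentiment labels and assumes SMOTE is applied to zero-padded, flattened word-vector sequences so that synthetic samples can be reshaped back into LSTM input; the paper may have used a different representation or number of classes. The LSTM itself is omitted, since it would normally come from a deep-learning library.

```python
import math
import random
import re


def preprocess(tweet):
    """Case folding, removal of URLs and @mentions, replacing every
    non-letter (e.g. '#', digits, punctuation) with a space, then tokenizing."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


def sequence_features(tokens, vectors, dim, max_len):
    """Map known tokens to word vectors, truncate or zero-pad to max_len
    steps, and flatten to one list of max_len * dim numbers so that SMOTE
    can treat the tweet as a single point (reshape to (max_len, dim)
    before feeding it to an LSTM). Assumed representation, see lead-in."""
    steps = [vectors[t] for t in tokens if t in vectors][:max_len]
    steps += [[0.0] * dim] * (max_len - len(steps))
    return [x for step in steps for x in step]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples, each on the line segment between a
    random minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [x for x in minority if x is not base]
        neighbours = sorted(others, key=lambda x: math.dist(base, x))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded prediction equals the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean log loss; unlike accuracy it also penalizes correct but
    low-confidence predictions."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


# Hypothetical 2-d vectors standing in for a trained word2vec model.
VECTORS = {"bagus": [0.9, 0.1], "hebat": [0.8, 0.3],
           "buruk": [0.1, 0.9], "jelek": [0.2, 0.8]}

tokens = preprocess("Debat capres BAGUS dan hebat! https://t.co/x @user #Pemilu2024")
features = sequence_features(tokens, VECTORS, dim=2, max_len=3)
```

To balance a minority class, n_new would be set to the difference between the majority and minority counts (for example, 7 synthetic tweets for 10 majority and 3 minority samples). Oversampling is normally applied to the training split only, so that synthetic points derived from test samples do not inflate the reported accuracy. The two metrics also show why accuracy and loss are reported together: a prediction of 0.55 for a positive tweet counts as fully correct for accuracy but still contributes noticeably to the loss.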
