Sentiment Analysis of the 2024 Presidential Candidates Using SMOTE and Long Short Term Memory
DOI:
https://doi.org/10.32493/informatika.v8i2.32210
Keywords:
Sentiment, Twitter, SMOTE, LSTM, Word2vec, Presidential, 2024
Abstract
Elections are a crucial part of the political process, and many political figures take part in them. Because electability matters, candidates running in general elections take steps to raise it, and the media, including online news media, has become one of the key channels for doing so. Sentiment analysis of reader comments can provide an evaluation of political figures. However, because the comments are unstructured, particularly in Indonesian text, their sentiment is difficult to interpret. This research analyzes public sentiment towards the 2024 presidential candidates as expressed on the Twitter social network. The analysis proceeds in several stages: data collection, data preprocessing, balancing the class distribution of the dataset, and sentiment classification using the LSTM method with a word2vec feature representation. The results show that LSTM combined with SMOTE, which was applied because of the limited amount of data, produces a fairly good model, with an average accuracy of 89.42% and a loss value of 0.24. The ideal scenario is high accuracy with minimal loss; in that case, the LSTM model makes only minor errors on a subset of the data.
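The balancing stage mentioned in the abstract can be illustrated with a minimal from-scratch sketch of SMOTE. This is not the authors' code: the abstract does not say which implementation was used or at which point in the pipeline the oversampling was applied. The function names `smote` and `balance`, the parameter `k`, and the toy 2-D vectors (standing in for numeric tweet features such as averaged word2vec embeddings) are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two samples in the class")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest same-class neighbours of the base point (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        extra = smote(rows, target - len(rows), k=k, seed=seed)
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


# Hypothetical toy data: six "positive" and two "negative" feature vectors.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.1, 0.1],
     [1.0, 1.0], [1.2, 0.9]]
y = ["positive"] * 6 + ["negative"] * 2

X_bal, y_bal = balance(X, y)
print(y_bal.count("positive"), y_bal.count("negative"))  # prints "6 6"
```

Each synthetic point lies on the line segment between an existing minority sample and one of its nearest same-class neighbours, which is the core idea of SMOTE. In practice a maintained library implementation would usually be preferred, and oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.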
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
Jurnal Informatika Universitas Pamulang has CC-BY-NC or an equivalent license as the optimal license for the publication, distribution, use, and reuse of scholarly work.
In developing strategy and setting priorities, Jurnal Informatika Universitas Pamulang recognizes that free access is better than priced access, libre access is better than free access, and libre under CC-BY-NC or the equivalent is better than libre under more restrictive open licenses. We should achieve what we can when we can. We should not delay achieving free in order to achieve libre, and we should not stop with free when we can achieve libre.
Jurnal Informatika Universitas Pamulang is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
YOU ARE FREE TO:
- Share: copy and redistribute the material in any medium or format.
- Adapt: remix, transform, and build upon the material.
- The licensor cannot revoke these freedoms as long as you follow the license terms.