Topic Extraction from a Dataset Using Topic Modeling Techniques

Authors

  • Sajarwo Anggai, Informatics Engineering, Graduate Program, Universitas Pamulang, South Tangerang, Banten
  • Tukiyat, Informatics Engineering, Graduate Program, Universitas Pamulang, South Tangerang, Banten
  • Abu Khalid Rivai, Informatics Engineering, Graduate Program, Universitas Pamulang, South Tangerang, Banten
  • Rafi Mahmud Zain, Informatics Engineering, Graduate Program, Universitas Pamulang, South Tangerang, Banten

Keywords:

Dataset, Evaluation, Latent Dirichlet Allocation, Topic Modeling

Abstract

The main topics of President Joko Widodo's speeches, statements, and related media publications, and how those topics change over time, are not well understood. This study aims to identify, analyze, and predict changes in key topics in this material using Latent Dirichlet Allocation (LDA) topic modeling. The research takes a quantitative approach. Documents were scraped from the official website of the Republic of Indonesia's Secretariat, yielding 5,988 speech transcripts dated from October 20, 2014, to March 2, 2024. Text preprocessing consisted of tokenization, stopword removal, and stemming/lemmatization, followed by building the term dictionary. Among the models evaluated, coherence was highest at k=16 (0.554), while perplexity was best at k=21 (-13.130). The main topics identified include Nationalism and National Values, Regional Government, and Education and Children. Topic visualization with PyLDAvis supports the exploration and identification of topics and provides insights for decision-making and policy development. To better understand how topics change, trend analysis of key topics over time is recommended. Monitoring these trends would show how President Joko Widodo's priorities shift in response to new issues and give deeper insight into the evolution of policy and of the President's focus.
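
The preprocessing steps named in the abstract (tokenization, stopword removal, and dictionary formation) can be illustrated with a minimal sketch. This is not the authors' code: the stopword list, the two sample sentences, and the function names `preprocess`, `build_dictionary`, and `to_bow` are all hypothetical, and stemming/lemmatization is omitted because it would require an Indonesian-language stemmer that the abstract does not name.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for illustration);
# a real run would use a full Indonesian stopword list.
STOPWORDS = {"dan", "yang", "di", "untuk", "kita"}


def preprocess(text):
    """Lowercase the text, tokenize on runs of letters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_dictionary(tokenized_docs):
    """Map each distinct term to an integer id, assigned in sorted order."""
    vocab = sorted({t for doc in tokenized_docs for t in doc})
    return {term: idx for idx, term in enumerate(vocab)}


def to_bow(tokens, dictionary):
    """Convert a token list to a sorted list of (term_id, count) pairs."""
    counts = Counter(dictionary[t] for t in tokens if t in dictionary)
    return sorted(counts.items())


# Invented sample sentences, not taken from the speech corpus.
docs = [
    "Pendidikan untuk anak dan pendidikan di daerah",
    "Pemerintah daerah dan nilai kebangsaan",
]

tokenized = [preprocess(d) for d in docs]
dictionary = build_dictionary(tokenized)
corpus = [to_bow(doc, dictionary) for doc in tokenized]

print(dictionary)
print(corpus)
```

The resulting `corpus` is a bag-of-words representation, the kind of input an LDA implementation typically takes. Fitting one model per candidate k and comparing coherence and perplexity would follow from here, but that step depends on the specific LDA library, which the abstract does not state.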

Published

2024-07-31