How Stemming Process Ruining the Meaning of Indonesia Phrases and BRL Method as Its Solutions for Handling Complaint Text

Authors

  • Berlian Rahmy Lidiawaty, Telkom University
  • Anita Hakim Nasution, Telkom University
  • Adzanil Rachmadi Putra, Telkom University
  • Rafi Andi Hidayah, Telkom University
  • Hayu Faiz Naufal Asyrof, Telkom University
  • Raihan Febrianto Grahadi, Telkom University

DOI:

https://doi.org/10.32493/jtsi.v7i3.40316

Keywords:

Be Raw Language; skipping stemming in Bahasa Indonesia; Bahasa Indonesia text preprocessing; processing Indonesian complaint text

Abstract

Stemming, a common text preprocessing step, can distort the meaning of words in Bahasa Indonesia text mining, which can affect interpretation results and the accuracy of machine learning models that process complaint texts. Because many Indonesians submit their complaints in writing, this is an important issue. This research therefore proposes the Be Raw Language (BRL) method for handling complaint texts. BRL generally leaves unstemmed those words whose meaning changes when they are stemmed. To determine whether a word's meaning changes, and as a basis for analysis, the study applies sentiment analysis to 6,205 complaint texts drawn from community reviews of tourist destinations. The reviews are first labeled to establish ground truth, and sentiment scores are then calculated. In this preliminary stage, the accuracy rate is 60.23%. The study then analyzes in depth how words in the Indonesian language change meaning when prefixes or suffixes are added. From this analysis emerges the BRL method, which analyzes words without stemming and defines how words and their meanings are interpreted. The study establishes three main rules for interpreting the meaning of words and phrases in Bahasa Indonesia texts to improve accuracy. Applying the BRL method raises the accuracy rate by 17.57 percentage points, from 60.23% to 77.80%.
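
The effect described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not reproduce the three BRL rules: the stem table, the lexicon, its weights, and the sample review are all hypothetical, chosen only to show how a lexicon-based sentiment score can flip when an affixed word is reduced to its root. The final lines check the accuracy figures reported in the abstract.

```python
# Toy illustration only. The stem table, lexicon, weights, and review are
# hypothetical assumptions and are not taken from the paper.

# Hand-written stand-in for a stemmer: affixed form -> root word.
TOY_STEMS = {
    "menyayangkan": "sayang",   # "to regret"   -> "affection / dear"
    "pelayanannya": "layan",    # "the service" -> "serve"
}

# Hypothetical sentiment lexicon: word -> polarity weight.
TOY_LEXICON = {
    "menyayangkan": -1,
    "sayang": 1,
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def stem(token):
    """Return the root from the toy table, or the token itself if unknown."""
    return TOY_STEMS.get(token, token)


def score(tokens):
    """Sum the lexicon weights of all tokens; unknown tokens count as 0."""
    return sum(TOY_LEXICON.get(t, 0) for t in tokens)


def label(total):
    """Map a numeric score to a sentiment label."""
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def accuracy(predicted, truth):
    """Share of predictions that match the ground-truth labels."""
    matches = sum(p == t for p, t in zip(predicted, truth))
    return matches / len(truth)


review = "Saya menyayangkan pelayanannya"  # "I regret the service"

raw_tokens = tokenize(review)                    # stemming skipped
stemmed_tokens = [stem(t) for t in raw_tokens]   # stemming applied

raw_label = label(score(raw_tokens))
stemmed_label = label(score(stemmed_tokens))

print(raw_label)      # negative
print(stemmed_label)  # positive

# Reported figures: 60.23% baseline plus 17.57 percentage points.
brl_accuracy = round(60.23 + 17.57, 2)
print(brl_accuracy)   # 77.8
```

In this sketch the unstemmed token menyayangkan ("to regret") carries a negative weight, while its root sayang ("affection") carries a positive one, so the stemmed review is scored with the opposite polarity of the ground truth.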

Published

2024-07-31

How to Cite

Lidiawaty, B. R., Nasution, A. H., Putra, A. R., Hidayah, R. A., Asyrof, H. F. N., & Grahadi, R. F. (2024). How Stemming Process Ruining the Meaning of Indonesia Phrases and BRL Method as Its Solutions for Handling Complaint Text. Jurnal Teknologi Sistem Informasi Dan Aplikasi, 7(3), 994–1006. https://doi.org/10.32493/jtsi.v7i3.40316