How Stemming Process Ruining the Meaning of Indonesia Phrases and BRL Method as Its Solutions for Handling Complaint Text


  • Berlian Rahmy Lidiawaty, Telkom University
  • Anita Hakim Nasution, Telkom University
  • Adzanil Rachmadi Putra, Telkom University
  • Rafi Andi Hidayah, Telkom University
  • Hayu Faiz Naufal Asyrof, Telkom University
  • Raihan Febrianto Grahadi, Telkom University


Keywords:

Be Raw Language; skipping stemming in Bahasa Indonesia; Bahasa Indonesia text preprocessing; processing Indonesian complaint text


The stemming step in text preprocessing can ruin the meaning of words in Bahasa Indonesia text mining. This can distort how results are interpreted or reduce the accuracy of machine learning models that process complaint texts. Because many Indonesians submit their complaints in writing, this is an important issue. This research therefore proposes the Be Raw Language (BRL) method for handling complaint texts. In general, BRL exempts from stemming those words whose meaning changes when they are stemmed. To determine whether a word's meaning changes, and to provide a basis for analysis, the study uses a sentiment analysis approach on 6,205 complaint texts drawn from public reviews of tourist destinations. The reviews are first labeled to serve as ground truth, and sentiment scores are then calculated. In this preliminary stage, the sentiment analysis reaches an accuracy of 60.23%. The study then analyzes in depth how the meaning of Indonesian words can change when prefixes or suffixes are added. From this analysis, the BRL method emerges: it analyzes words without stemming and defines how words are to be interpreted along with their meanings. The study establishes three main rules for interpreting the meaning of words, and even phrases, in Bahasa Indonesia texts in order to improve accuracy. As a result, the BRL method raises accuracy by 17.57 percentage points, to 77.80%.
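The abstract does not spell out BRL's three rules, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes a deliberately naive affix stripper and a hand-picked exception list; the names `naive_stem`, `PROTECTED`, and `preprocess`, the affix lists, and the minimum root length are all hypothetical. It shows how stripping affixes turns "kurang perhatian" ("lacking attention") into "kurang hati" ("lacking heart") and "kemahalan" ("too expensive") into the weaker "mahal" ("expensive"), while keeping selected words raw preserves them. It also checks that the reported rise from 60.23% to 77.80% is 17.57 percentage points, or roughly 29% in relative terms.

```python
# Toy illustration only: hypothetical affix lists and exception set.
# Real Indonesian stemmers (e.g. Nazief-Adriani) also use a root-word
# dictionary and prefix recoding rules (such as meN-/peN- nasal
# alternations), which this sketch ignores.
SUFFIXES = ("nya", "kan", "an")
PREFIXES = ("ber", "per", "ter", "ke", "di")
MIN_ROOT = 3  # hypothetical minimum length of the remaining root


def naive_stem(word: str) -> str:
    """Strip at most one suffix, then at most one prefix."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_ROOT:
            word = word[: -len(suffix)]
            break
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_ROOT:
            word = word[len(prefix):]
            break
    return word


# Hypothetical BRL-style exception list: words whose affixes carry
# meaning that matters for complaint sentiment.
PROTECTED = frozenset({"kemahalan", "perhatian", "ketinggalan"})


def preprocess(tokens, protected=frozenset()):
    """Stem every token except those in `protected`, which stay raw."""
    return [t if t in protected else naive_stem(t) for t in tokens]


# "tiketnya kemahalan, kurang perhatian":
# "the ticket is too expensive, lacking attention"
tokens = ["tiketnya", "kemahalan", "kurang", "perhatian"]
print(preprocess(tokens))             # -> ['tiket', 'mahal', 'kurang', 'hati']
print(preprocess(tokens, PROTECTED))  # -> ['tiket', 'kemahalan', 'kurang', 'perhatian']

# The reported gain is in percentage points, not a relative change.
baseline, brl = 60.23, 77.80
print(round(brl - baseline, 2))                     # -> 17.57
print(round((brl - baseline) / baseline * 100, 1))  # -> 29.2
```

A fixed word list is the simplest way to express "do not stem these words". The paper's actual rules may also operate on phrases and context rather than on single tokens.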


How to Cite

Lidiawaty, B. R., Nasution, A. H., Putra, A. R., Hidayah, R. A., Asyrof, H. F. N., & Grahadi, R. F. (2024). How Stemming Process Ruining the Meaning of Indonesia Phrases and BRL Method as Its Solutions for Handling Complaint Text. Jurnal Teknologi Sistem Informasi Dan Aplikasi, 7(3), 994–1006.