How Stemming Process Ruining the Meaning of Indonesia Phrases and BRL Method as Its Solutions for Handling Complaint Text
DOI:
https://doi.org/10.32493/jtsi.v7i3.40316
Keywords:
Be Raw Language; skipping stemming in Bahasa Indonesia; Bahasa Indonesia text preprocessing; processing Indonesian complaint text
Abstract
Stemming during text preprocessing can distort the meaning of words in Bahasa Indonesia text mining, which can in turn affect how complaint texts are interpreted and how accurately machine learning models process them. Because many Indonesians submit complaints as text, this is an important issue. This research therefore proposes the Be Raw Language (BRL) method for handling complaint texts. In general, BRL exempts from stemming the words whose meaning changes when they are stemmed. To determine whether a word's meaning changes, and as a basis for analysis, the study applies sentiment analysis to 6,205 complaint texts drawn from community reviews of tourist destinations. The reviews are first labeled to establish ground truth, and sentiment scores are then calculated. In this preliminary stage, accuracy is 60.23%. The study then analyzes in depth how Indonesian words change meaning when prefixes or suffixes are added. From this analysis emerges the BRL method, which analyzes words without stemming and specifies how words and their meanings are to be interpreted. The study establishes three main rules for interpreting the meaning of words, and even phrases, in Bahasa Indonesia texts in order to improve accuracy. With the BRL method, accuracy rises by 17.57 percentage points, from 60.23% to 77.80%.
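To make the problem concrete, the following minimal Python sketch shows how stripping affixes can erase the sentiment of a word, and how exempting selected words from stemming preserves it. Everything in it is an assumption for illustration: the affix lists, the `naive_stem` function, the tiny lexicon, and the `PROTECTED` set are hypothetical and are not the three BRL rules, which the abstract does not spell out.

```python
# Illustrative sketch only. The affix lists, lexicon, and protected set are
# hypothetical and do NOT reproduce the rules defined by the BRL method.

SUFFIXES = ("kan", "nya", "an")  # checked in this order
PREFIXES = ("ke", "ter")

# Toy sentiment lexicon keyed on surface (unstemmed) word forms.
LEXICON = {"keterlaluan": -1, "kotor": -1, "bagus": 1}

# Words assumed to change meaning when stemmed, so they are kept raw.
PROTECTED = frozenset({"keterlaluan"})


def naive_stem(word: str) -> str:
    """Strip one suffix, then any number of prefixes, keeping >= 4 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            word = word[: -len(suffix)]
            break
    stripped = True
    while stripped:
        stripped = False
        for prefix in PREFIXES:
            if word.startswith(prefix) and len(word) - len(prefix) >= 4:
                word = word[len(prefix):]
                stripped = True
                break
    return word


def preprocess(text: str, protected: frozenset = frozenset()) -> list:
    """Lowercase, split on whitespace, and stem every unprotected token."""
    tokens = text.lower().split()
    return [t if t in protected else naive_stem(t) for t in tokens]


def sentiment_score(text: str, protected: frozenset = frozenset()) -> int:
    """Sum the lexicon polarity of each preprocessed token."""
    return sum(LEXICON.get(t, 0) for t in preprocess(text, protected))


complaint = "antrean tiket keterlaluan"  # "the ticket queue is outrageous"

# Full stemming reduces "keterlaluan" (outrageous) to "lalu" (then / past),
# so the negative cue disappears from the lexicon lookup.
stemmed_score = sentiment_score(complaint)

# Keeping the protected word raw preserves the negative polarity.
raw_score = sentiment_score(complaint, PROTECTED)

print(preprocess(complaint), stemmed_score)
print(preprocess(complaint, PROTECTED), raw_score)
```

In this toy setup the fully stemmed complaint scores as neutral, while the version that leaves the protected word unstemmed scores as negative. A production pipeline would use a real Indonesian stemmer and a full sentiment lexicon; the sketch only shows why skipping stemming for certain words can change the predicted label.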
Copyright (c) 2024 Berlian Rahmy Lidiawaty, Anita Hakim Nasution, Adzanil Rachmadi Putra, Rafi Andi Hidayah, Hayu Faiz Naufal Asyrof, Raihan Febrianto Grahadi

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
Jurnal Teknologi Sistem Informasi dan Aplikasi has adopted CC BY-NC or an equivalent license as the optimal license for the publication, distribution, use, and reuse of scholarly work.
In developing strategy and setting priorities, Jurnal Teknologi Sistem Informasi dan Aplikasi recognizes that free access is better than priced access, libre access is better than free access, and libre under CC BY-NC or the equivalent is better than libre under more restrictive open licenses. We should achieve what we can when we can. We should not delay achieving free in order to achieve libre, and we should not stop with free when we can achieve libre.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License
YOU ARE FREE TO:
- Share - copy and redistribute the material in any medium or format
- Adapt - remix, transform, and build upon the material
- The licensor cannot revoke these freedoms as long as you follow the license terms