A Random Forest Optimization Model with PSO-CHI-SM for Handling High-Dimensional and Imbalanced Flood Data in Samarinda City
DOI:
https://doi.org/10.32493/jtsi.v7i3.41632
Keywords:
Classification; Flood; Random Forest; Imbalance; Chi-Square; Optimization
Abstract
Flooding is a natural disaster that frequently affects Indonesia. Samarinda City in particular continues to flood often, with 18 incidents recorded in 2018, 33 in 2020, and 32 in 2021. Machine learning can be used to analyze and classify flood events in support of flood prediction. However, classification is often hampered by high-dimensional data and class imbalance. This study aims to determine how much the accuracy of flood classification improves when the Random Forest algorithm is combined with Particle Swarm Optimization (PSO), Chi-Square feature selection, and SMOTE oversampling to balance the classes. The data are flood records for 2021-2023 obtained from BMKG and BPBD Samarinda City, totaling 1095 records with 11 attributes. The model is validated with 5-fold cross-validation and evaluated with a confusion matrix. Chi-Square feature selection identified Rainfall, Maximum Wind Direction, Most Frequent Wind Direction, Humidity, Sunshine Duration, and Wind Speed as the most influential features, based on their Chi-Square scores and p-values. Averaged over the 5 folds, the proposed classification model reached an accuracy of 96.02%.
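The building blocks named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the toy data, the one-dimensional PSO, and the stand-in fitness function are all assumptions made for illustration, and the Random Forest classifier itself is left out. The sketch shows a Pearson Chi-Square score for one binned feature, SMOTE-style interpolation between minority samples, a basic PSO loop, and accuracy read from a 2x2 confusion matrix.

```python
import random

def chi_square_score(feature_bins, labels):
    """Pearson chi-square statistic between a binned feature and the class labels."""
    n = len(labels)
    rows, cols = sorted(set(feature_bins)), sorted(set(labels))
    observed = {(r, c): 0 for r in rows for c in cols}
    for f, y in zip(feature_bins, labels):
        observed[(f, y)] += 1
    row_tot = {r: sum(observed[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(observed[(r, c)] for r in rows) for c in cols}
    score = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            score += (observed[(r, c)] - expected) ** 2 / expected
    return score

def smote_like(minority, n_new, k=2, seed=0):
    """Synthesize points on the segment between a minority sample and a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

def pso(fitness, bounds, n_particles=15, n_iter=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize a 1-D fitness function with a basic particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest, pbest_fit = pos[:], [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            # Inertia + pull toward the particle's own best + pull toward the swarm best.
            vel[i] = (w * vel[i]
                      + c1 * rng.random() * (pbest[i] - pos[i])
                      + c2 * rng.random() * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i], fit
    return gbest, gbest_fit

def accuracy_from_confusion(cm):
    """Accuracy from a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    return (cm[0][0] + cm[1][1]) / sum(sum(row) for row in cm)

if __name__ == "__main__":
    # Toy values only, not the study's records.
    print("chi-square:", chi_square_score(["high", "high", "low", "low"], [1, 1, 0, 0]))
    print("synthetic:", smote_like([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_new=3))
    # Stand-in fitness peaking at 3.0; in practice this could be cross-validated accuracy.
    print("PSO best:", pso(lambda x: -(x - 3.0) ** 2, bounds=(0.0, 10.0)))
    print("accuracy:", accuracy_from_confusion([[8, 1], [1, 10]]))
```

In a real workflow the fitness passed to the swarm would typically be the mean cross-validated accuracy of the classifier for a candidate setting, and oversampling would be applied only to the training folds so that synthetic samples do not leak into the validation data.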
License
Copyright (c) 2024 Ilham Taufiq, Taghfirul Azhima Yoga Siswa, Wawan Joko Pranoto

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.