Random Forest Optimization Model with PSO-CHI-SM for Handling High-Dimensional and Imbalanced Flood Data of Samarinda City

Authors

  • Ilham Taufiq, Universitas Muhammadiyah Kalimantan Timur
  • Taghfirul Azhima Yoga Siswa, Universitas Muhammadiyah Kalimantan Timur
  • Wawan Joko Pranoto, Universitas Muhammadiyah Kalimantan Timur

DOI:

https://doi.org/10.32493/jtsi.v7i3.41632

Keywords:

Classification; Flood; Random Forest; Imbalance; Chi-Square; Optimization

Abstract

Flooding is a natural disaster that frequently affects Indonesia. Samarinda City in particular continues to flood often, with 18 incidents in 2018, 33 in 2020, and 32 in 2021. Machine learning can support flood prediction by analyzing and classifying flood events, but classification often suffers from high-dimensional data and class imbalance. This study measures how much flood classification accuracy improves when the Random Forest algorithm is combined with Particle Swarm Optimization (PSO) for optimization, Chi-Square feature selection, and SMOTE (Synthetic Minority Over-sampling Technique) oversampling to balance the classes. The dataset comprises flood data for 2021–2023 obtained from BMKG and BPBD Samarinda City, totaling 1,095 records and 11 attributes. The model is validated with 5-fold cross-validation and evaluated with a confusion matrix. Chi-Square feature selection identified Rainfall, Maximum Wind Direction, Most Frequent Wind Direction, Humidity, Sunshine Duration, and Wind Speed as the most influential features, based on their Chi-Square scores and p-values. The proposed classification model reached an average accuracy of 96.02% across the 5 folds.
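
The abstract names three computations that are easy to illustrate in isolation: the Chi-Square score used to rank features, SMOTE-style oversampling of the minority class, and accuracy read off a confusion matrix. The following is a minimal standard-library sketch of those three ideas, not the authors' implementation. The function names (`chi_square`, `smote`, `accuracy`) and all numbers in the demo are hypothetical and are not taken from the study's data. The PSO step and the Random Forest itself are deliberately left out.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic for a contingency table.

    `table` is a list of rows, e.g. rows = feature bins, columns = classes.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def smote(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling (simplified).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the remaining minority points by squared Euclidean distance.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def accuracy(cm):
    """Accuracy from a square confusion matrix: correct / total."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


if __name__ == "__main__":
    # Hypothetical 2x2 table: rows = low/high rainfall, cols = no flood/flood.
    print(chi_square([[30, 10], [10, 30]]))  # 20.0

    # Hypothetical minority-class samples with two features each.
    flood_days = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(smote(flood_days, n_new=4))

    # Hypothetical confusion matrix for one validation fold.
    print(accuracy([[50, 2], [3, 45]]))  # 0.95
```

A higher Chi-Square score indicates a stronger association between a feature and the class label, which is the basis for ranking features. In practice, library implementations such as scikit-learn's `chi2` with `SelectKBest` and imbalanced-learn's `SMOTE` would normally replace these hand-written helpers; whether the study used them is not stated in the abstract.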

Published

2024-07-31

How to Cite

Taufiq, I., Siswa, T. A. Y., & Pranoto, W. J. (2024). Model Optimasi Random Forest dengan PSO-CHI-SM dalam Mengatasi High Dimensional dan Imbalanced Data Banjir Kota Samarinda. Jurnal Teknologi Sistem Informasi Dan Aplikasi, 7(3), 1267–1279. https://doi.org/10.32493/jtsi.v7i3.41632