Random Forest Optimization Model with PSO-CHI-SM for Handling High-Dimensional and Imbalanced Flood Data of Samarinda City


  • Ilham Taufiq, Universitas Muhammadiyah Kalimantan Timur
  • Taghfirul Azhima Yoga Siswa, Universitas Muhammadiyah Kalimantan Timur
  • Wawan Joko Pranoto, Universitas Muhammadiyah Kalimantan Timur



Classification; Flood; Random Forest; Imbalance; Chi-Square; Optimization


Flooding is a natural disaster that frequently affects Indonesia. Samarinda City in particular continues to flood often, with 18 incidents recorded in 2018, 33 in 2020, and 32 in 2021. Machine learning can be used to analyze and classify flood events for prediction, but classification often suffers from high-dimensional data and class imbalance. This study measures how much the accuracy of flood classification improves when the Random Forest algorithm is combined with Particle Swarm Optimization (PSO) for optimization, Chi-Square feature selection, and SMOTE oversampling for class balancing. The data consist of flood records from 2021 to 2023 obtained from BMKG and BPBD Samarinda City, totaling 1095 records and 11 attributes. The model is validated with 5-fold cross-validation and evaluated with a confusion matrix. Chi-Square feature selection identified Rainfall, Maximum Wind Direction, Most Frequent Wind Direction, Humidity, Sunshine Duration, and Wind Speed as the most influential features based on their Chi-Square scores and p-values. The proposed classification model reached an average accuracy of 96.02% under 5-fold cross-validation.
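The three building blocks named in the abstract (Chi-Square scoring, SMOTE oversampling, and confusion-matrix accuracy) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, the toy inputs, the assumption that features are already binned into categories, and the `[[TN, FP], [FN, TP]]` matrix layout are all hypothetical choices made for illustration. Random Forest training and PSO tuning are left out.

```python
import math
import random
from collections import Counter


def chi_square_score(feature_bins, labels):
    """Pearson chi-square statistic between a binned feature and the class label.

    Assumes the feature is already categorical (hypothetical preprocessing step).
    """
    n = len(labels)
    row_totals = Counter(feature_bins)   # count per feature category
    col_totals = Counter(labels)         # count per class
    observed = Counter(zip(feature_bins, labels))
    score = 0.0
    for f, row_n in row_totals.items():
        for c, col_n in col_totals.items():
            expected = row_n * col_n / n  # count expected under independence
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points.

    Each point lies on the segment between a randomly chosen minority sample
    and one of its k nearest minority neighbours. Requires at least two
    minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def accuracy(confusion):
    """Accuracy from a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = confusion
    return (tn + tp) / (tn + fp + fn + tp)


if __name__ == "__main__":
    # Toy values only; none of these numbers come from the study.
    print(chi_square_score(["high", "high", "low", "low"], [1, 1, 0, 0]))
    print(smote([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_new=3))
    print(accuracy([[50, 2], [3, 45]]))
```

A feature that separates the classes well gets a high Chi-Square score, which is the basis for ranking attributes such as Rainfall or Humidity. In a 5-fold setup, `accuracy` would be computed once per fold and the five values averaged.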





How to Cite

Taufiq, I., Siswa, T. A. Y., & Pranoto, W. J. (2024). Model Optimasi Random Forest dengan PSO-CHI-SM dalam Mengatasi High Dimensional dan Imbalanced Data Banjir Kota Samarinda. Jurnal Teknologi Sistem Informasi Dan Aplikasi, 7(3), 1267–1279.