KNN-PSORF Optimization Model for Handling High-Dimensional Flood Data of Samarinda City

Authors

  • Anggiq Karisma Aji Restu, Universitas Muhammadiyah Kalimantan Timur
  • Taghfirul Azhima Yoga Siswa, Universitas Muhammadiyah Kalimantan Timur
  • Wawan Joko Pranoto, Universitas Muhammadiyah Kalimantan Timur

DOI:

https://doi.org/10.32493/jtsi.v7i3.41587

Keywords:

K-Nearest Neighbor; Relief; Flood; 10-Fold Cross-Validation; Classification

Abstract

Floods occur frequently in Indonesia, including in Samarinda City, where flooding over the past three years has affected thousands of homes and around 27,000 residents. Flood prediction can be supported by machine learning, specifically data mining classification methods. However, classification often runs into two problems: high-dimensional data, which can lead to overfitting, and class imbalance, which biases the model toward dominant classes while neglecting minority classes. This research aims to improve classification accuracy on Samarinda City's flood data using the K-Nearest Neighbor (KNN) algorithm combined with Relief feature selection and Particle Swarm Optimization (PSO). Validation uses 10-fold cross-validation, and performance is evaluated with a confusion matrix. The data, sourced from Samarinda City's Disaster Management Agency (BPBD) and the Meteorology, Climatology, and Geophysics Agency (BMKG), span 2021 to 2023 and comprise 19 features and 1095 records. Relief feature selection identified four crucial features: maximum wind direction, wind speed, average wind speed, and maximum wind speed direction. Evaluations averaged over k values of 3, 5, 7, 11, 13, and 15 show that Relief feature selection and PSO both improve the accuracy of KNN on the flood data: KNN with PSO yields an improvement of 2-5%, Relief feature selection alone improves accuracy by 1-2%, and combining Relief with PSO provides a 2-5% improvement. The combined KNN, Relief, and PSO model is expected to deliver optimal performance in classifying Samarinda City's flood data.
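
The evaluation pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (two informative features and two noise features standing in for the 19 BPBD/BMKG features), all function names are hypothetical, the Relief variant is the basic two-class form, and the PSO step is left out entirely. For brevity, scaling and Relief are applied once to the whole dataset; a careful study would fit them inside each training fold.

```python
import random
from collections import Counter


def make_data(n=200, seed=0):
    """Synthetic stand-in for the flood records: features 0-1 carry signal, 2-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced binary labels (0 = no flood, 1 = flood)
        X.append([rng.gauss(2.0 * label, 0.5), rng.gauss(2.0 * label, 0.5),
                  rng.random(), rng.random()])
        y.append(label)
    return X, y


def min_max_scale(X):
    """Rescale every feature to [0, 1] so distances are comparable across features."""
    cols = list(zip(*X))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in X]


def sq_dist(a, b, feats):
    """Squared Euclidean distance restricted to the given feature indices."""
    return sum((a[f] - b[f]) ** 2 for f in feats)


def relief_weights(X, y, n_samples=50, seed=0):
    """Basic two-class Relief: reward features that differ on the nearest miss
    (other class) and agree on the nearest hit (same class)."""
    rng = random.Random(seed)
    feats = range(len(X[0]))
    w = [0.0] * len(X[0])
    for i in rng.sample(range(len(X)), n_samples):
        others = [j for j in range(len(X)) if j != i]
        hit = min((j for j in others if y[j] == y[i]),
                  key=lambda j: sq_dist(X[i], X[j], feats))
        miss = min((j for j in others if y[j] != y[i]),
                   key=lambda j: sq_dist(X[i], X[j], feats))
        for f in feats:
            w[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n_samples
    return w


def knn_predict(train_X, train_y, x, k, feats):
    """Majority vote among the k nearest training points."""
    nearest = sorted(range(len(train_X)),
                     key=lambda j: sq_dist(x, train_X[j], feats))[:k]
    return Counter(train_y[j] for j in nearest).most_common(1)[0][0]


def cross_validate(X, y, k, feats, n_folds=10, seed=0):
    """10-fold cross-validation; returns accuracy and a 2x2 confusion matrix
    indexed as cm[actual][predicted]."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    cm = [[0, 0], [0, 0]]
    for fold in folds:
        held_out = set(fold)
        train = [j for j in idx if j not in held_out]
        train_X = [X[j] for j in train]
        train_y = [y[j] for j in train]
        for j in fold:
            cm[y[j]][knn_predict(train_X, train_y, X[j], k, feats)] += 1
    return (cm[0][0] + cm[1][1]) / len(X), cm


X, y = make_data()
X = min_max_scale(X)
weights = relief_weights(X, y)
# Keep the two highest-weighted features (the paper keeps four of 19).
selected = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)[:2]

results = {}
for k in (3, 5, 7, 11, 13, 15):  # the k values listed in the abstract
    results[k] = cross_validate(X, y, k, selected)
    print(f"k={k}: accuracy={results[k][0]:.3f}, confusion matrix={results[k][1]}")
```

Because the synthetic classes are well separated, this sketch only shows the mechanics (Relief ranking, fold splitting, confusion-matrix bookkeeping); it says nothing about the accuracy gains reported for the real flood data.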

Published

2024-07-31

How to Cite

Restu, A. K. A., Siswa, T. A. Y., & Pranoto, W. J. (2024). Model Optimasi KNN-PSORF dalam Menangani High Dimensional Data Banjir Kota Samarinda. Jurnal Teknologi Sistem Informasi Dan Aplikasi, 7(3), 1289–1299. https://doi.org/10.32493/jtsi.v7i3.41587