Optimizing SVM with RFE and ROS to Address High-Dimensional and Imbalanced Flood Data

Authors

  • Faldy Alfareza Pambudi, Muhammadiyah University of East Kalimantan
  • Taghfirul Azhima Yoga Siswa, Muhammadiyah University of East Kalimantan
  • Wawan Joko Pranoto, Muhammadiyah University of East Kalimantan

DOI:

https://doi.org/10.32493/jtsi.v7i3.41068

Keywords:

Flood Classification; SVM; RFE; ROS; Imbalanced Data; High Dimension

Abstract

Floods are among the most frequent natural disasters in Indonesia. In the city of Samarinda, the number of flood cases rose significantly between 2018 and 2021. Machine learning, in particular the Support Vector Machine (SVM) algorithm, can be used to predict future flood events, but the main problems faced are class imbalance and high-dimensional data. This research combines SVM with Random Oversampling (ROS) and Recursive Feature Elimination (RFE) feature selection to address both problems, with the aim of increasing classification accuracy on Samarinda flood data. The model is validated with 10-fold cross-validation, and its performance is evaluated with a confusion matrix from which accuracy is calculated. The data were obtained from BPBD and BMKG Samarinda for the 2021-2023 period and consist of 11 attributes and 1,095 records. The results show that RFE identified the five most important features: minimum temperature (Tn), maximum temperature (Tx), average temperature (Tavg), average humidity (RH_avg), and wind direction at maximum wind speed (ddd_x). Combining SVM with ROS and RFE increased flood classification accuracy by 0.78 percentage points, from 97.14% to 97.92%.
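
The workflow described in the abstract (ROS, then RFE down to five features around an SVM, scored by 10-fold cross-validation and a confusion matrix) can be sketched roughly as follows. This is a minimal illustration using scikit-learn and NumPy, not the authors' implementation. The synthetic data from `make_classification`, the choice of 10 input features, the linear kernel (needed so that RFE can rank features by coefficient), the standardization step, and the helper names `random_oversample` and `cross_validated_accuracy` are all assumptions introduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def random_oversample(X, y, rng):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # start with every original row
    for cls, count in zip(classes, counts):
        if count < target:
            members = np.flatnonzero(y == cls)
            keep.append(rng.choice(members, size=target - count, replace=True))
    idx = np.concatenate(keep)
    return X[idx], y[idx]


def cross_validated_accuracy(X, y, n_features=5, n_splits=10, seed=0):
    """Sum the confusion matrices of all folds and derive accuracy from the total."""
    rng = np.random.default_rng(seed)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    total = np.zeros((2, 2), dtype=np.int64)
    for train_idx, test_idx in folds.split(X, y):
        # Assumption: scaling and oversampling are fitted on the training fold
        # only, so duplicated rows never leak into the test fold.
        scaler = StandardScaler().fit(X[train_idx])
        X_train, y_train = random_oversample(
            scaler.transform(X[train_idx]), y[train_idx], rng
        )
        # Assumption: a linear kernel, because RFE ranks features by coef_.
        selector = RFE(SVC(kernel="linear"), n_features_to_select=n_features)
        selector.fit(X_train, y_train)
        y_pred = selector.predict(scaler.transform(X[test_idx]))
        total += confusion_matrix(y[test_idx], y_pred, labels=[0, 1])
    accuracy = np.trace(total) / total.sum()  # (TP + TN) / all predictions
    return total, accuracy


# Synthetic, imbalanced stand-in for the flood data (hypothetical values).
X, y = make_classification(
    n_samples=1095, n_features=10, n_informative=5, n_redundant=2,
    weights=[0.9, 0.1], random_state=0,
)
matrix, accuracy = cross_validated_accuracy(X, y)
print(matrix)
print(f"accuracy = {accuracy:.4f}")
```

Oversampling inside each training fold is a design choice of this sketch; the abstract does not say at which stage ROS was applied. The accuracy printed here comes from synthetic data and is not comparable to the 97.14% and 97.92% reported above.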

Published

2024-07-31

How to Cite

Pambudi, F. A., Siswa, T. A. Y., & Pranoto, W. J. (2024). Optimasi SVM dengan RFE dan ROS untuk Mengatasi High Dimension dan Imbalanced Data Banjir. Jurnal Teknologi Sistem Informasi Dan Aplikasi, 7(3), 1194–1203. https://doi.org/10.32493/jtsi.v7i3.41068