Optimizing SVM with RFE and ROS to Address High-Dimensional and Imbalanced Flood Data
DOI:
https://doi.org/10.32493/jtsi.v7i3.41068
Keywords:
Flood Classification; SVM; RFE; ROS; Imbalanced Data; High Dimension
Abstract
Floods are among the most frequent natural disasters in Indonesia. The city of Samarinda, for example, saw a significant increase in flood cases between 2018 and 2021. Machine learning, in particular the Support Vector Machine (SVM) algorithm, can be used to predict future flood events, but its accuracy is limited by two problems: class imbalance and high-dimensional data. This research combines SVM with Random Oversampling (ROS) and Recursive Feature Elimination (RFE) feature selection to address both problems, with the aim of increasing classification accuracy on Samarinda City flood data. The model is validated with 10-fold cross-validation, and its performance is evaluated with a confusion matrix, from which accuracy is calculated. The data were obtained from BPDB and BMKG Samarinda City for the 2021-2023 period and consist of 11 attributes and 1,095 records. The results show that RFE identified the five most important features: minimum temperature (Tn), maximum temperature (Tx), average temperature (Tavg), average humidity (RH_avg), and wind direction at maximum wind speed (ddd_x). Combining SVM with ROS and RFE increased flood classification accuracy by 0.78 percentage points, from 97.14% to 97.92%.
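As a rough illustration of two steps named in the abstract, random oversampling and accuracy derived from a confusion matrix, the following standard-library Python sketch may help. It is not the authors' implementation: the function names, the toy rows, and the predictions are all hypothetical, and the SVM and RFE steps (which would normally come from a machine learning library) are not shown. In a cross-validated setup, oversampling would normally be applied to the training folds only, so that duplicated rows do not leak into the test fold.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def confusion_matrix(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Accuracy = correctly classified rows / all rows."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical toy data: (Tn, Tx) pairs; label 1 = flood, 0 = no flood.
rows = [(23.0, 31.5), (24.1, 33.0), (23.5, 32.2),
        (24.8, 30.9), (22.9, 29.4), (23.2, 28.8)]
labels = [0, 0, 0, 0, 1, 1]

# Balance the classes by duplicating minority-class rows.
balanced_rows, balanced_labels = random_oversample(rows, labels, seed=42)
print(Counter(balanced_labels))

# Hypothetical predictions from some classifier on held-out data.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print(accuracy(tp, fp, fn, tn))
```

Because ROS only duplicates existing minority rows, it adds no new information. It changes the class proportions the classifier sees during training, which is why the held-out data should be left unmodified.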
Copyright (c) 2024 Faldy Alfareza Pambudi, Taghfirul Azhima Yoga Siswa, Wawan Joko Pranoto

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.