PHISHING URL CLASSIFICATION ON WEBSITES USING ENSEMBLE METHODS

Authors

  • Bahrul Ulum, Program Studi Teknik Informatika S-2, Universitas Pamulang
  • Taswanda Taryo, Program Studi Teknik Informatika S-2, Universitas Pamulang
  • Sudarno, Program Studi Teknik Informatika S-2, Universitas Pamulang

Keywords:

phishing URL, ensemble learning, CatBoost, XGBoost, LightGBM, security detection, cybersecurity

Abstract

This study compares the performance of three ensemble learning algorithms, CatBoost, XGBoost, and LightGBM, in detecting phishing URLs using the PhiUSIIL Phishing URL dataset. The research stages comprise data preprocessing, an 80:20 train-test split, and performance evaluation using accuracy, precision, recall, F1-score, and ROC AUC. XGBoost performs best, with an accuracy of 97.54% and an ROC AUC of 93.05%, followed by CatBoost with an accuracy of 97.46% and an ROC AUC of 92.94%. LightGBM scores lowest but still performs well, with an accuracy of 96.99% and an ROC AUC of 91.85%. Data cleaning improves efficiency by removing irrelevant attributes from the analysis. The results indicate that ensemble algorithms are suitable for building effective and accurate phishing detection systems. XGBoost is recommended as the primary algorithm for detecting phishing threats in cybersecurity applications because of its ability to handle large and complex data.
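
The evaluation steps named in the abstract (an 80:20 split followed by accuracy, precision, recall, F1-score, and ROC AUC) can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function names, the 0.5 decision threshold, the choice of label 1 as the positive class, and the toy labels and scores are all assumptions made for illustration, and no model is trained here.

```python
import random


def train_test_split_80_20(rows, seed=42):
    """Shuffle a copy of `rows` and return (train, test) in an 80:20 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores (invented, not from the PhiUSIIL dataset).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(classification_metrics(y_true, y_pred))
print("roc_auc:", roc_auc(y_true, scores))
```

In a full pipeline, `scores` would come from the predicted probabilities of each trained model on the 20% test portion, so that the same metric functions can be applied to CatBoost, XGBoost, and LightGBM alike.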

Published

2025-07-31