A Survey on Phishing Website Detection Using Hadoop

Muhammad Rayhan Natadimadja, Maman Abdurohman, Hilal Hudan Nuha


Phishing is an activity carried out by phishers with the aim of stealing the personal data of internet users, such as user IDs, passwords, and bank account details, which the phishers then use for their own interests. The average internet user is easily trapped by phishers because the websites they visit closely resemble the original websites. Because several attributes must be considered, most internet users find it difficult to tell whether a website is authentic. There are many ways to detect a phishing website, but existing phishing website detection systems are too time-consuming and depend heavily on the databases they maintain. In this research, Hadoop MapReduce is used to quickly retrieve the attributes of a website that play an important role in identifying phishing, and then to inform users whether the website is a phishing website or not.
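
The attribute-retrieval idea described in the abstract can be illustrated with a minimal sketch in plain Python that imitates the map and reduce phases without a Hadoop cluster. Everything named here is an assumption made for illustration: the abstract does not list which attributes the authors extract, so the four lexical checks, the function names `map_url`, `reduce_attributes`, and `classify`, and the threshold of 2 are hypothetical and should not be read as the authors' method.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def map_url(url):
    """Map step: emit (url, attribute) for each suspicious attribute found.

    The attributes below are hypothetical examples, not the paper's list.
    """
    host = urlparse(url).hostname or ""
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host):
        yield url, "ip_address_host"   # host is a raw IPv4 address
    if "@" in url:
        yield url, "at_symbol"         # '@' anywhere in the URL
    if len(url) > 75:
        yield url, "long_url"          # assumed length cutoff
    if host.count(".") > 3:
        yield url, "many_subdomains"   # assumed subdomain cutoff


def reduce_attributes(pairs):
    """Reduce step: group the emitted attributes by URL."""
    grouped = defaultdict(list)
    for url, attribute in pairs:
        grouped[url].append(attribute)
    return dict(grouped)


def classify(urls, threshold=2):
    """Flag a URL when it shows at least `threshold` suspicious attributes."""
    pairs = [pair for url in urls for pair in map_url(url)]
    grouped = reduce_attributes(pairs)
    return {url: len(grouped.get(url, [])) >= threshold for url in urls}
```

In an actual Hadoop job the map and reduce functions would run as distributed tasks over a large URL set; the sketch only shows how per-URL attributes could be emitted as key-value pairs and then grouped before a decision is made.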


Phishing; Hadoop; Website; Information Security; Phishing Detection





DOI: http://dx.doi.org/10.32493/informatika.v5i3.6672



Copyright (c) 2020 Muhammad Rayhan Natadimadja

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Jurnal Informatika Universitas Pamulang (ISSN: 2541-1004 e-ISSN: 2622-4615)
