A Survey on Phishing Website Detection Using Hadoop
DOI: https://doi.org/10.32493/informatika.v5i3.6672
Keywords: Phishing, Hadoop, Website, Information Security, Phishing Detection
Abstract
Phishing is an activity in which attackers (phishers) try to steal internet users' personal data, such as user IDs, passwords, and bank account details, and then use that data for their own benefit. Average internet users are easily trapped because the phishing websites they visit closely resemble the original websites. Since several attributes must be considered, most internet users find it difficult to distinguish an authentic website from a fake one. There are many ways to detect a phishing website, but existing phishing website detection systems are too time-consuming and depend heavily on their own databases. In this research, Hadoop MapReduce is used to quickly retrieve several attributes that play an important role in identifying a phishing website, and then to inform users whether the website is a phishing website or not.
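The abstract does not state which attributes are retrieved or how the final verdict is computed, so the following is only a minimal single-process sketch of the MapReduce pattern it describes. It assumes a hypothetical set of six URL attributes (IP address as host, "@" in the URL, length of at least 75 characters, hyphen in the host, three or more dots in the host, no HTTPS) and a hypothetical rule that labels a URL as phishing when at least two attributes are present. The map phase emits one key/value pair per attribute, a sort stands in for Hadoop's shuffle, and the reduce phase produces one verdict per URL. On a real cluster the mapper and reducer would typically run as separate programs (for example through Hadoop Streaming, which exchanges tab-separated key/value lines over stdin/stdout) and Hadoop would perform the shuffle itself; none of the names, features, or thresholds below come from the paper.

```python
import ipaddress
from itertools import groupby
from operator import itemgetter
from urllib.parse import urlparse

# HYPOTHETICAL attribute set and thresholds, loosely modelled on URL features
# common in the phishing-detection literature; not the authors' feature list.
LONG_URL_CHARS = 75      # assumed length threshold
MANY_DOTS_IN_HOST = 3    # assumed subdomain threshold


def extract_attributes(url):
    """Return a dict of boolean 'suspicious' flags for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for non-IP hosts
        uses_ip = True
    except ValueError:
        uses_ip = False
    return {
        "uses_ip": uses_ip,
        "has_at_symbol": "@" in url,
        "long_url": len(url) >= LONG_URL_CHARS,
        "hyphen_in_host": "-" in host,
        # Dots in an IP address are not subdomains, so skip them.
        "many_subdomains": not uses_ip and host.count(".") >= MANY_DOTS_IN_HOST,
        "no_https": parsed.scheme != "https",
    }


def mapper(url):
    """Map phase: emit one (url, (attribute, flag)) pair per attribute."""
    for name, flag in extract_attributes(url).items():
        yield url, (name, flag)


def reducer(url, values, threshold=2):
    """Reduce phase: count distinct suspicious attributes and label the URL."""
    suspicious = sorted({name for name, flag in values if flag})
    label = "phishing" if len(suspicious) >= threshold else "legitimate"
    return url, label, suspicious


def run_job(urls, threshold=2):
    """Simulate map -> shuffle/sort -> reduce in one process."""
    mapped = [pair for url in urls for pair in mapper(url)]
    mapped.sort(key=itemgetter(0))  # shuffle/sort: bring equal keys together
    return [
        reducer(url, (value for _, value in group), threshold)
        for url, group in groupby(mapped, key=itemgetter(0))
    ]


if __name__ == "__main__":
    sample = [
        "https://www.example.com/",
        "http://192.168.0.1/paypal-login@secure",
    ]
    for url, label, reasons in run_job(sample):
        print(f"{label:10s} {url} {reasons}")
```

Because each URL is handled independently in the map phase, attribute extraction can be spread across input splits on many nodes, which is presumably where a MapReduce approach would gain its speed. In practice the attribute list and the threshold would have to be chosen and validated on labelled data rather than taken from this sketch.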
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
Jurnal Informatika Universitas Pamulang regards CC BY-NC, or an equivalent license, as the optimal license for the publication, distribution, use, and reuse of scholarly work.
In developing strategy and setting priorities, Jurnal Informatika Universitas Pamulang recognizes that free access is better than priced access, libre access is better than free access, and libre access under CC BY-NC or an equivalent license is better than libre access under more restrictive open licenses. We should achieve what we can when we can. We should not delay achieving free access in order to achieve libre access, and we should not stop at free access when we can achieve libre access.
Jurnal Informatika Universitas Pamulang is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
YOU ARE FREE TO:
- Share: copy and redistribute the material in any medium or format
- Adapt: remix, transform, and build upon the material (for non-commercial purposes only, as required by the NonCommercial term of CC BY-NC 4.0)
- The licensor cannot revoke these freedoms as long as you follow the license terms.