Visual Analysis and Characteristics of English League Football Clubs Based on Playing Patterns Using K-Means Clustering

Authors

  • Rachmat Bintang Yudhianto, IPB University
  • Fajar Athallah Yusuf, IPB University
  • Anwar Fitrianto, IPB University
  • L.M. Risman Dwi Jumansyah, IPB University

DOI:

https://doi.org/10.32493/informatika.v9i3.44640

Keywords:

English Premier League, K-Means, Playing Characteristics, Football Analytics, Cluster Evaluation, Feature Engineering

Abstract

This research analyzed and clustered football teams in the English Premier League (EPL) for the 2023/2024 season by their playing characteristics using K-Means clustering. Understanding playing styles is essential for optimizing strategies and enhancing team performance. Preprocessing included data cleaning, feature engineering, and visualization of key features such as goals, shots, and attacking attempts. The Elbow method indicated four clusters, representing teams with varying levels of attacking and defensive capability. The clustering was evaluated with the Davies-Bouldin index (0.47), the Calinski-Harabasz index (275.89), and the Silhouette coefficient (0.53), which together indicate moderate clustering quality. The findings suggest that EPL teams tend to be attack-oriented, while defensive strength varies across clusters. The limited number of observations and features in the dataset constrained the analysis; future studies may benefit from additional features and advanced dimensionality reduction techniques.
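
The workflow summarized in the abstract (run K-Means for several values of k, inspect the inertia curve for an elbow, then score the chosen partition) can be illustrated with a minimal sketch. This is not the authors' code or data: the 20 two-feature points below are invented for illustration (a hypothetical "goals scored per match" and "goals conceded per match"), the farthest-first seeding is an assumption chosen to keep the run deterministic, and only inertia and the Silhouette coefficient are computed. The Davies-Bouldin and Calinski-Harabasz indices are omitted for brevity.

```python
import math

def farthest_first_init(points, k):
    # Deterministic seeding (an assumption of this sketch): start at points[0],
    # then repeatedly add the point farthest from its nearest chosen centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, inertia)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Inertia: within-cluster sum of squared distances, used for the elbow curve.
    inertia = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
    return labels, inertia

def silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Invented data: 20 hypothetical teams in four tight groups (NOT the paper's dataset).
centers = [(2.4, 0.8), (1.7, 1.3), (1.2, 1.8), (0.8, 2.4)]
offsets = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

# Elbow method: inertia for k = 1..6; look for where the decrease flattens.
inertias = {k: kmeans(points, k)[1] for k in range(1, 7)}
for k, value in inertias.items():
    print(f"k={k}: inertia={value:.3f}")

labels4, _ = kmeans(points, 4)
sil4 = silhouette(points, labels4)
print(f"silhouette at k=4: {sil4:.3f}")
```

On real match statistics the features would normally be standardized first, and several random initializations would be compared rather than a single deterministic seeding. Libraries such as scikit-learn provide `silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score` for the three metrics named in the abstract; whether the authors used that library is not stated here.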

Published

2024-09-30