Comparative Study on Regression Algorithms for Predicting Price of Online Course: Udemy Case Study




Comparative Study, Machine Learning, Price Prediction, Regression


Talent in information technology is in high demand, but formal study in the field is expensive. Online courses are a cost-effective alternative. Online course sites such as Udemy sell hundreds of thousands of courses taught by thousands of instructors. Because each instructor sets the price of their own course, prices vary widely and do not necessarily reflect course quality, so not every course is worth buying at its listed price. To address this problem, a system is needed that can predict course prices and so advise instructors when they set a selling price. To identify the most suitable algorithm for such a system, this study compares three algorithms: multiple linear regression, polynomial regression, and K-Nearest Neighbors regression. The study uses 1,200 data samples obtained by web scraping the Udemy site, with one test run for each algorithm. K-Nearest Neighbors regression achieved the best evaluation results, with a root mean squared error of 231659.49, a mean absolute percentage error of 0.43, and a coefficient of determination of 0.18.
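The three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy price values are hypothetical, and it assumes the mean absolute percentage error is reported as a fraction (so that 0.43 would correspond to 43%), which the abstract does not state explicitly.

```python
import math

# Illustrative definitions of the three metrics named in the abstract.
# Function names and sample values are hypothetical, not from the study.

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (assumption)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Toy prices, made up purely to exercise the functions.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 360.0]

print(rmse(actual, predicted))
print(mape(actual, predicted))
print(r_squared(actual, predicted))
```

Lower root mean squared error and mean absolute percentage error indicate a closer fit, while a coefficient of determination nearer to 1 indicates that more of the price variance is explained. On that reading, the reported value of 0.18 suggests that even the best of the three models explains only a small share of the variation in course prices.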

Author Biographies

Maximus Aurelius Wiranata, Universitas Ciputra Surabaya

Informatics, School of Information Technology

Theresia Ratih Dewi Saputri, Universitas Ciputra Surabaya

Informatics, School of Information Technology




