A Modern Evaluation of Machine Learning Models on the SEERA Dataset for Software Effort Estimation

Authors

  • Fina Sifaul Nufus, Universitas Nusa Mandiri
  • Agus Subekti, Universitas Nusa Mandiri

DOI:

https://doi.org/10.32493/jiup.v10i2.51687

Keywords:

Software Effort Estimation; Machine Learning; Random Forest; K-Fold; SEERA Dataset

Abstract

Estimating software development effort is crucial to project planning and management, especially in resource-constrained environments. This study evaluated four regression models: Random Forest, Support Vector Machine (SVM), Lasso Regression, and Ridge Regression, chosen because they represent different approaches (ensemble, margin-based, L1-regularized, and L2-regularized, respectively). Experiments used the SEERA (Software Effort Estimation with Real Attributes) dataset of 99 entries and a Python pipeline comprising preprocessing, feature selection, Z-score normalization, an 80:20 train-test split, and 5-fold cross-validation. Models were evaluated using MAE, RMSE, and R². Random Forest performed best under both the 80:20 split (R² = 0.740, MAE = 3981.53) and 5-fold cross-validation (R² = 0.715, MAE = 3152.03), while SVM performed worst, with a negative R². Lasso and Ridge were competitive only on the 80:20 split and degraded substantially under 5-fold cross-validation, indicating lower stability. This research contributes an in-depth evaluation on a single dataset and demonstrates that a transparent, K-Fold-based Python pipeline can be replicated to improve estimation accuracy. Future research could apply advanced ensemble methods (e.g., XGBoost) and evaluate on larger datasets to test how well the results generalize.
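The evaluation protocol described in the abstract (Z-score normalization, an 80:20 split, 5-fold cross-validation, and MAE, RMSE, and R² for four regressors) can be sketched roughly as follows with scikit-learn. This is a minimal illustration, not the authors' pipeline: the data are synthetic placeholders generated with `make_regression` rather than SEERA, the hyperparameters and random seeds are assumptions, feature selection is omitted, and the helper names `evaluate_holdout` and `evaluate_kfold` are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in with 99 rows (NOT the SEERA data; features are placeholders).
X, y = make_regression(n_samples=99, n_features=10, noise=25.0, random_state=0)

# Hyperparameters below are illustrative assumptions, not the paper's settings.
MODELS = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVM": SVR(kernel="rbf"),
    "Lasso": Lasso(alpha=1.0, max_iter=10000),
    "Ridge": Ridge(alpha=1.0),
}


def evaluate_holdout(model, X, y, seed=0):
    """Fit on 80% of the rows and score on the remaining 20%."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # The scaler is fitted on the training rows only, then applied to the test rows.
    pipe = make_pipeline(StandardScaler(), clone(model))
    pipe.fit(X_tr, y_tr)
    pred = pipe.predict(X_te)
    return {
        "MAE": float(mean_absolute_error(y_te, pred)),
        "RMSE": float(np.sqrt(mean_squared_error(y_te, pred))),
        "R2": float(r2_score(y_te, pred)),
    }


def evaluate_kfold(model, X, y, seed=0):
    """Average MAE, RMSE, and R2 over 5 shuffled folds."""
    pipe = make_pipeline(StandardScaler(), clone(model))
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_validate(
        pipe, X, y, cv=cv,
        scoring=("neg_mean_absolute_error", "neg_mean_squared_error", "r2"),
    )
    return {
        "MAE": float(-scores["test_neg_mean_absolute_error"].mean()),
        # RMSE is computed per fold, then averaged across folds.
        "RMSE": float(np.sqrt(-scores["test_neg_mean_squared_error"]).mean()),
        "R2": float(scores["test_r2"].mean()),
    }


results = {
    name: {
        "holdout": evaluate_holdout(model, X, y),
        "kfold": evaluate_kfold(model, X, y),
    }
    for name, model in MODELS.items()
}

for name, res in results.items():
    for protocol, m in res.items():
        print(f"{name:14s} {protocol:8s} "
              f"MAE={m['MAE']:.2f} RMSE={m['RMSE']:.2f} R2={m['R2']:.3f}")
```

Placing the scaler inside the pipeline means the normalization statistics are recomputed from the training portion of each split, which avoids leaking test-fold information into the model. Whether the study applied normalization this way is not stated in the abstract, so this is a design assumption. The numbers printed by this sketch come from synthetic data and are not comparable to the reported results.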

Published

2025-06-30