Modern Evaluation of Machine Learning Models on the SEERA Dataset for Software Effort Estimation
DOI:
https://doi.org/10.32493/jiup.v10i2.51687
Keywords:
Software Effort Estimation; Machine Learning; Random Forest; K-Fold; SEERA Dataset
Abstract
Estimating software development effort is crucial in project planning and management, especially in resource-constrained environments. This study evaluated four regression models: Random Forest, Support Vector Machine (SVM), Lasso Regression, and Ridge Regression, chosen because they represent different approaches: ensemble learning, margin-based learning, and L1 and L2 regularization, respectively. Experiments used the SEERA (Software Effort Estimation with Real Attributes) dataset, consisting of 99 entries, and a Python pipeline comprising preprocessing, feature selection, Z-score normalization, an 80:20 train-test split, and 5-fold cross-validation. Models were evaluated using MAE, RMSE, and R². Random Forest performed best under both the 80:20 split (R² = 0.740, MAE = 3981.53) and 5-fold cross-validation (R² = 0.715, MAE = 3152.03), while SVM performed worst, with a negative R². Lasso and Ridge were competitive only on the 80:20 split; their performance dropped substantially under 5-fold cross-validation, indicating lower stability. This research contributes an in-depth evaluation on a single dataset and demonstrates that a transparent Python pipeline based on K-Fold cross-validation can be replicated to improve estimation accuracy. Future research could apply advanced ensemble methods (e.g., XGBoost) and evaluate on larger datasets to generalize the results.
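The evaluation protocol described in the abstract (Z-score normalization, 5-fold cross-validation, and scoring with MAE, RMSE, and R²) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic stand-ins for SEERA, the one-feature least-squares line stands in for the four regressors, and every function name, seed, and constant here is hypothetical. In practice a library such as scikit-learn offers equivalent building blocks, but the abstract does not say which libraries were used.

```python
import math
import random
from statistics import mean, pstdev


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination; negative when the model is worse
    than always predicting the mean of y_true."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k=5, seed=42):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def fit_predict(x_train, y_train, x_test):
    """Z-score the feature using training statistics only, then fit a
    one-feature least-squares line (assumed stand-in for the real models)."""
    mu, sigma = mean(x_train), pstdev(x_train) or 1.0
    z_train = [(x - mu) / sigma for x in x_train]
    z_test = [(x - mu) / sigma for x in x_test]
    y_bar = mean(y_train)
    # z_train has mean 0 and population variance 1, so the least-squares
    # slope reduces to the mean of z * (y - y_bar).
    slope = mean(z * (y - y_bar) for z, y in zip(z_train, y_train))
    return [y_bar + slope * z for z in z_test]


def cross_validate(x, y, k=5, seed=42):
    """Return the fold-averaged (MAE, RMSE, R^2)."""
    scores = []
    for train, test in kfold_indices(len(x), k, seed):
        pred = fit_predict([x[i] for i in train],
                           [y[i] for i in train],
                           [x[i] for i in test])
        y_test = [y[i] for i in test]
        scores.append((mae(y_test, pred), rmse(y_test, pred), r2(y_test, pred)))
    return [mean(col) for col in zip(*scores)]


# Synthetic stand-in data: 99 rows to mirror the dataset size in the abstract.
rng = random.Random(0)
size = [rng.uniform(1, 100) for _ in range(99)]
effort = [120.0 * s + rng.gauss(0, 500) for s in size]

cv_mae, cv_rmse, cv_r2 = cross_validate(size, effort)
print(f"5-fold MAE={cv_mae:.2f} RMSE={cv_rmse:.2f} R2={cv_r2:.3f}")
```

Fitting the normalization statistics inside each training fold keeps test rows from leaking into preprocessing. The `r2` definition also shows how a negative score, as reported for SVM, can arise: the residual sum of squares exceeds the variation around the mean.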
License
Copyright (c) 2025 Fina Sifaul Nufus, Agus Subekti

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
