Application of Resampling and AdaBoost to Handle the Class Imbalance Problem in Naïve Bayes-Based Customer Churn Prediction
DOI:
https://doi.org/10.32493/informatika.v2i4.1440
Keywords:
AdaBoost, Customer Churn, Naïve Bayes, Prediction, Resampling
Abstract
The large number of mobile operators drives very intense competition. The ease with which customers can switch to a competitor is a primary concern for the CRM (Customer Relationship Management) division, because acquiring a new customer costs far more than retaining an existing one. To take the right retention measures, an operator needs to know whether a customer is likely to churn. This tendency is predicted using a data mining model. This study applies resampling techniques and the AdaBoost ensemble technique to improve classifier performance, and uses the RapidMiner software to measure model performance. The results show that the model integrating random oversampling, AdaBoost, and Naïve Bayes performs better, as it achieves a higher AUC (area under the receiver operating characteristic (ROC) curve).
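As a rough illustration of two ideas named in the abstract, the sketch below shows random oversampling of the minority class and a rank-based AUC computation in plain Python. It is not the authors' RapidMiner workflow: the function names `random_oversample` and `auc`, the toy data, and the scores are all hypothetical, and the AdaBoost and Naïve Bayes steps are deliberately left out.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def auc(scores, labels, positive=1):
    """AUC as the probability that a random positive example scores
    higher than a random negative one (ties count as half)."""
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: label 1 = churn (minority), 0 = no churn.
rows = [[5], [7], [1], [2], [3], [1]]
labels = [1, 1, 0, 0, 0, 0]

bal_rows, bal_labels = random_oversample(rows, labels, seed=42)
balanced_counts = Counter(bal_labels)

# Hypothetical classifier scores for the six original rows.
toy_auc = auc([0.9, 0.4, 0.5, 0.2, 0.1, 0.3], labels)
```

In practice the oversampling would be applied only to the training split, so that duplicated rows do not leak into the data used to compute AUC.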
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
Jurnal Informatika Universitas Pamulang treats CC-BY-NC or an equivalent license as the optimal license for the publication, distribution, use, and reuse of scholarly work.
In developing strategy and setting priorities, Jurnal Informatika Universitas Pamulang recognizes that free access is better than priced access, libre access is better than free access, and libre under CC-BY-NC or the equivalent is better than libre under more restrictive open licenses. We should achieve what we can when we can. We should not delay achieving free in order to achieve libre, and we should not stop with free when we can achieve libre.
Jurnal Informatika Universitas Pamulang is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
YOU ARE FREE TO:
- Share : copy and redistribute the material in any medium or format
- Adapt : remix, transform, and build upon the material for non-commercial purposes.
- The licensor cannot revoke these freedoms as long as you follow the license terms