Applying Resampling and AdaBoost to Handle the Class Imbalance Problem in Naïve Bayes-Based Customer Churn Prediction

Authors

  • Sri Mulyati Universitas Pamulang
  • Yulianti Yulianti Universitas Pamulang
  • Aries Saifudin Universitas Pamulang

DOI:

https://doi.org/10.32493/informatika.v2i4.1440

Keywords:

AdaBoost, Customer Churn, Naïve Bayes, Prediction, Resampling

Abstract

The large number of mobile operators drives intense business competition. The ease with which customers can switch to a competitor is a primary concern for the CRM (Customer Relationship Management) department, because acquiring a new customer costs far more than retaining an existing one. To take the right retention action, an operator must know whether a customer is likely to churn. This tendency is predicted with a data mining model. This study applies resampling techniques and the AdaBoost ensemble technique to improve classifier performance, and uses the RapidMiner software to measure model performance. The results show that the model integrating random oversampling, AdaBoost, and Naïve Bayes performs better, as indicated by its higher AUC (Area Under the ROC (Receiver Operating Characteristic) Curve).
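The two ideas the abstract names without defining, random oversampling and AUC, can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the study used RapidMiner, and the function names, toy data, and scores below are hypothetical. The AdaBoost and Naïve Bayes steps are deliberately left out.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def auc(scores, labels, positive=1):
    """AUC as the fraction of (positive, negative) pairs in which
    the positive example gets the higher score; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = churn (minority class), 0 = no churn.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [9.0]]
y = [0, 0, 0, 0, 0, 1]

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)  # both classes now have 5 rows

# Hypothetical classifier scores for four customers.
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])  # 3 of 4 pairs ranked correctly
```

In practice, oversampling is usually applied only to the training portion of the data, so that duplicated rows do not leak into the set used to compute AUC.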

Published

2017-12-25