Analysis of Smoking Detection Image Classification with an Enhanced CNN Method Using Fine-Tuned MobileNetV3L, EfficientNetV2M, and Vision Transformer Architectures
Keywords: detection, smoking, MobileNetV3L, EfficientNetV2M, Vision Transformer

Abstract
Smoking detection with deep learning is challenging because smoking is a small-scale yet common event, and existing models have delivered unsatisfactory privacy and accuracy. For the EfficientNetV2M model, the authors first apply data augmentation, increasing the amount and diversity of the training data by transforming existing samples. They then fine-tune the EfficientNetV2M layers with a lower learning rate, which allows smoother parameter updates and can improve the model's final performance, and search for the ideal learning rate with the LearningRateScheduler callback. The improved accuracy and robustness, reaching up to 97%, show that this method can be applied in related fields and represents significant progress in smoking detection. The MobileNetV3L model uses fewer resources but achieves a lower accuracy of 87%. For the Vision Transformer, the authors use a custom ViT model for feature extraction, apply PCA to reduce dimensionality, and classify with an XGBoost model, obtaining a very satisfying accuracy of 96%. Future efforts will focus on improving this technology and finding ways to use it in broader contexts.
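The augmentation step described above grows the training set by transforming existing images. A minimal NumPy sketch of one such transformation, horizontal flipping, is shown below; the batch layout `(n, height, width, channels)` is an assumption, and the paper likely combines several transformations rather than this one alone.

```python
import numpy as np

def augment_flip(images):
    """Double a batch of images by appending horizontal flips.

    `images` is an (n, height, width, channels) array. Flipping is one
    of the label-preserving transformations used to increase the amount
    and diversity of training data.
    """
    flipped = images[:, :, ::-1, :]  # reverse the width axis
    return np.concatenate([images, flipped], axis=0)
```

In a full pipeline this would typically be done on the fly by the data loader (e.g. a Keras `ImageDataGenerator` or `tf.data` map) rather than by materializing the doubled array.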
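The fine-tuning step relies on a lower learning rate controlled through Keras's LearningRateScheduler callback. The paper does not report its exact schedule, so the step-decay function below is only an illustrative sketch; the initial rate, decay factor, and drop interval are assumptions.

```python
def step_decay(epoch, initial_lr=1e-4, drop=0.5, epochs_per_drop=10):
    """Halve the learning rate every `epochs_per_drop` epochs.

    A small initial rate with gradual decay gives the smoother parameter
    updates described for fine-tuning the EfficientNetV2M layers.
    """
    return initial_lr * (drop ** (epoch // epochs_per_drop))

# With Keras, the schedule would be attached to training roughly as:
#   callback = tf.keras.callbacks.LearningRateScheduler(step_decay)
#   model.fit(train_ds, epochs=30, callbacks=[callback])
```

Trying several such schedules and comparing validation curves is one common way to "find the ideal learning rate" with this callback.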
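The Vision Transformer pipeline extracts high-dimensional ViT features, reduces them with PCA, and classifies with XGBoost. A minimal NumPy sketch of the PCA projection step is given below; the feature array shape and component count are assumptions, and a real pipeline would use a fitted PCA (e.g. scikit-learn's) so the same projection can be reused at inference time.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project feature vectors onto their top principal components.

    `features` is an (n_samples, n_features) array, e.g. ViT embeddings.
    Centering followed by SVD yields the principal directions in `vt`,
    ordered by decreasing explained variance.
    """
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# The reduced features would then feed the classifier, e.g.
#   xgboost.XGBClassifier().fit(pca_reduce(train_feats, 64), train_labels)
```

Reducing dimensionality before XGBoost both speeds up tree construction and mitigates the curse of dimensionality in the raw transformer embeddings.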
License
Copyright (c) 2024 Nuriyadin
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.