Voice-Based Emotion Classification Using the Convolutional Neural Network Method

Authors

  • Muhammad Elio Phillo Rismanto, Universitas Teknologi Yogyakarta
  • Irma Handayani, Universitas Teknologi Yogyakarta

DOI:

https://doi.org/10.32493/informatika.v9i4.45236

Keywords:

CNN, SER, RAVDESS

Abstract

Voice-based emotion detection, or speech emotion recognition (SER), is the study of machines' ability to recognize patterns in voice data using a range of methods and features. However, its use remains limited because machines still struggle to discern emotions accurately. This research used a frequently applied method, the convolutional neural network (CNN), and aimed to produce a high-accuracy model using spectrogram features, chosen for their capacity to capture the frequency content of the recordings in RAVDESS. The data set comprised 2068 voice samples classified into five emotion classes: angry, fear, happy, sad, and neutral. All data were augmented with noise, pitch changes, shifting, stretching, and high and low speed to replicate real-world conditions. The model was trained while tuning several parameters: learning rate, dropout rate, kernel initializer, weight decay, optimizer, epochs, and batch size. The best parameter values found were {'weight_decay': 1e-07, 'optimizer': 'adamw', 'learning_rate': 0.001, 'kernel_initializer': 'he_normal', 'dropout_rate': 0.5, 'epochs': 100, 'batch_size': 48}, with a score of 0.7448840381991815. The model reached an overall accuracy of 75.85% on the training data and 51.64% on the test data, indicating that it recognizes existing patterns but has difficulty generalizing to new data. However, the ROC curve values indicate that the model is capable of separating voice data into the respective classes, with values of 0.84 for angry, 0.79 for fear, 0.83 for happy, 0.80 for sad, and 0.9 for neutral.
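The augmentation step named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the function names, the default noise level, and the synthetic test tone are assumptions made for illustration, and only noise, shifting, and speed change are covered. Note that the simple resampling used here for speed change also shifts pitch, whereas a dedicated audio library would normally handle pitch and time stretching separately.

```python
import numpy as np


def add_noise(signal, noise_level=0.005, rng=None):
    """Add Gaussian noise scaled by the signal's peak amplitude (assumed scheme)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(signal.shape)
    return signal + noise_level * np.max(np.abs(signal)) * noise


def time_shift(signal, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(signal, shift)


def change_speed(signal, rate):
    """Resample by linear interpolation: rate > 1 shortens, rate < 1 lengthens.

    This changes duration and pitch together.
    """
    n_out = int(round(len(signal) / rate))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)


# Hypothetical input: one second of a 440 Hz tone at 16 kHz stands in for a voice clip.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

augmented = {
    "noise": add_noise(tone, rng=np.random.default_rng(0)),
    "shift": time_shift(tone, 1600),
    "fast": change_speed(tone, 2.0),
    "slow": change_speed(tone, 0.5),
}

for name, clip in augmented.items():
    print(name, len(clip))
```

Each augmented clip would then be converted to a spectrogram before being passed to the network, so one recording yields several training examples.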

Published

2025-01-22