Web-Based Sundanese Speech Translation System with Visual Augmentation Using a Convolutional Neural Network
Keywords:
Sundanese speech recognition, audio-visual speech recognition, Whisper fine-tuning, convolutional neural network, low-resource language, attention mechanism, lip-reading, web-based translation system, MediaPipe
Abstract
This research develops a web-based Sundanese speech translation system with visual augmentation through a Convolutional Neural Network (CNN). The primary challenge is the insufficient accuracy of audio-only Automatic Speech Recognition (ASR) for low-resource languages under noisy conditions. The proposed solution integrates a fine-tuned Whisper Medium model for transcription, CNN-based lip-reading, and attention-weighted audio-visual fusion. Training used the OpenSLR36 Sundanese corpus, with a subset of approximately 35,000 samples drawn from the 175,324 available instances because of memory constraints. Fine-tuning ran on RunPod with an NVIDIA RTX 4090 GPU (24 GB VRAM) for 5,000 iterations (approximately 11 hours). The fine-tuned model achieves a Word Error Rate (WER) of 2.45% at the best checkpoint (iteration 3,500), an improvement of 7.37 percentage points over the baseline of 9.82% at iteration 500. This performance approaches the state-of-the-art result of Raharjo & Zahra (2025), who report 2.03% WER using Whisper Small. The visual module comprises a three-layer CNN that produces 512-dimensional features, with MediaPipe used for facial detection. Black-box testing validates functional compliance, and a responsive interface supports cross-device compatibility. This work contributes to Sundanese language preservation through accessible translation with competitive accuracy.
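For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance between the reference and the recognized transcript (substitutions, deletions, and insertions) divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in this study; the function name `word_error_rate` and the placeholder sentences are assumptions made for illustration, and text normalization before scoring is omitted.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), assuming whitespace tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not corpus data: one substitution and one deletion
# against a four-word reference.
example_wer = word_error_rate("w1 w2 w3 w4", "w1 wx w3")

# The improvement reported in the abstract is a difference of two WER
# values, expressed in percentage points.
improvement_pp = round(9.82 - 2.45, 2)
```

Under this definition the placeholder example yields two errors over four reference words, and the difference between the two reported WER values reproduces the 7.37 percentage points stated above.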
References
Aini, N., Asri, L., Adam, R. I., & Dermawan, B. A., "Speech recognition untuk klasifikasi pengucapan nama hewan dalam bahasa Sunda menggunakan metode Long Short-Term Memory," JATI (Jurnal Mahasiswa Teknik Informatika), vol. 7, 2023. https://doi.org/10.36040/jati.v7i2.6744
Aini, Y. K., Santoso, T. B., & Dutono, D. T., "Pemodelan CNN untuk deteksi emosi berbasis speech bahasa Indonesia," Jurnal Komputer Terapan, vol. 7, pp. 143–152, 2021.
Arya, K., Wirya Kesuma, B., Anggara Wijaya, Y., & Putra, J. E., "Implementasi Next.js, TypeScript, dan Tailwind CSS untuk pengembangan aplikasi frontend sistem inventory perusahaan APAR (Studi kasus: CV Indoka Surya Jaya)," JIKOM: Jurnal Informatika dan Komputer, vol. 14, pp. 95–108, 2024.
Friadi, J., Yani, D. P., Zaid, M., & Sikumbang, A., "Perancangan pemodelan Unified Modeling Language sistem antrian online kunjungan pasien rawat jalan pada puskesmas," Jurnal Ilmu Siber dan Teknologi Digital, vol. 1, pp. 125–133, 2023.
Gunawan, R., & Rahmatulloh, A., "JSON Web Token (JWT) untuk authentication pada interoperabilitas arsitektur berbasis RESTful web service," Jurnal Edukasi dan Penelitian Informatika (JEPIN), vol. 5, pp. 74–80, 2019.
Iqbal, M., & Andharsaputri, R. L., "Implementasi UML untuk perancangan sistem informasi pengadaan barang pada RSUD Kota Bogor," Jurnal Teknik Informatika (JEKIN), vol. 4, 2024. https://doi.org/10.58794/jekin.v4i2.727
Ivanko, D., Ryumin, D., & Karpov, A., "A review of recent advances on deep learning methods for audio-visual speech recognition," Mathematics, vol. 11, 2023. https://doi.org/10.3390/math11122665
Jaelani, A. J., Hikmat, A., & Safi'i, I., "Preservation of the Sundanese Wewengkon Kuningan language through Android-based educational games," Journal of Ecohumanism, vol. 3, 2025. https://doi.org/10.62754/joe.v3i8.5692
Mega Santoni, M., Chamidah, N., Prasvita, D. S., Irmanda, H. N., & Prayoga, R. A., "Penerapan convolutional neural networks untuk mesin penerjemah bahasa daerah Minangkabau berbasis gambar," Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), vol. 5, pp. 1153–1160, 2021.
Novitasari, S., Tjandra, A., Sakti, S., & Nakamura, S., "Cross-lingual machine speech chain for Javanese, Sundanese, Balinese, and Bataks speech recognition and synthesis," Proceedings of the European Language Resources Association, 2020.
Nurwicaksono, M. A., Lisa, I. N., Tiara, A. R., & Sidik, R., "Optimasi sistem informasi konsultasi hukum melalui pendekatan pengujian kombinasi white-box dan black-box," Jurnal Manajemen Informatika (JAMIKA), vol. 14, pp. 1–15, 2023.
Pawar, D. R., & Yannawar, P., "Recent advances in audio-visual speech recognition: Deep learning perspective," Proceedings of ACVAIT 2022, pp. 409–421, 2024.
Pratiwi, Y., & Widianti, L. W., "Implementasi white-box testing dengan teknik basis path pada pengujian halaman pencarian program promo," Jurnal Kecerdasan Buatan dan Teknologi Informasi, vol. 4, pp. 173–180, 2025.
Pulungan, S. M., Febrianti, R., Lestari, T., Gurning, N., & Fitriana, N., "Analisis teknik entity relationship diagram dalam perancangan basis data," Jurnal Ekonomi Manajemen dan Bisnis, vol. 1, pp. 143–147, 2022.






