Mapping The Landscape of Speech Processing Research: Trends, Insights, and Emerging Directions

Authors

  • Ardi Mardiana, Universitas Majalengka
  • Ade Bastian, Universitas Majalengka
  • Muhamamad Rifki, Universitas Majalengka
  • Eka Tresna Irawan, Mayar International Pte

Keywords:

Automatic Speech Recognition; Bibliometric Analysis; Research Trend; Speech Processing; Speech Synthesis

Abstract

Speech processing has become a major research domain at the intersection of signal processing, artificial intelligence, and human-computer interaction. This study conducts a bibliometric analysis to identify research trends, key challenges, and prospective directions in speech processing. Using data from established scientific databases such as Scopus and Web of Science, we assess research output over the last two decades, including publication growth, influential authors, leading journals, and collaboration networks. The results highlight notable progress in automatic speech recognition, speaker identification, and speech synthesis, alongside persistent challenges in multilingual datasets, noise robustness, and resource efficiency. Moreover, emerging technologies such as deep learning and neural architecture search are identified as drivers of future developments. This study aims to give scholars and practitioners a thorough overview of the current landscape and strategic insights for advancing the field of speech processing.

Published

2025-03-31