Mapping The Landscape of Speech Processing Research:
Trends, Insights, and Emerging Directions
Keywords: Automatic Speech Recognition; Bibliometric Analysis; Research Trend; Speech Processing; Speech Synthesis
Abstract
Speech processing has become a significant research domain spanning signal processing, artificial intelligence, and human-computer interaction. This study conducts a bibliometric analysis to identify research trends, key open problems, and prospective directions in speech processing. Using data drawn from established scientific databases such as Scopus and Web of Science, we assess the major research outputs of the last two decades, including publication growth, influential authors, leading journals, and collaboration networks. The results highlight notable progress in automatic speech recognition, speaker identification, and speech synthesis, alongside persistent challenges related to multilingual datasets, noise robustness, and resource efficiency. Emerging technologies such as deep learning and neural architecture search are identified as drivers of future developments. This bibliometric study aims to give scholars and practitioners a thorough overview of the current landscape, together with strategic insights for advancing the speech processing field.
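The indicators named in the abstract (publication growth, influential authors, leading journals, and collaboration networks) largely reduce to counting over bibliographic records. The following is a minimal Python sketch, assuming each record exported from Scopus or Web of Science has already been parsed into a dict with hypothetical `year`, `authors`, and `source` fields; real exports use their own field names, and the study's actual tooling is not stated here. The records below are toy placeholders, not data from the study.

```python
from collections import Counter
from itertools import combinations


def publication_growth(records):
    """Count publications per year, returned in chronological order."""
    counts = Counter(r["year"] for r in records)
    return dict(sorted(counts.items()))


def top_authors(records, n=3):
    """Rank authors by publication count (full counting: each author gets 1 per paper)."""
    counts = Counter(a for r in records for a in set(r["authors"]))
    return counts.most_common(n)


def top_sources(records, n=3):
    """Rank journals/venues by the number of records they published."""
    return Counter(r["source"] for r in records).most_common(n)


def coauthorship_edges(records):
    """Weighted undirected co-authorship edges: (author_a, author_b) -> shared papers."""
    edges = Counter()
    for r in records:
        # Sort so each pair has one canonical orientation, e.g. ("A", "B") not ("B", "A").
        for pair in combinations(sorted(set(r["authors"])), 2):
            edges[pair] += 1
    return dict(edges)


# Hypothetical toy records standing in for a parsed database export.
records = [
    {"year": 2021, "authors": ["A", "B"], "source": "Journal X"},
    {"year": 2021, "authors": ["A", "C"], "source": "Journal Y"},
    {"year": 2022, "authors": ["A", "B", "C"], "source": "Journal X"},
]

print(publication_growth(records))   # {2021: 2, 2022: 1}
print(top_authors(records, n=1))     # [('A', 3)]
print(top_sources(records, n=1))     # [('Journal X', 2)]
print(coauthorship_edges(records))   # {('A', 'B'): 2, ('A', 'C'): 2, ('B', 'C'): 1}
```

The sketch uses full counting; bibliometric studies sometimes use fractional counting instead, splitting each paper's credit among its authors. The weighted edge dictionary can be passed to a network analysis or visualization tool to map collaboration clusters.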
License
Copyright (c) 2025 Ardi Mardiana, Ade Bastian, Muhamamad Rifki, Eka Tresna Irawan

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as doing so can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).
Jurnal Informatika Universitas Pamulang has adopted CC BY-NC, or an equivalent license, as the optimal license for the publication, distribution, use, and reuse of scholarly work.
In developing strategy and setting priorities, Jurnal Informatika Universitas Pamulang recognizes that free access is better than priced access, libre access is better than free access, and libre under CC BY-NC or the equivalent is better than libre under more restrictive open licenses. We should achieve what we can when we can. We should not delay achieving free in order to achieve libre, and we should not stop with free when we can achieve libre.
Jurnal Informatika Universitas Pamulang is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
YOU ARE FREE TO:
- Share: copy and redistribute the material in any medium or format.
- Adapt: remix, transform, and build upon the material.
- The licensor cannot revoke these freedoms as long as you follow the license terms.
