Improving Spoken Language Identification in Noisy Environment Based on Feature Reduction Using PCA

ElBedwehy, Mona Nagy; Mayyalou, Kholoud; Behery, G. M.; Elbarougy, Reda

doi:10.21608/jocc.2025.446638

Document Type: Original Article

Authors

¹ Department of Computer Science, Faculty of Computer and Artificial Intelligence, Damietta University, Egypt

² Department of Information Technology, Faculty of Computer and Artificial Intelligence, Damietta University, Egypt

Abstract

Automatic Spoken Language Identification (ASLID) is essential for effective multilingual communication, especially in real-world environments where noise and acoustic variability significantly degrade performance. This research introduces a robust ASLID framework that applies feature reduction via principal component analysis (PCA) followed by linear discriminant analysis (LDA) to improve classification performance in noisy environments. The system uses openSMILE to extract an extensive set of audio features that capture the diverse speech characteristics needed for accurate language discrimination. To address the high dimensionality and redundancy of this feature set, PCA reduces the feature space while preserving the directions of greatest variance, which also improves computational efficiency. LDA is then applied to maximize class separability, further refining the feature space for language classification. The approach is evaluated on the IIIT-H Indic speech dataset under various noise levels and test set proportions. Extensive experiments show that the proposed PCA-LDA approach outperforms conventional feature selection and classification techniques, achieving an accuracy of up to 99.92% in noisy conditions even with reduced feature dimensions. These findings indicate that the combined PCA-LDA strategy enhances the resilience of ASLID systems in challenging acoustic environments, making it a promising solution for practical multilingual speech processing applications.
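
The PCA-then-LDA stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, replaces the openSMILE features with synthetic vectors, and uses a hypothetical component count (50) and test proportion (0.2), none of which are stated in the abstract. Using LDA directly as the final classifier is also an assumption, since the abstract does not name the classifier applied after the LDA projection.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_features(n_per_class=200, n_classes=4, n_features=300, seed=0):
    """Stand-in for per-utterance acoustic feature vectors (NOT real openSMILE output)."""
    rng = np.random.default_rng(seed)
    # Each hypothetical language gets its own mean vector; samples scatter around it.
    means = rng.normal(0.0, 1.0, size=(n_classes, n_features))
    X = np.vstack(
        [m + rng.normal(0.0, 3.0, size=(n_per_class, n_features)) for m in means]
    )
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y


def build_pca_lda(n_components=50):
    """Standardize the features, reduce them with PCA, then classify with LDA."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),  # assumed value, not taken from the paper
        LinearDiscriminantAnalysis(),
    )


X, y = make_synthetic_features()
# Hold out 20% of the utterances; the paper varies this proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
model = build_pca_lda(n_components=50)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Placing PCA and LDA inside one pipeline means the scaler and both projections are fitted on the training split only, so the held-out accuracy is not inflated by information from the test utterances. The accuracy printed here reflects the synthetic data and says nothing about the results reported in the paper.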
