Performance Evaluation of Different ASR Classifiers on Mobile Device


Gulbakshee J. Dharmale
Dipti D. Patil

Abstract

Automatic speech recognition (ASR) offers an alternative to typing on mobile phones and has recently become a common and increasingly popular mode of communication. After the speech signal is segmented, a classifier assigns the resulting phonemes or words to classes. Several techniques are used for phoneme or word classification, including Neural Networks, Support Vector Machines, Hidden Markov Models, and Gaussian Mixture Models (GMM). This paper presents a detailed study and performance analysis of these classification techniques. The performance evaluation shows that GMM classifies signal data better than the alternatives and can be used effectively to improve the classification accuracy of the existing system. Our results show that the accuracy of GMM is more than 20% higher than that of the other three classifiers. The ASR classifiers are evaluated on Android phones, using high-quality recording equipment, on normal Hindi-language conversations of the kind used in day-to-day human-to-machine communication.
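The GMM-based classification scheme the abstract refers to can be sketched as follows: one Gaussian Mixture Model is fit per class (e.g., per phoneme), and a feature vector is assigned to the class whose model gives the highest log-likelihood. This is a minimal illustration using scikit-learn with synthetic 2-D features standing in for real acoustic features such as MFCCs; it is not the paper's implementation, and the class labels and data are hypothetical.

```python
# Minimal sketch of GMM-based classification: fit one GMM per class,
# then classify a feature vector by maximum log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic 2-D "features" for two hypothetical phoneme classes.
class_data = {
    "a": rng.normal(loc=0.0, scale=1.0, size=(200, 2)),
    "i": rng.normal(loc=5.0, scale=1.0, size=(200, 2)),
}

# Fit one mixture model per class on that class's training vectors.
models = {
    label: GaussianMixture(n_components=2, random_state=0).fit(X)
    for label, X in class_data.items()
}

def classify(x):
    """Return the class whose GMM assigns the highest log-likelihood to x."""
    scores = {label: gmm.score(x.reshape(1, -1)) for label, gmm in models.items()}
    return max(scores, key=scores.get)

print(classify(np.array([0.1, -0.2])))  # a vector near class "a"
print(classify(np.array([5.2, 4.8])))   # a vector near class "i"
```

In a real ASR front end, the feature vectors would be frame-level acoustic features (e.g., MFCCs) and each class would typically use more mixture components fit by EM.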


How to Cite
Gulbakshee J. Dharmale, & Dipti D. Patil. (2021). Performance Evaluation of Different ASR Classifiers on Mobile Device. International Journal of Next-Generation Computing, 12(2), 124–133. https://doi.org/10.47164/ijngc.v12i2.204
