Object Detection using Speech Recognition


Chetana Thaokar
Gayatri Ladsawangikar
Tanaya Wadibhasme
Sandeep Sureka


A wide range of practical applications, including autonomous navigation, visual systems, and face recognition, rely on object detection. In this paper, object detection and speech recognition are combined to help visually impaired people locate a specific object by using voice commands. People who are blind or visually impaired can move more independently if they are aware of their surroundings. A model has been implemented using the OpenCV libraries, and good results have been obtained. The paper also presents a thorough review of object detection using region-based convolutional neural network (R-CNN) learning systems for practical applications. It examines the object identification process using the YOLOv4 object detection technique and describes the detection pipeline, including a speech recognition system that transcribes spoken language into text.
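The abstract describes a pipeline in which a spoken command is transcribed to text and YOLOv4, run through OpenCV, detects objects in the scene. The sketch below illustrates only the glue between those two stages and is not the authors' implementation. It assumes the transcription is already available as a string and that the detector output has already been decoded into labelled boxes (with OpenCV, YOLOv4 Darknet config and weights can be loaded via `cv2.dnn.readNetFromDarknet`). The `Detection` type, the function names, the 0.5 confidence threshold, and the split of the frame into left/ahead/right thirds are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    """One detector output (hypothetical format, e.g. after YOLOv4 post-processing)."""
    label: str         # class name, assumed to be lower-case letters and spaces
    confidence: float  # detection score in [0, 1]
    x: int             # left edge of the bounding box, in pixels
    y: int             # top edge of the bounding box, in pixels
    w: int             # box width, in pixels
    h: int             # box height, in pixels


def extract_target(command: str, class_names: List[str]) -> Optional[str]:
    """Return the class name mentioned earliest in a transcribed voice command."""
    # Keep only letters, collapse whitespace, and pad with spaces so that
    # " bottle " matches whole words only (not e.g. "bottleneck").
    words = re.sub(r"[^a-z]+", " ", command.lower()).split()
    text = " " + " ".join(words) + " "
    best_label, best_pos = None, len(text)
    for label in class_names:
        # Accept the singular label and a naive "+s" plural ("chairs").
        for form in (label, label + "s"):
            pos = text.find(" " + form + " ")
            if 0 <= pos < best_pos:
                best_label, best_pos = label, pos
    return best_label


def describe_location(det: Detection, frame_width: int) -> str:
    """Map the box centre to a coarse direction (illustrative three-way split)."""
    center_x = det.x + det.w / 2
    if center_x < frame_width / 3:
        return "to the left"
    if center_x > 2 * frame_width / 3:
        return "to the right"
    return "straight ahead"


def find_requested_object(command: str, detections: List[Detection],
                          class_names: List[str], frame_width: int,
                          min_confidence: float = 0.5) -> str:
    """Build a short reply (text to be spoken back) for a voice query."""
    target = extract_target(command, class_names)
    if target is None:
        return "Sorry, no known object name was heard."
    # Assumed threshold: ignore low-confidence detections of the target class.
    matches = [d for d in detections
               if d.label == target and d.confidence >= min_confidence]
    if not matches:
        return f"No {target} was detected."
    best = max(matches, key=lambda d: d.confidence)
    return f"The {target} is {describe_location(best, frame_width)}."


if __name__ == "__main__":
    classes = ["person", "bottle", "chair", "cell phone"]  # hypothetical subset
    frame = [Detection("bottle", 0.91, 40, 120, 60, 150),
             Detection("chair", 0.40, 500, 200, 120, 200)]
    print(find_requested_object("Where is my bottle?", frame, classes, 640))
    print(find_requested_object("Find the chair", frame, classes, 640))
```

Running it prints `The bottle is to the left.` and then `No chair was detected.`, because the chair's score falls below the assumed 0.5 threshold. In a complete system the returned string would be passed to a text-to-speech engine so the user hears the answer.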


How to Cite
Thaokar, C., Ladsawangikar, G., Wadibhasme, T., & Sureka, S. (2022). Object Detection using Speech Recognition. International Journal of Next-Generation Computing, 13(5). https://doi.org/10.47164/ijngc.v13i5.974

