Image Captioning Generator Text-to-Speech
Abstract
A system is proposed that can guide and support blind people while they travel on the road, using only a smartphone application. It works by first converting the scene in front of the user into text and then converting that text into voice output. Captions are generated by a deep-neural-network-based method: given an image as input, it produces an English sentence describing the image's contents. The user first issues a voice command, after which the camera or webcam captures a snapshot. This image is fed to the image-caption-generator model, which produces a caption for it. The caption text is then converted to speech, yielding a spoken description of the image.
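The pipeline described above (voice trigger, webcam snapshot, caption generation, text-to-speech) can be sketched as follows. The library choices here are illustrative assumptions, not the authors' actual stack: OpenCV stands in for the camera capture, a pretrained BLIP model from Hugging Face stands in for the caption generator, and pyttsx3 stands in for the text-to-speech stage.

```python
def capture_frame():
    # Take one webcam snapshot (assumes a camera at device index 0).
    import cv2
    cam = cv2.VideoCapture(0)
    ok, frame = cam.read()
    cam.release()
    if not ok:
        raise RuntimeError("camera capture failed")
    return frame


def caption_frame(frame):
    # Caption the frame with a pretrained BLIP model -- an illustrative
    # stand-in for the caption generator described in the abstract.
    import cv2
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(
        "Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained(
        "Salesforce/blip-image-captioning-base")
    # OpenCV frames are BGR; convert to the RGB order PIL expects.
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    inputs = processor(images=image, return_tensors="pt")
    ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(ids[0], skip_special_tokens=True)


def speak(text):
    # Read the caption aloud with offline text-to-speech.
    import pyttsx3
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


def describe_scene(capture_fn=capture_frame,
                   caption_fn=caption_frame,
                   speak_fn=speak):
    # One cycle of the pipeline: capture -> caption -> speak.
    # The stages are injectable so each can be swapped or tested in isolation.
    frame = capture_fn()
    caption = caption_fn(frame)
    speak_fn(caption)
    return caption
```

Because the three stages are passed in as functions, the same `describe_scene` wrapper works with a smartphone camera backend or a different captioning model without changing the control flow.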
This work is licensed under a Creative Commons Attribution 4.0 International License.