Dynamic Hand Gesture Recognition for Indian Sign Language using Integrated CNN-LSTM Architecture
Abstract
Human-centered computing is an emerging research field that aims to understand human behavior. Dynamic hand gesture recognition is one of the most recent, challenging, and appealing applications in this field. In this paper we propose a vision-based system to recognize dynamic hand gestures for Indian Sign Language (ISL). The system uses a unified architecture that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). To compensate for the lack of a large labeled hand gesture dataset, we created two CNNs by retraining the well-known image classification networks GoogLeNet and VGG16 using transfer learning. These CNNs transform the frames of gesture videos into feature vectors. Because each video is an ordered sequence of image frames, an LSTM model is connected to the fully connected layer of the CNN. We evaluated the system on three datasets of color videos with 11, 64, and 8 classes. In our experiments, the proposed CNN-LSTM architecture using GoogLeNet was fast and efficient, achieving recognition rates of 93.18%, 97.50%, and 96.65% on the three datasets, respectively.
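The data flow described in the abstract (per-frame feature extraction followed by a recurrent pass over the frame sequence) can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' implementation: `extract_features` is a hypothetical stand-in that average-pools pixels where the paper uses a retrained GoogLeNet or VGG16, the LSTM weights are random, biases are omitted, and all sizes (8x8 frames, 16 features, 6 hidden units) are assumptions chosen only to keep the example small. The 11 output classes match the first dataset mentioned above.

```python
import math
import random


def extract_features(frame, dim):
    """Hypothetical stand-in for the CNN: average-pool a 2-D frame into `dim` values."""
    flat = [pixel for row in frame for pixel in row]
    chunk = max(1, len(flat) // dim)
    return [sum(flat[i * chunk:(i + 1) * chunk]) / chunk for i in range(dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMClassifier:
    """Single-layer LSTM followed by a softmax layer (random, untrained weights)."""

    def __init__(self, feat_dim, hidden, n_classes, seed=0):
        rng = random.Random(seed)
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        # Keys: input (i), forget (f), output (o), candidate cell state (c).
        self.gates = {g: rand_matrix(hidden, feat_dim + hidden, rng) for g in "ifoc"}
        self.out = rand_matrix(n_classes, hidden, rng)
        self.hidden = hidden

    def forward(self, features):
        h = [0.0] * self.hidden  # hidden state
        c = [0.0] * self.hidden  # cell state
        for x in features:       # one step per video frame, in order
            z = x + h            # list concatenation: [x_t, h_{t-1}]
            i = [sigmoid(a) for a in matvec(self.gates["i"], z)]
            f = [sigmoid(a) for a in matvec(self.gates["f"], z)]
            o = [sigmoid(a) for a in matvec(self.gates["o"], z)]
            g = [math.tanh(a) for a in matvec(self.gates["c"], z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # Classify the whole gesture from the final hidden state.
        logits = matvec(self.out, h)
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Synthetic "video": 5 frames of 8x8 grayscale pixels in [0, 1).
rng = random.Random(1)
video = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(5)]

features = [extract_features(frame, dim=16) for frame in video]
model = LSTMClassifier(feat_dim=16, hidden=6, n_classes=11)
probs = model.forward(features)
```

Here `probs` holds one probability per gesture class. Because the weights are random, the values carry no meaning; the sketch only shows how an ordered sequence of per-frame feature vectors is reduced to a single class prediction. In practice the feature extractor and the LSTM would be trained, typically with a deep learning framework.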
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.