Dynamic Hand Gesture Recognition for Indian Sign Language using Integrated CNN-LSTM Architecture


Pradip Patel
Narendra Patel


Human-Centered Computing is an emerging research field that aims to understand human behavior. Dynamic hand gesture recognition is one of the most recent, challenging, and appealing applications in this field. In this paper, we propose a vision-based system to recognize dynamic hand gestures for Indian Sign Language (ISL). The system is built on a unified architecture that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). To address the shortage of large labeled hand gesture datasets, we created two CNNs by retraining the well-known image classification networks GoogLeNet and VGG16 using transfer learning. These CNNs transform the frames of gesture videos into feature vectors. Because a video is an ordered sequence of image frames, an LSTM model is connected to the fully connected layer of the CNN. We evaluated the system on three different datasets consisting of color videos with 11, 64, and 8 classes. The experiments show that the proposed CNN-LSTM architecture using GoogLeNet is fast and efficient, achieving recognition rates of 93.18%, 97.50%, and 96.65% on the three datasets, respectively.
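
The data flow described in the abstract (per-frame feature extraction followed by an LSTM over the frame sequence and a final classification) can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation: a small random projection stands in for the retrained GoogLeNet or VGG16, all weights are random rather than trained, and every name and dimension (`frame_features`, `lstm_step`, `classify_video`, `FRAME_DIM`, and so on) is hypothetical. Only the 11-class output size is taken from the abstract.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def frame_features(frame, W_feat):
    # Hypothetical stand-in for the CNN: maps one flattened frame
    # to a fixed-length feature vector.
    return [math.tanh(z) for z in matvec(W_feat, frame)]


def lstm_step(x, h, c, params):
    # One standard LSTM step with input (i), forget (f), output (o)
    # gates and a candidate cell update (g).
    z = x + h  # concatenate the input with the previous hidden state
    gates = {}
    for name, (W, b) in params.items():
        pre = [a + bias for a, bias in zip(matvec(W, z), b)]
        act = math.tanh if name == "g" else sigmoid
        gates[name] = [act(p) for p in pre]
    c_new = [f * cp + i * g
             for f, cp, i, g in zip(gates["f"], c, gates["i"], gates["g"])]
    h_new = [o * math.tanh(cn) for o, cn in zip(gates["o"], c_new)]
    return h_new, c_new


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def classify_video(frames, W_feat, params, W_out):
    # Run the LSTM over the per-frame features, then classify
    # from the final hidden state.
    hidden = len(W_out[0])
    h = [0.0] * hidden
    c = [0.0] * hidden
    for frame in frames:
        x = frame_features(frame, W_feat)
        h, c = lstm_step(x, h, c, params)
    return softmax(matvec(W_out, h))


# Assumed toy dimensions; NUM_CLASSES = 11 mirrors one of the datasets.
rng = random.Random(0)
FRAME_DIM, FEAT_DIM, HIDDEN, NUM_CLASSES = 16, 8, 6, 11
W_feat = rand_matrix(FEAT_DIM, FRAME_DIM, rng)
params = {g: (rand_matrix(HIDDEN, FEAT_DIM + HIDDEN, rng), [0.0] * HIDDEN)
          for g in "ifog"}
W_out = rand_matrix(NUM_CLASSES, HIDDEN, rng)

# A fake "video": five flattened frames of random pixel values.
frames = [[rng.random() for _ in range(FRAME_DIM)] for _ in range(5)]
probs = classify_video(frames, W_feat, params, W_out)
print(len(probs), round(sum(probs), 6))
```

In a real system the projection in `frame_features` would be replaced by the activations of a pretrained network's fully connected layer, and the LSTM and output weights would be learned from labeled gesture videos.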


How to Cite
Patel, P., & Patel, N. (2023). Dynamic Hand Gesture Recognition for Indian Sign Language using Integrated CNN-LSTM Architecture. International Journal of Next-Generation Computing, 14(4). https://doi.org/10.47164/ijngc.v14i4.1039

