Enhanced video analysis framework for action detection using deep learning


Saylee Begampure
Parul Jadhav


Video analytics examines video content and adds "brains to eyes", that is, analytics to the camera. It extracts content by monitoring the video in real time. Detecting normal and abnormal human activity with deep learning models is a challenging task in computer vision. Such detection can help identify crime scenes and so help prevent harmful actions. The proposed method focuses on classifying normal human activities in real-time scenarios. The main research contribution is a pre-processing technique that detects and eliminates redundant frames, together with efficient training of a Convolutional Neural Network to classify the activities. The proposed method improves accuracy over the reference method and can be further implemented on edge embedded platforms for real-time applications.
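The abstract does not state how redundant frames are identified, so the following is only a minimal sketch of one common approach: drop a frame when it differs too little from the last frame that was kept. The mean absolute pixel difference, the function names, and the `threshold` value are all illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence

# Grayscale frame as rows of pixel intensities (0-255); an assumed representation.
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def drop_redundant_frames(frames: List[Frame], threshold: float = 2.0) -> List[int]:
    """Return the indices of the frames to keep.

    A frame is treated as redundant when it differs from the last kept
    frame by less than `threshold` (a hypothetical tuning parameter).
    """
    kept: List[int] = []
    for i, frame in enumerate(frames):
        if not kept or mean_abs_diff(frames[kept[-1]], frame) >= threshold:
            kept.append(i)
    return kept


# Tiny synthetic clip: frames 0 and 1 are identical, frames 2 and 3 are identical.
still = [[10, 10], [10, 10]]
moved = [[10, 200], [10, 10]]
kept_indices = drop_redundant_frames([still, still, moved, moved])
print(kept_indices)
```

In a pipeline like the one the abstract describes, only the kept frames would be passed on to the Convolutional Neural Network, which reduces the amount of near-duplicate data seen during training and inference.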


How to Cite
Saylee Begampure, & Parul Jadhav. (2021). Enhanced video analysis framework for action detection using deep learning. International Journal of Next-Generation Computing, 12(2), 218–228. https://doi.org/10.47164/ijngc.v12i2.199

