Group Activity Recognition Using Deep Autoencoder with Temporal Context Descriptor


Safvan Vahora
N. C. Chauhan

Abstract

In this paper, we propose a novel method for group activity recognition in video sequences. Recognizing a group activity requires information about each individual person's action, the interaction potential and social cues binding the people in the context region, and an analysis of this context region over time. We propose a deep architecture, a stacked deep autoencoder, that provides a high-level representation of the group activity context descriptor, built on top of local-level human action pose features. These local- and global-level representations of the group activity are then analyzed over a time period to build a robust temporal group activity context descriptor. Our experimental results show the effectiveness of the proposed approach on a benchmark collective activity dataset.
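This page does not reproduce the paper's implementation, so the following is only a rough sketch of the pipeline the abstract describes: per-person pose features are pooled into a frame-level vector, a greedily pretrained stacked autoencoder (in the spirit of refs. 3 and 12 below) encodes it into a group-level context descriptor, and per-frame codes are averaged over a temporal window. The feature dimension, layer sizes, mean pooling, and plain-NumPy training loop are all assumptions for illustration, not the authors' code.

```python
# Minimal sketch (all names, dimensions, and training choices are
# assumptions for illustration, not the authors' implementation).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AutoencoderLayer:
    """One autoencoder layer with tied weights, trained by gradient descent."""
    def __init__(self, n_in, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, (n_in, n_hidden))  # encoder weights
        self.b = np.zeros(n_hidden)                       # encoder bias
        self.c = np.zeros(n_in)                           # decoder bias
        self.lr = lr

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def train_step(self, x):
        h = self.encode(x)                      # encode input
        r = sigmoid(h @ self.W.T + self.c)      # reconstruct (tied weights)
        dr = (r - x) * r * (1 - r)              # decoder delta
        dh = (dr @ self.W) * h * (1 - h)        # encoder delta
        self.W -= self.lr * (np.outer(dr, h) + np.outer(x, dh))
        self.b -= self.lr * dh
        self.c -= self.lr * dr

def group_descriptor(person_feats, layers):
    """Mean-pool per-person pose features, then encode through the stack."""
    x = person_feats.mean(axis=0)
    for layer in layers:
        x = layer.encode(x)
    return x

# Hypothetical 72-D pose features for a variable number of people per frame.
D, H1, H2 = 72, 48, 24
frames = [rng.random((int(rng.integers(2, 6)), D)) for _ in range(200)]
pooled = np.stack([f.mean(axis=0) for f in frames])

# Greedy layer-wise pretraining (in the spirit of refs. 3 and 12).
layers = [AutoencoderLayer(D, H1), AutoencoderLayer(H1, H2)]
for i, layer in enumerate(layers):
    data = pooled
    for prev in layers[:i]:                     # feed through trained layers
        data = np.stack([prev.encode(x) for x in data])
    for _ in range(5):                          # a few epochs per layer
        for x in data:
            layer.train_step(x)

# Temporal context: average frame-level codes over a sliding window.
codes = np.stack([group_descriptor(f, layers) for f in frames])
window = codes[:10].mean(axis=0)                # one window's descriptor
print(window.shape)                             # (24,)
```

In the full method, such descriptors would presumably feed a classifier (the reference list includes LIBSVM, ref. 6); the classification stage is omitted from this sketch.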


How to Cite
Safvan Vahora, & N. C. Chauhan. (2018). Group Activity Recognition Using Deep Autoencoder with Temporal Context Descriptor. International Journal of Next-Generation Computing, 9(3), 221–232. https://doi.org/10.47164/ijngc.v9i3.150

References

  1. Aggarwal, J. and Ryoo, M. 2011. Human activity analysis. ACM Computing Surveys 43, 3 (apr), 1-43.
  2. Amer, M. R. and Todorovic, S. 2011. A chains model for localizing participants of group activities in videos. In 2011 International Conference on Computer Vision. IEEE.
  3. Bengio, Y. 2009. Learning deep architectures for AI. Foundations and Trends® in Machine Learning 2, 1, 1-127.
  4. Biederman, I. 1981. On the semantics of a glance at a scene.
  5. Blank, M., Gorelick, L., Shechtman, E., Irani, M., and Basri, R. 2005. Actions as space-time shapes. In Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1. IEEE.
  6. Chang, C.-C. and Lin, C.-J. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 3 (apr), 1-27.
  7. Choi, W., Shahid, K., and Savarese, S. 2009. What are they doing?: Collective activity classification using spatio-temporal relationship among people. In 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops. IEEE.
  8. Dalal, N. and Triggs, B. 2005. Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). IEEE.
  9. Deng, Z., Zhai, M., Chen, L., Liu, Y., Muralidharan, S., Roshtkhari, M. J., and Mori, G. 2015. Deep structured models for group activity recognition. In Proceedings of the British Machine Vision Conference 2015. British Machine Vision Association.
  10. Donahue, J., Hendricks, L. A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Darrell, T., and Saenko, K. 2015. Long-term recurrent convolutional networks for visual recognition and description. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.
  11. Hajimirsadeghi, H., Yan, W., Vahdat, A., and Mori, G. 2015. Visual recognition by counting instances: A multi-instance cardinality potential kernel. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.
  12. Hinton, G. E. and Salakhutdinov, R. R. 2006. Reducing the dimensionality of data with neural networks. Science 313, 5786 (jul), 504-507.
  13. Hoiem, D., Efros, A., and Hebert, M. 2006. Putting objects in perspective. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2 (CVPR'06). IEEE.
  14. Hou, C., Nie, F., Li, X., Yi, D., and Wu, Y. 2014. Joint embedding learning and sparse regression: A framework for unsupervised feature selection. IEEE Transactions on Cybernetics 44, 6 (jun), 793-804.
  15. Ibrahim, M. S., Muralidharan, S., Deng, Z., Vahdat, A., and Mori, G. 2016. A hierarchical deep temporal model for group activity recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.
  16. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia - MM '14. ACM Press.
  17. Kaneko, T., Shimosaka, M., Odashima, S., Fukui, R., and Sato, T. 2012a. Consistent collective activity recognition with fully connected CRFs. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012). 2792-2795.
  18. Kaneko, T., Shimosaka, M., Odashima, S., Fukui, R., and Sato, T. 2012b. Viewpoint invariant collective activity recognition with relative action context. In Computer Vision - ECCV 2012. Workshops and Demonstrations. Springer Berlin Heidelberg, 253-262.
  19. Kaneko, T., Shimosaka, M., Odashima, S., Fukui, R., and Sato, T. 2014. A fully connected model for consistent collective activity recognition in videos. Pattern Recognition Letters 43, 109-118.
  20. Kim, Y.-J., Cho, N.-G., and Lee, S.-W. 2014. Group activity recognition with group interaction zone. In 2014 22nd International Conference on Pattern Recognition. IEEE.
  21. Krizhevsky, A., Sutskever, I., and Hinton, G. E. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM 60, 6 (may), 84-90.
  22. Lan, T., Wang, Y., Mori, G., and Robinovitch, S. N. 2012. Retrieving actions in group contexts. In Trends and Topics in Computer Vision, K. N. Kutulakos, Ed. Springer Berlin Heidelberg, Berlin, Heidelberg, 181-194.
  23. Lan, T., Wang, Y., Yang, W., Robinovitch, S. N., and Mori, G. 2012. Discriminative latent models for recognizing contextual group activities. IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 8 (aug), 1549-1562.
  24. Nabi, M., Bue, A. D., and Murino, V. 2013. Temporal poselets for collective activity detection and recognition. In 2013 IEEE International Conference on Computer Vision Workshops. IEEE.
  25. Noceti, N. and Odone, F. 2014. Humans in groups: The importance of contextual information for understanding collective activities. Pattern Recognition 47, 11 (nov), 3535-3551.
  26. Schuldt, C., Laptev, I., and Caputo, B. 2004. Recognizing human actions: a local SVM approach. In Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004. IEEE.
  27. Shi, Y., Tian, Y., Wang, Y., and Huang, T. 2017. Sequential deep trajectory descriptor for action recognition with three-stream CNN. IEEE Transactions on Multimedia 19, 7 (jul), 1510-1520.
  28. Soomro, K., Zamir, A. R., and Shah, M. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402.
  29. Tran, K., Gala, A., Kakadiaris, I., and Shah, S. 2014. Activity analysis in crowded environments using social cues for group discovery and human interaction modeling. Pattern Recognition Letters 44, 49-57.
  30. Vahora, S. A. and Chauhan, N. C. 2017. A comprehensive study of group activity recognition methods in video. Indian Journal of Science and Technology 10, 23 (feb), 1-11.
  31. Vinciarelli, A., Pantic, M., and Bourlard, H. 2009. Social signal processing: Survey of an emerging domain. Image and Vision Computing 27, 12 (nov), 1743-1759.
  32. Yang, X. and Tian, Y. 2017. Super normal vector for human activity recognition with depth cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 5 (may), 1028-1039.
  33. Zeng, K., Yu, J., Wang, R., Li, C., and Tao, D. 2017. Coupled deep autoencoder for single image super-resolution. IEEE Transactions on Cybernetics 47, 1 (jan), 27-37.
  34. Zhu, Z., You, X., Chen, C. P., Tao, D., Ou, W., Jiang, X., and Zou, J. 2015. An adaptive hybrid pattern for noise-robust texture analysis. Pattern Recognition 48, 8 (aug), 2592-2608.