Group Activity Recognition Based on Interaction Contextual Information in Videos Using Machine Learning


SMITA SUNIL KULKARNI
Sangeeta Jadhav

Abstract

This paper addresses the recognition of multi-person actions in videos, including individual actions, interactions,
and group activities. In a shared environment, multiple people perform group actions such as walking together
or talking while facing each other. The proposed model retrieves each individual's action from the video sequence
and represents interactive contextual features among multiple people. The novelty of the framework lies in
the development of interactive action context (IAC) descriptors and the classification of group activities using
machine learning. In each video frame, the IAC descriptor encodes the action scores of an individual together
with the relative action scores of nearby people. Individual action descriptors are important cues for recognizing
multi-person activity through interaction context. An action retrieval technique based on k-nearest neighbors (KNN)
is formulated to obtain individual action classification scores. The model also introduces a Fully Connected
Conditional Random Field (FCCRF) to learn interaction context information among multiple people; the FCCRF
regularizes activity categorization through a spatio-temporal model. Threshold processing is also presented to
improve the performance of the context descriptors. Experimental results are compared with state-of-the-art
approaches and demonstrate improved performance for group activity recognition.
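
The pipeline summarized above (KNN-based individual action scores, threshold processing, and a context descriptor built from nearby people) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `knn_action_scores` and `context_descriptor`, the Euclidean distance, the max-pooling over neighbors, and the `radius` and `threshold` values are all assumptions chosen for illustration, and the FCCRF stage is omitted.

```python
import numpy as np


def knn_action_scores(train_x, train_y, query_x, n_classes, k=3):
    """Per-class vote fractions among the k nearest training samples.

    Hypothetical helper: the abstract only states that KNN produces
    individual action classification scores.
    """
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    scores = np.zeros((len(query_x), n_classes))
    for i, idx in enumerate(nearest):
        scores[i] = np.bincount(train_y[idx], minlength=n_classes) / k
    return scores


def context_descriptor(scores, positions, focal, radius=2.0, threshold=0.3):
    """Focal person's scores concatenated with pooled scores of nearby people.

    Assumed design: scores below `threshold` are zeroed, and neighbors
    within `radius` of the focal person are max-pooled per action class.
    """
    # Threshold processing: suppress weak action scores.
    cleaned = np.where(scores >= threshold, scores, 0.0)
    dist = np.linalg.norm(positions - positions[focal], axis=1)
    nearby = (dist <= radius) & (np.arange(len(positions)) != focal)
    if nearby.any():
        context = cleaned[nearby].max(axis=0)
    else:
        context = np.zeros(scores.shape[1])
    return np.concatenate([cleaned[focal], context])


# Toy data: two action classes with 2-D appearance features (made up).
train_x = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]])
train_y = np.array([0, 0, 0, 1, 1, 1])

# Three people detected in one frame: features and image-plane positions.
query_x = np.array([[0.05, 0.05], [1.05, 1.05], [0.9, 0.9]])
positions = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])

scores = knn_action_scores(train_x, train_y, query_x, n_classes=2, k=3)
descriptor = context_descriptor(scores, positions, focal=0)
isolated = context_descriptor(scores, positions, focal=2)
```

In this toy frame, person 0 has one neighbor within the radius, so its descriptor carries that neighbor's action scores in the second half; person 2 has no neighbors, so its context half is all zeros. A group-level classifier would then be trained on such descriptors.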


How to Cite
KULKARNI, S. S., & Jadhav, S. (2022). Group Activity Recognition Based on Interaction Contextual Information in Videos Using Machine Learning. International Journal of Next-Generation Computing, 13(2). https://doi.org/10.47164/ijngc.v13i2.579
