Group Activity Recognition Based on Interaction Contextual Information in Videos Using Machine Learning
Abstract
This paper addresses the recognition of multi-person actions in videos, including individual actions, interactions,
and group activities. In a shared environment, multiple people perform group actions such as walking together
or talking while facing each other. The proposed model first retrieves each person's action from the video sequence
and then represents interactive contextual features among multiple people. The novelty of the framework
is the development of interactive action context (IAC) descriptors and the classification of group activities using
machine learning. For each person in a video frame, the IAC descriptor encodes that person's action score together
with the relative action scores of nearby people. Individual action descriptors are important cues for recognizing
multi-person activity, because the interaction context is built from them. An action retrieval technique based on
k-nearest neighbors (KNN) is formulated to produce individual action classification scores. The model also
introduces a Fully Connected Conditional Random Field (FCCRF) to learn interaction context information among
multiple people; the FCCRF regularizes activity categorization through a spatio-temporal model. The paper further
presents threshold processing to improve the performance of the context descriptors. Experimental results are
compared with state-of-the-art approaches and demonstrate improved performance for group activity recognition.
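
The abstract does not give the exact formulation of the KNN action scores, the IAC descriptor, or the threshold step, so the following is only a minimal Python sketch of the general idea under stated assumptions. The function names, the use of max-pooling over neighbors, the `radius` parameter, and the rule of zeroing scores below `threshold` are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def knn_action_scores(query, train_feats, train_labels, n_classes, k=3):
    """Score each action class by the fraction of the k nearest
    training samples that carry that label (hypothetical scoring rule)."""
    dists = np.linalg.norm(train_feats - query, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = np.bincount(train_labels[nearest], minlength=n_classes)
    return votes / k

def interaction_context(scores, positions, radius, threshold=0.0):
    """Build one context descriptor per person: the person's own action
    scores concatenated with the max-pooled scores of neighbors within
    `radius`. Scores below `threshold` are zeroed first (assumed form
    of the threshold processing)."""
    scores = np.where(scores >= threshold, scores, 0.0)
    descriptors = []
    for i in range(len(scores)):
        dist = np.linalg.norm(positions - positions[i], axis=1)
        mask = dist <= radius
        mask[i] = False  # exclude the person themselves
        if mask.any():
            context = scores[mask].max(axis=0)
        else:
            context = np.zeros(scores.shape[1])
        descriptors.append(np.concatenate([scores[i], context]))
    return np.stack(descriptors)

# Toy data: two action classes, 2-D appearance features (illustrative only).
train_feats = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_labels = np.array([0, 0, 1, 1])
query_scores = knn_action_scores(np.array([0.0, 0.2]), train_feats,
                                 train_labels, n_classes=2, k=3)

# Three people in one frame: per-person action scores and image positions.
person_scores = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
person_positions = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
descriptors = interaction_context(person_scores, person_positions,
                                  radius=2.0, threshold=0.3)
```

In this sketch each descriptor has twice as many entries as there are action classes; a group activity classifier, or a CRF over all people in the frame as the abstract describes, would take these descriptors as input.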
This work is licensed under a Creative Commons Attribution 4.0 International License.