Activity Recognition applications from Contextual Video-Text Fusion

Cited by: 1
Authors
Levchuk, Georgiy [1]
Shabarekh, Charlotte [1]
Affiliations
[1] Aptima Inc, Woburn, MA 01801 USA
DOI
10.1109/WACVW.2015.12
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we demonstrate our capabilities in fusing information extracted from correlated video and text documents. We generate a probabilistic association between entities mentioned in text and entities detected in video by jointly optimizing measures of appearance and behavior similarity. We manage the uncertainty that arises from non-overlapping (conflicting) features in the sources by maintaining multiple hypotheses. In experiments on synthetic data with few overlapping features between sources, our soft fusion method increased activity recognition scores over both single-source processing and non-probabilistic (hard) fusion. When sources share more than 60% of their features, hard fusion outperforms both single-source processing and soft fusion. Our approach determines whether soft or hard fusion is appropriate for a dataset and selects the corresponding algorithm to yield the highest activity recognition results.
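The difference between soft and hard association described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract gives no formulas, so the weighted blend of appearance and behavior similarity (standing in for the joint optimization), the row-wise softmax, and every function and parameter name below are assumptions made for illustration. The 0.6 threshold in `choose_fusion` simply mirrors the 60% overlap figure reported in the abstract.

```python
import math


def combined_similarity(appearance, behavior, weight=0.5):
    """Blend two similarity matrices (rows: text entities, columns: video tracks).

    Assumption: a fixed weighted average replaces the joint optimization
    mentioned in the abstract.
    """
    return [
        [weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(appearance, behavior)
    ]


def soft_association(similarity):
    """Soft fusion: each text entity keeps a probability over all video tracks."""
    result = []
    for row in similarity:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def hard_association(similarity):
    """Hard fusion: each text entity is linked to its single best video track."""
    return [max(range(len(row)), key=row.__getitem__) for row in similarity]


def choose_fusion(overlap_fraction, threshold=0.6):
    """Hypothetical selection rule: hard fusion above the overlap threshold."""
    return "hard" if overlap_fraction > threshold else "soft"
```

Soft association keeps multiple hypotheses alive for each entity, which matters when the sources share few features; hard association commits to one match, which the abstract reports works better once the overlap is high.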
Pages: 1 / 3
Page count: 3