Activity Recognition applications from Contextual Video-Text Fusion

Cited by: 1
Authors
Levchuk, Georgiy [1 ]
Shabarekh, Charlotte [1 ]
Affiliations
[1] Aptima Inc, Woburn, MA 01801 USA
Keywords:
DOI: 10.1109/WACVW.2015.12
Chinese Library Classification: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
In this paper, we demonstrate our capabilities in fusing information extracted from correlated video and text documents. We generate a probabilistic association between entities mentioned in text and detected in video by jointly optimizing a measure of appearance and behavior similarity, and we manage the uncertainty that arises from non-overlapping (conflicting) features across sources by maintaining multiple hypotheses. In experiments on synthetic data with few overlapping features between sources, our soft-fusion method improved activity recognition scores over both single-source processing and non-probabilistic (hard) fusion; when the sources share more than 60% of their features, hard fusion outperforms both single-source processing and soft fusion. Our approach determines whether soft or hard fusion is appropriate for a given dataset and selects the fusion algorithm that yields the highest activity recognition performance.
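The abstract describes an adaptive choice between probabilistic (soft) and non-probabilistic (hard) fusion, driven by how many features the video and text sources share. The Python sketch below is purely illustrative and not the authors' implementation: the Jaccard-style overlap measure, the score-combination rules, and all function names are assumptions used only to make the selection logic concrete; the 60% threshold is taken from the abstract.

# Minimal sketch (not the authors' implementation) of selecting between soft and
# hard fusion based on how strongly the video and text feature sets overlap.
def feature_overlap(video_feats: set, text_feats: set) -> float:
    """Fraction of shared features (Jaccard-style overlap); assumed measure."""
    if not (video_feats or text_feats):
        return 0.0
    return len(video_feats & text_feats) / len(video_feats | text_feats)

def hard_fuse(video_scores: dict, text_scores: dict) -> dict:
    """Hard fusion: commit to a single hypothesis per activity (max score)."""
    activities = set(video_scores) | set(text_scores)
    return {a: max(video_scores.get(a, 0.0), text_scores.get(a, 0.0)) for a in activities}

def soft_fuse(video_scores: dict, text_scores: dict) -> dict:
    """Soft fusion: keep all hypotheses and form a probabilistic association."""
    activities = set(video_scores) | set(text_scores)
    fused = {a: video_scores.get(a, 0.0) + text_scores.get(a, 0.0) for a in activities}
    total = sum(fused.values()) or 1.0
    return {a: s / total for a, s in fused.items()}

def recognize_activities(video_feats, text_feats, video_scores, text_scores, threshold=0.6):
    """Pick hard fusion when sources overlap heavily (> 60%), soft fusion otherwise."""
    if feature_overlap(video_feats, text_feats) > threshold:
        return hard_fuse(video_scores, text_scores)
    return soft_fuse(video_scores, text_scores)

# Example: the sources share few features, so the soft (probabilistic) path is used.
video_feats, text_feats = {"running", "vehicle", "outdoor"}, {"vehicle", "meeting"}
print(recognize_activities(video_feats, text_feats,
                           {"pursuit": 0.7, "meeting": 0.2},
                           {"meeting": 0.6, "pursuit": 0.3}))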
Pages: 1-3
Number of pages: 3