Activity Recognition applications from Contextual Video-Text Fusion

Cited by: 1
Authors
Levchuk, Georgiy [1]
Shabarekh, Charlotte [1]
Affiliations
[1] Aptima Inc, Woburn, MA 01801 USA
DOI
10.1109/WACVW.2015.12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we demonstrate our capabilities in fusing information extracted from correlated video and text documents. We generate a probabilistic association between entities mentioned in text and entities detected in video by jointly optimizing measures of appearance and behavior similarity. We manage the uncertainty that arises from non-overlapping (conflicting) features in the sources by maintaining multiple hypotheses. On synthetic data with few overlapping features between sources, our soft fusion method increased activity recognition scores over both single-source processing and non-probabilistic (hard) fusion. When sources have over 60% overlapping features, hard fusion outperforms both single-source processing and soft fusion. Our approach determines whether soft or hard fusion is appropriate for a dataset and selects the corresponding fusion algorithm to yield the highest activity recognition results.
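The contrast between soft and hard fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the weighted-sum scoring, the softmax normalization, the per-row argmax, and the toy similarity values are all assumptions made for illustration. Only the idea of combining appearance and behavior similarity, and the 60% overlap figure, come from the abstract.

```python
import math

def soft_association(appearance, behavior, weight=0.5, temperature=1.0):
    """Hypothetical soft fusion: rows are text entities, columns are video tracks.

    Combines appearance and behavior similarity with a weighted sum (an
    assumption) and normalizes each row with a softmax, so every text entity
    keeps a probability for every candidate track (multiple hypotheses).
    """
    probs = []
    for app_row, beh_row in zip(appearance, behavior):
        scores = [weight * a + (1.0 - weight) * b for a, b in zip(app_row, beh_row)]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp((s - top) / temperature) for s in scores]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs

def hard_association(probs):
    """Hypothetical hard fusion: keep only the single best track per text entity."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]

def choose_fusion(overlap_fraction, threshold=0.6):
    """Pick a fusion mode from the fraction of overlapping features.

    The 0.6 threshold mirrors the "over 60%" figure in the abstract; treating
    it as a fixed decision rule is an assumption.
    """
    return "hard" if overlap_fraction > threshold else "soft"

# Toy similarity matrices (made-up values): 2 text entities x 2 video tracks.
appearance = [[0.9, 0.2], [0.4, 0.5]]
behavior = [[0.8, 0.1], [0.3, 0.6]]

probs = soft_association(appearance, behavior)
matches = hard_association(probs)
mode = choose_fusion(overlap_fraction=0.3)
```

In this sketch the soft result keeps a full probability row per text entity, while the hard result commits to one track each. Note that the per-row argmax does not enforce a one-to-one assignment between entities and tracks.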
Pages: 1-3
Page count: 3
Related papers
50 items in total
  • [1] Video-text extraction and recognition
    Chen, TB
    Ghosh, D
    Ranganath, S
    TENCON 2004 - 2004 IEEE REGION 10 CONFERENCE, VOLS A-D, PROCEEDINGS: ANALOG AND DIGITAL TECHNIQUES IN ELECTRICAL ENGINEERING, 2004, : A319 - A322
  • [2] WILL VIDEO-TEXT SYSTEMS TRAVEL WELL
    Unknown
    ELECTRONICS, 1978, 51 (19): : 24 - 24
  • [3] Video-Text Compliance: Activity Verification Based on Natural Language Instructions
    Jaiswal, Mayoore S.
    Liu, Frank
    Jagannathan, Anupama
    Gattiker, Anne
    Hwang, Inseok
    Lee, Jinho
    Tong, Matt
    Dureja, Sahil
    Shah, Soham
    Hofstee, Peter
    Chen, Valerie
    Paul, Suvadip
    Feris, Rogerio
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 1503 - 1512
  • [4] Alignment of Image-Text and Video-Text Datasets
    Ozkose, Yunus Emre
    Gokce, Zeynep
    Duygulu, Pinar
    2023 31ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2023,
  • [5] Learning Video-Text Aligned Representations for Video Captioning
    Shi, Yaya
    Xu, Haiyang
    Yuan, Chunfeng
    Li, Bing
    Hu, Weiming
    Zha, Zheng-Jun
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (02)
  • [6] A NOVEL CONVOLUTIONAL ARCHITECTURE FOR VIDEO-TEXT RETRIEVAL
    Li, Zheng
    Guo, Caili
    Yang, Bo
    Feng, Zerun
    Zhang, Hao
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
  • [7] Multi-event Video-Text Retrieval
    Zhang, Gengyuan
    Ren, Jisen
    Gu, Jindong
    Tresp, Volker
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22056 - 22066
  • [8] Deep learning for video-text retrieval: a review
    Zhu, Cunjuan
    Jia, Qi
    Chen, Wei
    Guo, Yanming
    Liu, Yu
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2023, 12 (01)
  • [9] Progressive Semantic Matching for Video-Text Retrieval
    Liu, Hongying
    Luo, Ruyi
    Shang, Fanhua
    Niu, Mantang
    Liu, Yuanyuan
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5083 - 5091
  • [10] A Framework for Video-Text Retrieval with Noisy Supervision
    Vaseqi, Zahra
    Fan, Pengnan
    Clark, James
    Levine, Martin
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022, 2022, : 373 - 383