Video Visual Relation Detection via Multi-modal Feature Fusion

Cited by: 32
Authors
Sun, Xu [1,2]
Ren, Tongwei [1,2]
Zi, Yuan [1]
Wu, Gangshan [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Nanjing Univ, Shenzhen Res Inst, Shenzhen, Peoples R China
Funding
US National Science Foundation
Keywords
Video visual relation detection; object trajectory detection; relation prediction;
DOI
10.1145/3343031.3356076
Chinese Library Classification (CLC)
TP39 [Computer Applications]
Discipline Classification Codes
081203; 0835
Abstract
Video visual relation detection is a meaningful research problem that aims to build a bridge between dynamic vision and language. In this paper, we propose a novel video visual relation detection method with multi-modal feature fusion. First, we densely detect objects on each frame with a state-of-the-art video object detection model, flow-guided feature aggregation (FGFA), and generate object trajectories by linking the temporally independent detections with Seq-NMS and a KCF tracker. Next, we break the relation candidates, i.e., co-occurring object trajectory pairs, into short-term segments and predict relations using spatio-temporal features and language context features. Finally, we greedily associate the short-term relation segments into complete relation instances. The experimental results show that our proposed method outperforms other methods by a large margin, which also earned us first place in the visual relation detection task of the Video Relation Understanding (VRU) Challenge at ACM MM 2019.
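To make the final association step of the abstract concrete, below is a minimal, illustrative Python sketch (not the authors' released implementation) of greedily merging short-term relation segments that predict the same (subject, predicate, object) triplet and whose subject/object trajectories overlap. The class name RelationSegment, the frame-wise IoU test, and the 0.5 overlap threshold are assumptions introduced here purely for illustration.

# Illustrative sketch of greedy association of short-term relation segments into
# complete relation instances; names and thresholds are assumptions, not the paper's code.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)


@dataclass
class RelationSegment:
    triplet: Tuple[str, str, str]                             # (subject, predicate, object) labels
    start: int                                                # first frame index of the segment
    subj_traj: Dict[int, Box] = field(default_factory=dict)   # frame index -> subject box
    obj_traj: Dict[int, Box] = field(default_factory=dict)    # frame index -> object box
    score: float = 0.0                                        # relation confidence from the predictor


def can_merge(a: RelationSegment, b: RelationSegment, thr: float = 0.5) -> bool:
    """Two segments are mergeable if they predict the same triplet and their
    subject and object boxes agree (IoU >= thr) on every frame both segments cover."""
    if a.triplet != b.triplet:
        return False
    shared = set(a.subj_traj) & set(b.subj_traj) & set(a.obj_traj) & set(b.obj_traj)
    if not shared:
        return False
    return all(iou(a.subj_traj[f], b.subj_traj[f]) >= thr and
               iou(a.obj_traj[f], b.obj_traj[f]) >= thr for f in shared)


def associate(segments: List[RelationSegment]) -> List[RelationSegment]:
    """Greedy association: scan segments in temporal order and absorb each one
    into the first compatible instance found so far, averaging the scores."""
    instances: List[RelationSegment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        for inst in instances:
            if can_merge(inst, seg):
                inst.subj_traj.update(seg.subj_traj)
                inst.obj_traj.update(seg.obj_traj)
                inst.score = (inst.score + seg.score) / 2.0
                break
        else:
            instances.append(seg)
    return instances

In the method described by the abstract, each short-term segment would already carry a relation score produced from the fused spatio-temporal and language context features; the sketch only shows the greedy merging logic that stitches those segments into full relation instances.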
Pages: 2657-2661
Number of pages: 5
Related Papers (50 in total)
  • [31] Multi-modal feature fusion for 3D object detection in the production workshop
    Hou, Rui
    Chen, Guangzhu
    Han, Yinhe
    Tang, Zaizuo
    Ru, Qingjun
    APPLIED SOFT COMPUTING, 2022, 115
  • [32] Deformable Feature Fusion Network for Multi-Modal 3D Object Detection
    Guo, Kun
    Gan, Tong
    Ding, Zhao
    Ling, Qiang
    2024 3RD INTERNATIONAL CONFERENCE ON ROBOTICS, ARTIFICIAL INTELLIGENCE AND INTELLIGENT CONTROL, RAIIC 2024, 2024: 363 - 367
  • [33] Multi-modal multi-task feature fusion for RGBT tracking
    Cai, Yujue
    Sui, Xiubao
    Gu, Guohua
    INFORMATION FUSION, 2023, 97
  • [34] Multi-modal Fusion
    Liu, Huaping
    Hussain, Amir
    Wang, Shuliang
    INFORMATION SCIENCES, 2018, 432 : 462 - 462
  • [35] Multi-modal semantics fusion model for domain relation extraction via information bottleneck
    Tian, Zhao
    Zhao, Xuan
    Li, Xiwang
    Ma, Xiaoping
    Li, Yinghao
    Wang, Youwei
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 244
  • [36] Multi-modal brain image fusion using multi feature guided fusion network
    Shibu, Tom Michael
    Madan, Niranjan
    Paramanandham, Nirmala
    Kumar, Aakash
    Santosh, Ashwin
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 100
  • [37] Cascade fusion of multi-modal and multi-source feature fusion by the attention for three-dimensional object detection
    Yu, Fengning
    Lian, Jing
    Li, Linhui
    Zhao, Jian
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [38] Disease Classification Model Based on Multi-Modal Feature Fusion
    Wan, Zhengyu
    Shao, Xinhui
    IEEE ACCESS, 2023, 11 : 27536 - 27545
  • [39] Fabric image retrieval based on multi-modal feature fusion
    Zhang, Ning
    Liu, Yixin
    Li, Zhongjian
    Xiang, Jun
    Pan, Ruru
    SIGNAL, IMAGE AND VIDEO PROCESSING, 2024, 18 : 2207 - 2217
  • [40] Joint and Individual Feature Fusion Hashing for Multi-modal Retrieval
    Yu, Jun
    Zheng, Yukun
    Wang, Yinglin
    Li, Zuhe
    Zhu, Liang
    COGNITIVE COMPUTATION, 2023, 15 (03) : 1053 - 1064