Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos

Cited by: 9
Authors
Song, Sijie [1]
Lin, Xudong [2]
Liu, Jiaying [1]
Guo, Zongming [1]
Chang, Shih-Fu [2]
Affiliations
[1] Peking University, Wangxuan Institute of Computer Technology, Beijing, People's Republic of China
[2] Columbia University, DVMM Lab, New York, NY, USA
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
DOI
10.1109/CVPR46437.2021.00140
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we address the problem of referring expression comprehension in videos, which is challenging due to complex expressions and scene dynamics. Unlike previous methods, which solve the problem in multiple stages (i.e., tracking, proposal-based matching), we tackle the problem from a novel perspective, co-grounding, with an elegant one-stage framework. We enhance the single-frame grounding accuracy by semantic attention learning and improve the cross-frame grounding consistency with co-grounding feature learning. Semantic attention learning explicitly parses referring cues in different attributes to reduce the ambiguity in complex expressions. Co-grounding feature learning boosts visual feature representations by integrating temporal correlation to reduce the ambiguity caused by scene dynamics. Experimental results demonstrate the superiority of our framework on the video grounding datasets VID and LiOTB in generating accurate and stable results across frames. Our model is also applicable to referring expression comprehension in images, as illustrated by the improved performance on the RefCOCO dataset.
Pages: 1346-1355
Page count: 10
Related papers
22 in total
  • [1] Referring Expression Comprehension by Composing Semantic-based Visual Attention
    Zhu, Zheng-An
    Chiang, Hsuan-Lun
    Chiang, Chen-Kuo
    2022 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN, IEEE ICCE-TW 2022, 2022: 345-346
  • [2] Language-Attention Modular-Network for Relational Referring Expression Comprehension in Videos
    Dhingra, Naina
    Jain, Shipra
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022: 4103-4110
  • [3] Referring Expression Comprehension via Co-attention and Visual Context
    Gao, Youming
    Ji, Yi
    Xu, Ting
    Xu, Yunlong
    Liu, Chunping
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: IMAGE PROCESSING, PT III, 2019, 11729: 119-130
  • [4] Stacked Attention Networks for Referring Expressions Comprehension
    Li, Yugang
    Sun, Haibo
    Chen, Zhe
    Ding, Yudan
    Zhou, Siqi
    CMC-COMPUTERS MATERIALS & CONTINUA, 2020, 65 (03): 2529-2541
  • [5] Dynamic Graph Attention for Referring Expression Comprehension
    Yang, Sibei
    Li, Guanbin
    Yu, Yizhou
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 4643-4652
  • [6] Multi-level attention for referring expression comprehension
    Sun, Yanfeng
    Zhang, Yunru
    Jiang, Huajie
    Hu, Yongli
    Yin, Baocai
    PATTERN RECOGNITION LETTERS, 2023, 172: 252-258
  • [7] MAttNet: Modular Attention Network for Referring Expression Comprehension
    Yu, Licheng
    Lin, Zhe
    Shen, Xiaohui
    Yang, Jimei
    Lu, Xin
    Bansal, Mohit
    Berg, Tamara L.
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 1307-1315
  • [8] Referring Expression Comprehension Via Enhanced Cross-modal Graph Attention Networks
    Wang, Jia
    Ke, Jingcheng
    Shuai, Hong-Han
    Li, Yung-Hui
    Cheng, Wen-Huang
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (02)
  • [9] CSRef: Contrastive Semantic Alignment for Speech Referring Expression Comprehension
    Huang, Lihong
    Zhong, Sheng-Hua
    PROCEEDINGS OF THE 2ND INTERNATIONAL WORKSHOP ON METHODOLOGIES FOR MULTIMEDIA 2024, MEET4MM 2024, 2024: 28-34
  • [10] Referring Expression Comprehension with Semantic Visual Relationship and Word Mapping
    Zhang, Chao
    Li, Weiming
    Ouyang, Wanli
    Wang, Qiang
    Kim, Woo-Shik
    Hong, Sunghoon
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019: 1258-1266