Fuzzy Multimodal Graph Reasoning for Human-Centric Instructional Video Grounding

Cited by: 0
Authors
Li, Yujie [1 ]
Jiang, Xun [2 ,3 ]
Xu, Xing [3 ,4 ,5 ]
Lu, Huimin [6 ]
Shen, Heng Tao [3 ,4 ,5 ]
Affiliations
[1] Kyushu Inst Technol, Fukuoka 8048550, Japan
[2] Univ Elect Sci & Technol China, Ctr Future Multimedia, Chengdu 611731, Peoples R China
[3] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[4] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu 611731, Peoples R China
[5] Tongji Univ, Coll Elect & Informat Engn, Shanghai 201804, Peoples R China
[6] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Grounding; Feature extraction; Cognition; Task analysis; Visualization; Fuzzy systems; Education; Fuzzy logic; graph learning; human-centric video understanding; temporal grounding; NETWORK;
DOI
10.1109/TFUZZ.2024.3436030
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human-centric instructional videos provide opportunities for users to learn real-world multistep tasks, such as cooking, makeup, and using professional tools. However, these lengthy videos often lead to a tedious learning experience, making it challenging for learners to locate specific guidance efficiently. In this article, we present a novel approach, named fuzzy multimodal graph reasoning (FMGR), to extract target events in long untrimmed human-centric instructional videos using natural language. Specifically, we devise fuzzy multimodal graph learning layers in our method, which encompass: first, contextual graph reasoning that transforms individual features into contextualized features; second, a cross-modal relation fuzzifier that models the fine-grained matching relationships between the two modalities; and third, fuzzy graph reasoning that conducts message passing among cross-modal matching node pairs. In particular, we integrate fuzzy theory into the cross-modal relation fuzzifier to amplify potential matching pairs while simultaneously mitigating the interference from ambiguous matches. To validate our method, we conducted evaluations on two human-centric instructional video datasets, i.e., MedVidQA and YouMakeUp. Moreover, we further analyze the impact of interrogative versus declarative queries. Extensive experimental results and further analysis reveal the effectiveness of our proposed FMGR method.
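The three-stage layer described in the abstract can be sketched as follows. This is a minimal illustrative sketch only: the function names, the Gaussian membership function, and all parameters (e.g., `sigma`) are assumptions for exposition, not the authors' actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def contextualize(X):
    # Stage 1 (contextual graph reasoning): fully connected intra-modal
    # graph; each node aggregates attention-weighted neighbours.
    A = softmax(X @ X.T / np.sqrt(X.shape[1]))
    return X + A @ X

def fuzzify(S, sigma=0.5):
    # Stage 2 (cross-modal relation fuzzifier): a Gaussian membership
    # centred on each row's strongest similarity amplifies likely
    # matches and damps ambiguous ones.
    return np.exp(-((S - S.max(axis=-1, keepdims=True)) ** 2) / (2 * sigma**2))

def fuzzy_graph_layer(V, Q, sigma=0.5):
    # V: video clip features (n_v, d); Q: query word features (n_q, d).
    V, Q = contextualize(V), contextualize(Q)
    S = V @ Q.T / np.sqrt(V.shape[1])       # raw cross-modal similarities
    M = fuzzify(S, sigma) * softmax(S)      # fuzzy matching weights
    M = M / M.sum(axis=-1, keepdims=True)   # normalise per video node
    # Stage 3 (fuzzy graph reasoning): message passing from text to video
    # along the fuzzified cross-modal matching edges.
    return V + M @ Q

rng = np.random.default_rng(0)
V = rng.standard_normal((8, 16))   # 8 video clips, feature dim 16
Q = rng.standard_normal((5, 16))   # 5 query words, feature dim 16
out = fuzzy_graph_layer(V, Q)
print(out.shape)  # (8, 16)
```

The output keeps the video-side shape, so such layers could be stacked before a standard grounding head that predicts start/end boundaries.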
Pages: 5046 - 5059
Page count: 14
Related Papers
39 items in total
  • [1] Human-Centric Spatio-Temporal Video Grounding With Visual Transformers
    Tang, Zongheng
    Liao, Yue
    Liu, Si
    Li, Guanbin
    Jin, Xiaojie
    Jiang, Hongxu
    Yu, Qian
    Xu, Dong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (12) : 8238 - 8249
  • [2] Design of human-centric adaptive multimodal interfaces
    Kong, J.
    Zhang, W. Y.
    Yu, N.
    Xia, X. J.
    INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES, 2011, 69 (12) : 854 - 869
  • [3] Music Conditioned Generation for Human-Centric Video
    Zhao, Zimeng
    Zuo, Binghui
    Wang, Yangang
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 506 - 510
  • [4] Toward human-centric deep video understanding
    Zeng, Wenjun
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2020, 9
  • [5] Matching and Localizing: A Simple yet Effective Framework for Human-Centric Spatio-Temporal Video Grounding
    Tan, Chaolei
    Hu, Jian-Fang
    Zheng, Wei-Shi
LECTURE NOTES IN COMPUTER SCIENCE, 2022, 13604 LNAI : 305 - 316
  • [6] Matching and Localizing: A Simple yet Effective Framework for Human-Centric Spatio-Temporal Video Grounding
    Tan, Chaolei
    Hu, Jian-Fang
    Zheng, Wei-Shi
    ARTIFICIAL INTELLIGENCE, CICAI 2022, PT I, 2022, 13604 : 305 - 316
  • [7] Human-Centric Navigation System Video Vortex for Video Retrieval
    Haseyama, Miki
    Ogawa, Takahiro
    IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE 2011), 2011, : 167 - 168
  • [8] Human-Centric Autonomous Systems With LLMs for User Command Reasoning
    Yang, Yi
    Zhang, Qingwen
    Li, Ci
    Marta, Daniel Simoes
    Batool, Nazre
    Folkesson, John
    2024 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS, WACVW 2024, 2024, : 988 - 994
  • [9] Human-centric multimodal fusion network for robust action recognition
    Hu, Zesheng
    Xiao, Jian
    Li, Le
    Liu, Cun
    Ji, Genlin
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 239
  • [10] Human-centric multimodal deep (HMD) traffic signal control
    Wang, Leizhen
    Ma, Zhenliang
    Dong, Changyin
    Wang, Hao
    IET INTELLIGENT TRANSPORT SYSTEMS, 2023, 17 (04) : 744 - 753