Fuzzy Multimodal Graph Reasoning for Human-Centric Instructional Video Grounding

Authors
Li, Yujie [1]
Jiang, Xun [2, 3]
Xu, Xing [3, 4, 5]
Lu, Huimin [6]
Shen, Heng Tao [3, 4, 5]
Affiliations
[1] Kyushu Inst Technol, Fukuoka 8048550, Japan
[2] Univ Elect Sci & Technol China, Ctr Future Multimedia, Chengdu 611731, Peoples R China
[3] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[4] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu 611731, Peoples R China
[5] Tongji Univ, Coll Elect & Informat Engn, Shanghai 201804, Peoples R China
[6] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Grounding; Feature extraction; Cognition; Task analysis; Visualization; Fuzzy systems; Education; Fuzzy logic; graph learning; human-centric video understanding; temporal grounding; NETWORK;
DOI
10.1109/TFUZZ.2024.3436030
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human-centric instructional videos give users the opportunity to learn real-world multistep tasks, such as cooking, applying makeup, and using professional tools. However, the length of these videos makes the learning experience tedious and makes it difficult for learners to locate specific guidance efficiently. In this article, we present a novel approach, named fuzzy multimodal graph reasoning (FMGR), that uses natural-language queries to localize target events in long, untrimmed human-centric instructional videos. Specifically, we devise fuzzy multimodal graph learning layers that comprise three components: 1) contextual graph reasoning, which transforms individual features into contextualized features; 2) a cross-modal relation fuzzifier, which models the fine-grained matching relationships between the two modalities; and 3) fuzzy graph reasoning, which conducts message passing among cross-modal matching node pairs. In particular, we integrate fuzzy theory into the cross-modal relation fuzzifier to amplify potential matching pairs while mitigating interference from ambiguous matches. To validate our method, we conduct evaluations on two human-centric instructional video datasets, i.e., MedVidQA and YouMakeUp. We further analyze the impact of interrogative and declarative queries. Extensive experimental results and further analysis demonstrate the effectiveness of the proposed FMGR method.
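The fuzzifier and fuzzy message-passing steps summarized in the abstract can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption for illustration only: the piecewise-linear membership function and its thresholds `low`/`high`, the residual word-to-clip update, the max-membership clip relevance, and the hypothetical helpers `fuzzify`, `fuzzy_message_passing`, `clip_relevance`, and `ground_span` are not taken from the FMGR paper, whose actual layers are defined in the full text. The sketch also omits contextual graph reasoning and all learned parameters.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def ramp_membership(s: float, low: float = 0.3, high: float = 0.9) -> float:
    """Hypothetical piecewise-linear (S-shaped) membership function.

    Similarities at or below `low` count as non-matches (0.0), those at or
    above `high` as full matches (1.0); values in between get a partial degree.
    """
    if s <= low:
        return 0.0
    if s >= high:
        return 1.0
    return (s - low) / (high - low)


def fuzzify(clips: List[Vector], words: List[Vector],
            low: float = 0.3, high: float = 0.9) -> List[List[float]]:
    """Clip-by-word fuzzy relation matrix mu[i][j] (hypothetical fuzzifier)."""
    return [[ramp_membership(cosine(c, w), low, high) for w in words] for c in clips]


def fuzzy_message_passing(clips: List[Vector], words: List[Vector],
                          mu: List[List[float]]) -> List[Vector]:
    """One round of word-to-clip message passing weighted by memberships.

    Each clip adds the membership-normalized average of the word features to
    its own feature (residual update). Clips with no matching word (all
    memberships 0) are returned unchanged, so ambiguous pairs cut off by the
    fuzzifier contribute nothing.
    """
    updated = []
    for c, row in zip(clips, mu):
        total = sum(row)
        if total == 0.0:
            updated.append(list(c))
            continue
        msg = [sum(m * w[d] for m, w in zip(row, words)) / total for d in range(len(c))]
        updated.append([x + y for x, y in zip(c, msg)])
    return updated


def clip_relevance(mu: List[List[float]]) -> List[float]:
    """Per-clip relevance: strongest membership to any query word."""
    return [max(row) if row else 0.0 for row in mu]


def ground_span(relevance: List[float], tau: float = 0.5) -> Optional[Tuple[int, int]]:
    """(start, end) clip indices of the longest run with relevance >= tau, or None."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    for i, r in enumerate(relevance + [-1.0]):  # sentinel closes a trailing run
        if r >= tau and start is None:
            start = i
        elif r < tau and start is not None:
            if best is None or (i - 1 - start) > (best[1] - best[0]):
                best = (start, i - 1)
            start = None
    return best
```

For example, with clip features `[[1.0, 0.0], [3.0, 4.0], [0.0, 1.0]]` and a single query word `[1.0, 0.0]`, the three clips receive memberships of about 1.0, 0.5, and 0.0, and `ground_span(clip_relevance(mu), tau=0.4)` returns `(0, 1)`. The hard cutoff at `low` is one simple way to suppress weak, ambiguous matches; a learned or smooth membership function would be an equally plausible choice.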
Pages: 5046 - 5059
Number of pages: 14