LifeGraph 4 - Lifelog Retrieval using Multimodal Knowledge Graphs and Vision-Language Models

Cited by: 0
Authors
Rossetto, Luca [1]
Kyriakou, Athina [1]
Lange, Svenja [1]
Ruosch, Florian [1]
Wang, Ruijie [1]
Wardatzky, Kathrin [1]
Bernstein, Abraham [1]
Affiliations
[1] Univ Zurich, Dept Informat, Zurich, Switzerland
Funding
Swiss National Science Foundation
Keywords
Lifelogging; Lifelog Search Challenge; Multimodal Knowledge Graphs; Graph-based Retrieval; Multi-modal Retrieval; Vision-Language Models
DOI
10.1145/3643489.3661127
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
For the 7th Lifelog Search Challenge (LSC'24), we present the 4th iteration of LifeGraph, a multimodal knowledge-graph approach with data augmentations using Vision-Language Models (VLMs). We extend the LifeGraph model presented at previous LSCs with event-based clustering that uses temporal and spatial relations as well as information extracted from VLM-generated captions of lifelog images.
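The abstract does not say how the event-based clustering is implemented. As a rough illustration of grouping lifelog images by temporal and spatial proximity, the following is a minimal sketch and not the authors' method. The `LifelogImage` record, the `cluster_events` function, and both thresholds (`max_gap`, `max_dist_m`) are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class LifelogImage:
    # Hypothetical record; the real lifelog metadata schema is not given in the abstract.
    image_id: str
    taken_at: datetime
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def cluster_events(images, max_gap=timedelta(minutes=10), max_dist_m=200.0):
    """Greedy single pass: an image joins the current event if it is close
    to the previous image in both time and space; otherwise it starts a new event.
    The thresholds are assumptions for illustration only."""
    events = []
    for img in sorted(images, key=lambda i: i.taken_at):
        if events:
            prev = events[-1][-1]
            close_in_time = img.taken_at - prev.taken_at <= max_gap
            close_in_space = haversine_m(prev.lat, prev.lon, img.lat, img.lon) <= max_dist_m
            if close_in_time and close_in_space:
                events[-1].append(img)
                continue
        events.append([img])
    return events
```

A greedy pass like this is only one possible design; it ignores caption content entirely, whereas the abstract states that the authors also use information extracted from VLM-generated captions.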
Pages: 88-92
Page count: 5