LifeGraph 4 - Lifelog Retrieval using Multimodal Knowledge Graphs and Vision-Language Models

Cited by: 0
Authors
Rossetto, Luca [1]
Kyriakou, Athina [1]
Lange, Svenja [1]
Ruosch, Florian [1]
Wang, Ruijie [1]
Wardatzky, Kathrin [1]
Bernstein, Abraham [1]
Affiliations
[1] Univ Zurich, Dept Informat, Zurich, Switzerland
Funding
Swiss National Science Foundation;
Keywords
Lifelogging; Lifelog Search Challenge; Multimodal Knowledge Graphs; Graph-based Retrieval; Multi-modal Retrieval; Vision-Language Models;
DOI
10.1145/3643489.3661127
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
For the 7th Lifelog Search Challenge (LSC'24), we present the 4th iteration of LifeGraph, a multimodal knowledge-graph approach with data augmentations using Vision-Language Models (VLMs). We extend the LifeGraph model presented at previous editions of the LSC with event-based clustering that uses temporal and spatial relations as well as information extracted from VLM-generated captions of lifelog images.
Pages: 88-92
Number of pages: 5