LifeGraph 4 - Lifelog Retrieval using Multimodal Knowledge Graphs and Vision-Language Models

Authors
Rossetto, Luca [1]
Kyriakou, Athina [1]
Lange, Svenja [1]
Ruosch, Florian [1]
Wang, Ruijie [1]
Wardatzky, Kathrin [1]
Bernstein, Abraham [1]
Affiliation
[1] Univ Zurich, Dept Informat, Zurich, Switzerland
Funding
Swiss National Science Foundation
Keywords
Lifelogging; Lifelog Search Challenge; Multimodal Knowledge Graphs; Graph-based Retrieval; Multi-modal Retrieval; Vision-Language Models;
DOI
10.1145/3643489.3661127
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the scope of the 7th Lifelog Search Challenge (LSC'24), we present the 4th iteration of LifeGraph, a multimodal knowledge-graph approach with data augmentations using Vision-Language Models (VLMs). We extend the LifeGraph model presented at previous LSC editions with event-based clustering that uses temporal and spatial relations, as well as with information extracted from VLM-generated captions of lifelog images.
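The abstract does not spell out how events are formed. As a rough illustration only, here is a minimal sketch of one common way to segment a lifelog image stream into events using temporal and spatial relations, assuming each image carries a timestamp and a GPS fix: a new event starts whenever the time gap or the distance to the preceding image exceeds a threshold. The `LifelogImage` type, the `cluster_events` and `haversine_m` functions, and the 5-minute / 100-metre thresholds are hypothetical and are not taken from LifeGraph itself.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LifelogImage:
    # Hypothetical record: one lifelog image with capture time and GPS position.
    image_id: str
    timestamp: datetime
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def cluster_events(images, max_gap=timedelta(minutes=5), max_dist_m=100.0):
    """Split a time-ordered image stream into events (hypothetical thresholds).

    An image joins the current event only if it is both close in time and close
    in space to the previous image; otherwise it opens a new event.
    """
    events: list[list[LifelogImage]] = []
    for img in sorted(images, key=lambda i: i.timestamp):
        if events:
            prev = events[-1][-1]
            close_in_time = img.timestamp - prev.timestamp <= max_gap
            close_in_space = haversine_m(prev.lat, prev.lon, img.lat, img.lon) <= max_dist_m
            if close_in_time and close_in_space:
                events[-1].append(img)
                continue
        events.append([img])
    return events


# Usage with illustrative coordinates: "a" and "b" are ~13 m and 2 minutes apart,
# while "c" is taken an hour later, so it starts a second event.
t0 = datetime(2020, 1, 1, 8, 0)
imgs = [
    LifelogImage("a", t0, 53.3860, -6.2570),
    LifelogImage("b", t0 + timedelta(minutes=2), 53.3861, -6.2571),
    LifelogImage("c", t0 + timedelta(hours=1), 53.3498, -6.2603),
]
print([[i.image_id for i in event] for event in cluster_events(imgs)])
# → [['a', 'b'], ['c']]
```

In a knowledge-graph setting, each resulting event could then become a node linked to its member images, but how LifeGraph 4 actually models events is not described here.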
Pages: 88-92
Number of pages: 5