Image Recall on Image-Text Intertwined Lifelogs

Cited by: 6
Authors
Chu, Tzu-Hsuan [1]
Huang, Hen-Hsen [2]
Chen, Hsin-Hsi [2]
Affiliations
[1] Natl Taiwan Univ, Taipei, Taiwan
[2] Natl Chengchi Univ, MOST Joint Res Ctr Technol & All Vista Healthcare, Taipei, Taiwan
Keywords
lifelogging; image retrieval; multimodal representation
DOI
10.1145/3350546.3352555
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
People engage in lifelogging by taking photos with cameras and cellphones anytime, anywhere, and sharing them, intertwined with captions or descriptions, on social media platforms. Such image-text intertwined data provides richer information for image recall: when images alone cannot preserve the complete information, the accompanying text complements them by describing the life experiences behind the photos. This work proposes a multimodal retrieval model for image recall in image-text intertwined lifelogs. Our Attentive Image-Story model combines an Image model, which maps visual and textual information into a single representation space, and a Story model, which captures text-based contextual information, with an attention mechanism to reduce the semantic gap between visual and textual information. Experimental results show that our model outperforms a state-of-the-art image-based retrieval model and an image/text hybrid system.
Pages: 398-402
Number of pages: 5