VEMO: A Versatile Elastic Multi-modal Model for Search-Oriented Multi-task Learning

Cited: 0
Authors
Fei, Nanyi [1 ]
Jiang, Hao [2 ]
Lu, Haoyu [3 ]
Long, Jinqiang [3 ]
Dai, Yanqi [3 ]
Fan, Tuo [2 ]
Cao, Zhao [2 ]
Lu, Zhiwu [3 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[2] Huawei Poisson Lab, Hangzhou, Zhejiang, Peoples R China
[3] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
Source
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I | 2024, Vol. 14608
Funding
National Natural Science Foundation of China;
Keywords
multi-modal model; multi-task learning; cross-modal search;
DOI
10.1007/978-3-031-56027-9_4
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104; 0812; 0835; 1405;
Abstract
Cross-modal search is a fundamental task in multi-modal learning, yet hardly any existing work aims to solve multiple cross-modal search tasks at once. In this work, we propose a novel Versatile Elastic Multi-mOdal (VEMO) model for search-oriented multi-task learning. VEMO is versatile because it integrates cross-modal semantic search, named entity recognition, and scene text spotting into a unified framework, where the latter two can be further adapted to entity-based and character-based image search tasks. VEMO is also elastic because sub-modules of its flexible network architecture can be freely assembled for the corresponding tasks. Moreover, to offer more choices on the effectiveness-efficiency trade-off in cross-modal semantic search, we place multiple encoder exits. Experimental results show the effectiveness of our VEMO, which requires only 37.6% of the network parameters needed for uni-task training. Further evaluations on entity-based and character-based image search tasks also validate the superiority of search-oriented multi-task learning.
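The abstract names two architectural ideas: freely assembled per-task sub-modules around a shared encoder ("elastic"), and multiple encoder exits that trade effectiveness for efficiency. The sketch below is not the authors' code; the module names, head sizes, pooling scheme, and exit placement are illustrative assumptions built from standard PyTorch layers.

import torch
import torch.nn as nn

class MultiExitEncoder(nn.Module):
    """Shared transformer encoder with an exit after every block,
    so a caller can trade effectiveness for efficiency at inference."""
    def __init__(self, dim=256, depth=6, heads=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, heads, batch_first=True)
             for _ in range(depth)])
        # One projection per exit into the shared search-embedding space.
        self.exit_proj = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(depth)])

    def forward(self, tokens, exit_at=None):
        """Run the first `exit_at` blocks; return token features and a
        pooled embedding for cross-modal semantic search."""
        exit_at = exit_at or len(self.blocks)
        x = tokens
        for block in self.blocks[:exit_at]:
            x = block(x)
        embedding = self.exit_proj[exit_at - 1](x.mean(dim=1))
        return x, embedding

# "Elastic" assembly: each task attaches only the sub-modules it needs
# to the shared encoder; the head sizes below are made-up placeholders.
encoder = MultiExitEncoder()
ner_head = nn.Linear(256, 9)         # per-token BIO entity-tag logits
spotting_head = nn.Linear(256, 97)   # per-token character-class logits

tokens = torch.randn(2, 32, 256)           # dummy batch of token features
_, fast_emb = encoder(tokens, exit_at=2)   # cheap early-exit embedding
feats, full_emb = encoder(tokens)          # full-depth, most accurate
ner_logits = ner_head(feats)               # (2, 32, 9) token predictions

In the paper's setting the exits would be trained jointly across tasks; here exit_at=2 simply stops after two of the six blocks, which is where the 37.6%-of-parameters style savings would come from in practice.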
Pages: 56-72
Page count: 17
Related Papers
(50 in total)
  • [21] YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation
    Ma, Le
    Wu, Xinda
    Tang, Ruiyuan
    Zhong, Chongjun
    Zhang, Kejun
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (01)
  • [23] A Multi-modal Multi-task based Approach for Movie Recommendation
    Raj, Subham
    Mondal, Prabir
    Chakder, Daipayan
    Saha, Sriparna
    Onoe, Naoyuki
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2023
  • [24] Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings
    Yi, Jiangyan
    Tao, Jianhua
    Fu, Ruibo
    Wang, Tao
    Zhang, Chu Yuan
    Wang, Chenglong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 2963-2973
  • [25] Multi-task Multi-modal Models for Collective Anomaly Detection
    Ide, Tsuyoshi
    Phan, Dzung T.
    Kalagnanam, Jayant
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2017: 177-186
  • [26] Multi-modal multi-task feature fusion for RGBT tracking
    Cai, Yujue
    Sui, Xiubao
    Gu, Guohua
    INFORMATION FUSION, 2023, 97
  • [27] Fake News Detection in Social Media based on Multi-Modal Multi-Task Learning
    Cui, Xinyu
    Li, Yang
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13(07): 912-918
  • [28] Multi-Task Federated Split Learning Across Multi-Modal Data with Privacy Preservation
    Dong, Yipeng
    Luo, Wei
    Wang, Xiangyang
    Zhang, Lei
    Xu, Lin
    Zhou, Zehao
    Wang, Lulu
    SENSORS, 2025, 25 (01)
  • [29] MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning
    Xin, Yi
    Du, Junlong
    Wang, Qiang
    Yan, Ke
    Ding, Shouhong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024: 16076-16084
  • [30] Cloud Type Classification Using Multi-modal Information Based on Multi-task Learning
    Zhang, Yaxiu
    Xie, Jiazu
    He, Di
    Dong, Qing
    Zhang, Jiafeng
    Zhang, Zhong
    Liu, Shuang
    COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, VOL. 1, 2022, 878: 119-125