VEMO: A Versatile Elastic Multi-modal Model for Search-Oriented Multi-task Learning

Cited: 0
Authors
Fei, Nanyi [1 ]
Jiang, Hao [2 ]
Lu, Haoyu [3 ]
Long, Jinqiang [3 ]
Dai, Yanqi [3 ]
Fan, Tuo [2 ]
Cao, Zhao [2 ]
Lu, Zhiwu [3 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[2] Huawei Poisson Lab, Hangzhou, Zhejiang, Peoples R China
[3] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
Source
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I | 2024, Vol. 14608
Funding
National Natural Science Foundation of China;
Keywords
multi-modal model; multi-task learning; cross-modal search;
DOI
10.1007/978-3-031-56027-9_4
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104; 0812; 0835; 1405;
Abstract
Cross-modal search is a fundamental task in multi-modal learning, yet hardly any existing work aims to solve multiple cross-modal search tasks at once. In this work, we propose a novel Versatile Elastic Multi-mOdal (VEMO) model for search-oriented multi-task learning. VEMO is versatile because it integrates cross-modal semantic search, named entity recognition, and scene text spotting into a unified framework, where the latter two can be further adapted to entity-based and character-based image search tasks. VEMO is also elastic because sub-modules of its flexible network architecture can be freely assembled for the corresponding tasks. Moreover, to offer more choices on the effectiveness-efficiency trade-off in cross-modal semantic search, we place multiple encoder exits. Experimental results show the effectiveness of our VEMO, which requires only 37.6% of the network parameters needed for uni-task training. Further evaluations on entity-based and character-based image search tasks also validate the superiority of search-oriented multi-task learning.
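The abstract names two architectural ideas: freely assembled per-task sub-modules around a shared encoder ("elastic"), and multiple encoder exits that trade effectiveness for efficiency. The sketch below is not the authors' code; the module names, head sizes, pooling scheme, and exit placement are illustrative assumptions built from standard PyTorch layers.

import torch
import torch.nn as nn

class MultiExitEncoder(nn.Module):
    """Shared transformer encoder with an exit after every block,
    so a caller can trade effectiveness for efficiency at inference."""
    def __init__(self, dim=256, depth=6, heads=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, heads, batch_first=True)
             for _ in range(depth)])
        # One projection per exit into the shared search-embedding space.
        self.exit_proj = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(depth)])

    def forward(self, tokens, exit_at=None):
        """Run the first `exit_at` blocks; return token features and a
        pooled embedding for cross-modal semantic search."""
        exit_at = exit_at or len(self.blocks)
        x = tokens
        for block in self.blocks[:exit_at]:
            x = block(x)
        embedding = self.exit_proj[exit_at - 1](x.mean(dim=1))
        return x, embedding

# "Elastic" assembly: each task attaches only the sub-modules it needs
# to the shared encoder; the head sizes below are made-up placeholders.
encoder = MultiExitEncoder()
ner_head = nn.Linear(256, 9)         # per-token BIO entity-tag logits
spotting_head = nn.Linear(256, 97)   # per-token character-class logits

tokens = torch.randn(2, 32, 256)           # dummy batch of token features
_, fast_emb = encoder(tokens, exit_at=2)   # cheap early-exit embedding
feats, full_emb = encoder(tokens)          # full-depth, most accurate
ner_logits = ner_head(feats)               # (2, 32, 9) token predictions

In the paper's setting the exits would be trained jointly across tasks; here exit_at=2 simply stops after two of the six blocks, which is where the 37.6%-of-parameters style savings would come from in practice.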
Pages: 56-72
Page count: 17
Related Papers
(50 in total)
  • [21] YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation
    Ma, Le
    Wu, Xinda
    Tang, Ruiyuan
    Zhong, Chongjun
    Zhang, Kejun
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (01)
  • [23] A Multi-modal Multi-task based Approach for Movie Recommendation
    Raj, Subham
    Mondal, Prabir
    Chakder, Daipayan
    Saha, Sriparna
    Onoe, Naoyuki
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2023
  • [24] Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings
    Yi, Jiangyan
    Tao, Jianhua
    Fu, Ruibo
    Wang, Tao
    Zhang, Chu Yuan
    Wang, Chenglong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 2963-2973
  • [25] Multi-task Multi-modal Models for Collective Anomaly Detection
    Ide, Tsuyoshi
    Phan, Dzung T.
    Kalagnanam, Jayant
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2017: 177-186
  • [26] Multi-modal multi-task feature fusion for RGBT tracking
    Cai, Yujue
    Sui, Xiubao
    Gu, Guohua
    INFORMATION FUSION, 2023, 97
  • [27] Fake News Detection in Social Media based on Multi-Modal Multi-Task Learning
    Cui, Xinyu
    Li, Yang
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13(07): 912-918
  • [28] Multi-Task Federated Split Learning Across Multi-Modal Data with Privacy Preservation
    Dong, Yipeng
    Luo, Wei
    Wang, Xiangyang
    Zhang, Lei
    Xu, Lin
    Zhou, Zehao
    Wang, Lulu
    SENSORS, 2025, 25 (01)
  • [29] MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning
    Xin, Yi
    Du, Junlong
    Wang, Qiang
    Yan, Ke
    Ding, Shouhong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024: 16076-16084
  • [30] Cloud Type Classification Using Multi-modal Information Based on Multi-task Learning
    Zhang, Yaxiu
    Xie, Jiazu
    He, Di
    Dong, Qing
    Zhang, Jiafeng
    Zhang, Zhong
    Liu, Shuang
    COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, VOL. 1, 2022, 878: 119-125