Cross-modal Multiple Granularity Interactive Fusion Network for Long Document Classification

Cited by: 0
Authors
Liu, Tengfei [1 ]
Hu, Yongli [1 ]
Gao, Junbin [2 ]
Sun, Yanfeng [1 ]
Yin, Baocai [1 ]
Affiliations
[1] Beijing Univ Technol, 100 Pingleyuan, Beijing, Peoples R China
[2] Univ Sydney, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Long document classification; multi-modal collaborative pooling; cross-modal multi-granularity interactive fusion;
DOI
10.1145/3631711
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Long Document Classification (LDC) has attracted great attention in Natural Language Processing and has achieved considerable progress owing to large-scale pre-trained language models. Nevertheless, as a problem distinct from traditional text classification, LDC is far from settled. Long documents, such as news stories and articles, generally contain thousands of words with complex structures. Moreover, compared with flat text, long documents usually contain multi-modal content such as images, which provides rich information that has not yet been exploited for classification. In this article, we propose a novel cross-modal method for long document classification, in which multiple granularity feature shifting networks adaptively integrate the multi-scale text and visual features of long documents. Additionally, a multi-modal collaborative pooling block is proposed to eliminate redundant fine-grained text features and simultaneously reduce computational complexity. To verify the effectiveness of the proposed model, we conduct experiments on the Food101 dataset and two newly constructed multi-modal long document datasets. The experimental results show that the proposed cross-modal method outperforms single-modal text methods and the state-of-the-art multi-modal baselines.
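The abstract describes the architecture only at a high level. Below is a minimal, hypothetical PyTorch sketch of the two ideas it mentions: a collaborative pooling step that prunes redundant fine-grained text tokens using the visual context, and a gated fusion of the pooled text and visual features. All class names, parameters, and design details here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): prune fine-grained text tokens
# with a visually conditioned score, then fuse text and image features with a
# learned gate before classification.
import torch
import torch.nn as nn


class CollaborativePooling(nn.Module):
    """Keep only the top-k text tokens scored against the pooled visual
    context, discarding redundant fine-grained tokens (assumed design)."""

    def __init__(self, dim: int, keep_ratio: float = 0.5):
        super().__init__()
        self.score = nn.Linear(dim, 1)
        self.keep_ratio = keep_ratio

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # text: (B, Lt, D), image: (B, Lv, D)
        img_ctx = image.mean(dim=1, keepdim=True)          # (B, 1, D)
        scores = self.score(text * img_ctx).squeeze(-1)    # (B, Lt)
        k = max(1, int(text.size(1) * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices                # (B, k)
        idx = idx.unsqueeze(-1).expand(-1, -1, text.size(-1))
        return torch.gather(text, 1, idx)                  # (B, k, D)


class GatedCrossModalFusion(nn.Module):
    """Fuse pooled text and visual features with a learned gate (assumed)."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.pool = CollaborativePooling(dim)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        text = self.pool(text, image)                      # drop redundant tokens
        t, v = text.mean(dim=1), image.mean(dim=1)         # (B, D) each
        g = self.gate(torch.cat([t, v], dim=-1))           # per-dimension gate
        fused = g * t + (1.0 - g) * v                      # gated mixture
        return self.classifier(fused)


if __name__ == "__main__":
    model = GatedCrossModalFusion(dim=768, num_classes=4)
    text = torch.randn(2, 512, 768)    # e.g. BERT-style token features
    image = torch.randn(2, 49, 768)    # e.g. ViT-style patch features
    print(model(text, image).shape)    # torch.Size([2, 4])
```

Pruning tokens before fusion also illustrates how a pooling step can cut the quadratic cost of any subsequent attention over long text sequences, which is the complexity motivation the abstract gives.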
Pages: 24