Cross-modal Multiple Granularity Interactive Fusion Network for Long Document Classification

Cited: 0
Authors
Liu, Tengfei [1]
Hu, Yongli [1]
Gao, Junbin [2]
Sun, Yanfeng [1]
Yin, Baocai [1]
Affiliations
[1] Beijing Univ Technol, 100 Pingleyuan, Beijing, Peoples R China
[2] Univ Sydney, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Long document classification; multi-modal collaborative pooling; cross-modal multi-granularity interactive fusion;
DOI
10.1145/3631711
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Long Document Classification (LDC) has attracted great attention in Natural Language Processing and has achieved considerable progress owing to large-scale pre-trained language models. Nevertheless, as a problem distinct from traditional text classification, LDC is far from settled. Long documents, such as news stories and articles, generally contain thousands of words organized in complex structures. Moreover, compared with flat text, long documents usually carry multi-modal content such as images, which provides rich information that has not yet been exploited for classification. In this article, we propose a novel cross-modal method for long document classification, in which multiple granularity feature shifting networks adaptively integrate the multi-scale textual and visual features of long documents. Additionally, a multi-modal collaborative pooling block is proposed to eliminate redundant fine-grained text features and simultaneously reduce computational complexity. To verify the effectiveness of the proposed model, we conduct experiments on the Food101 dataset and two newly constructed multi-modal long document datasets. The experimental results show that the proposed cross-modal method outperforms single-modal text methods and beats the state-of-the-art multi-modal baselines.
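The abstract gives only a high-level description of the two components. As a minimal sketch, assuming token-level relevance scoring for the collaborative pooling and a sigmoid-gated residual update for the feature shifting (both our own assumptions, not the paper's published design, and all class and variable names hypothetical), the two ideas could look like this in PyTorch:

```python
import torch
import torch.nn as nn


class CollaborativePooling(nn.Module):
    """Sketch: score fine-grained text tokens against a global visual
    feature and keep only the top-k, shrinking the sequence that the
    later fusion stages must process (assumed form of the paper's
    multi-modal collaborative pooling)."""

    def __init__(self, dim: int, keep_ratio: float = 0.5):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)  # joint text-visual relevance score
        self.keep_ratio = keep_ratio

    def forward(self, text_tokens: torch.Tensor, visual_global: torch.Tensor) -> torch.Tensor:
        # text_tokens: (B, N, D); visual_global: (B, D)
        B, N, D = text_tokens.shape
        vis = visual_global.unsqueeze(1).expand(-1, N, -1)                    # (B, N, D)
        scores = self.score(torch.cat([text_tokens, vis], -1)).squeeze(-1)    # (B, N)
        k = max(1, int(N * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices                                   # top-k token ids
        return torch.gather(text_tokens, 1, idx.unsqueeze(-1).expand(-1, -1, D))


class FeatureShift(nn.Module):
    """Sketch: shift one modality's features toward the other at a single
    granularity via a learned sigmoid gate (hypothetical realization of
    the 'feature shifting' idea)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x, y: (B, N, D) aligned features of the two modalities at one scale
        g = self.gate(torch.cat([x, y], -1))
        return x + g * y  # inject the other modality where the gate opens


if __name__ == "__main__":
    B, N, D = 2, 128, 64
    text = torch.randn(B, N, D)
    img_global = torch.randn(B, D)
    pooled = CollaborativePooling(D)(text, img_global)  # (2, 64, 64)
    img_tokens = torch.randn(B, pooled.shape[1], D)
    fused = FeatureShift(D)(pooled, img_tokens)         # (2, 64, 64)
    print(pooled.shape, fused.shape)
```

In the actual model, the shift would presumably be applied at multiple granularities (e.g., token, sentence, and document scales) and across both modalities; the gating form above is illustrative only.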
Pages: 24