Cross-modal Multiple Granularity Interactive Fusion Network for Long Document Classification

Cited: 0
Authors
Liu, Tengfei [1 ]
Hu, Yongli [1 ]
Gao, Junbin [2 ]
Sun, Yanfeng [1 ]
Yin, Baocai [1 ]
Affiliations
[1] Beijing Univ Technol, 100, Pingleyuan, Beijing, Peoples R China
[2] Univ Sydney, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China; National Key R&D Program of China
Keywords
Long document classification; multi-modal collaborative pooling; cross-modal multi-granularity interactive fusion;
DOI
10.1145/3631711
CLC Number
TP [Automation and Computer Technology]
Discipline Code
0812
Abstract
Long Document Classification (LDC) has attracted great attention in Natural Language Processing and has made considerable progress owing to large-scale pre-trained language models. Nevertheless, as a problem distinct from traditional text classification, LDC is far from settled. Long documents, such as news stories and articles, generally contain thousands of words and have complex structures. Moreover, compared with flat text, long documents usually contain multi-modal content such as images, which provides rich information that has not yet been exploited for classification. In this article, we propose a novel cross-modal method for long document classification, in which multiple granularity feature shifting networks adaptively integrate the multi-scale textual and visual features of long documents. Additionally, a multi-modal collaborative pooling block is proposed to eliminate redundant fine-grained text features and simultaneously reduce computational complexity. To verify the effectiveness of the proposed model, we conduct experiments on the Food101 dataset and two newly constructed multi-modal long document datasets. The experimental results show that the proposed cross-modal method outperforms single-modal text methods and surpasses state-of-the-art multi-modal baselines.
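The abstract does not give the exact formulation of the multi-modal collaborative pooling block, but the core idea it describes (using the visual modality to discard redundant fine-grained text features and shrink the sequence before fusion) can be illustrated with a minimal, hypothetical sketch. All names (`collaborative_pooling`, `keep_ratio`) and the cosine-similarity scoring rule are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def collaborative_pooling(text_feats, image_feat, keep_ratio=0.5):
    """Hypothetical sketch of multi-modal collaborative pooling.

    Scores each text token by its cosine similarity to a global
    image feature and keeps only the top-scoring tokens, reducing
    the sequence length (and thus the cost of later cross-modal
    interaction) while preserving token order.
    """
    # Normalize tokens and the image feature for cosine similarity.
    t = text_feats / (np.linalg.norm(text_feats, axis=1, keepdims=True) + 1e-8)
    v = image_feat / (np.linalg.norm(image_feat) + 1e-8)
    scores = t @ v                       # one relevance score per token
    k = max(1, int(len(text_feats) * keep_ratio))
    top = np.argsort(scores)[::-1][:k]   # indices of the k best tokens
    keep = np.sort(top)                  # restore original token order
    return text_feats[keep], keep

# Toy usage: 8 text tokens of dimension 4, one global image feature.
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 4))
img = rng.normal(size=4)
pooled, kept_idx = collaborative_pooling(text, img, keep_ratio=0.5)
```

With `keep_ratio=0.5`, the 8-token sequence is halved to 4 tokens before fusion; in the paper this kind of reduction is what lowers the computational complexity of the cross-modal interaction over long documents.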
Pages: 24
Related Papers
50 records
  • [21] Semantic Preservation and Hash Fusion Network for Unsupervised Cross-Modal Retrieval
    Shu, Xinsheng
    Li, Mingyong
    WEB AND BIG DATA, APWEB-WAIM 2024, PT V, 2024, 14965 : 146 - 161
  • [22] CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network
    Peng, Yuxin
    Qi, Jinwei
    Huang, Xin
    Yuan, Yuxin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (02) : 405 - 420
  • [23] PCFN: Progressive Cross-Modal Fusion Network for Human Pose Transfer
    Yu, Wei
    Li, Yanping
    Wang, Rui
    Cao, Wenming
    Xiang, Wei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (07) : 3369 - 3382
  • [24] CMFN: Cross-Modal Fusion Network for Irregular Scene Text Recognition
    Zheng, Jinzhi
    Ji, Ruyi
    Zhang, Libo
    Wu, Yanjun
    Zhao, Chen
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT VI, 2024, 14452 : 421 - 433
  • [25] Multi-Level Cross-Modal Interactive-Network-Based Semi-Supervised Multi-Modal Ship Classification
    Song, Xin
    Chen, Zhikui
    Zhong, Fangming
    Gao, Jing
    Zhang, Jianning
    Li, Peng
    SENSORS, 2024, 24 (22)
  • [26] Object Classification in SAR Imagery With Deep Cross-Modal Transfer Network
    Li, Xue
    Wu, Yuan
    Lai, Zuomei
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [27] Cross-modal fusion for multi-label image classification with attention mechanism
    Wang, Yangtao
    Xie, Yanzhao
    Zeng, Jiangfeng
    Wang, Hanpin
    Fan, Lisheng
    Song, Yufan
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 101
  • [29] A Bidirectional Separated Distillation-Based Cross-Modal Interactive Fusion Network for Skeleton-Based Action Recognition
    Wang, Mingdao
    Zhang, Xianlin
    Chen, Siqi
    Li, Xueming
    Zhang, Yue
    IEEE SENSORS JOURNAL, 2025, 25 (01) : 1814 - 1824