Cross-modal Multiple Granularity Interactive Fusion Network for Long Document Classification

Cited by: 0
Authors
Liu, Tengfei [1]
Hu, Yongli [1]
Gao, Junbin [2]
Sun, Yanfeng [1]
Yin, Baocai [1]
Affiliations
[1] Beijing Univ Technol, 100,Pingleyuan, Beijing, Peoples R China
[2] Univ Sydney, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Long document classification; multi-modal collaborative pooling; cross-modal multi-granularity interactive fusion;
DOI
10.1145/3631711
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Long Document Classification (LDC) has attracted great attention in Natural Language Processing and has made considerable progress owing to large-scale pre-trained language models. Nevertheless, as a problem distinct from traditional text classification, LDC is far from settled. Long documents, such as news and articles, generally contain thousands of words with complex structures. Moreover, compared with flat text, long documents usually contain multi-modal content such as images, which provide rich information that has not yet been utilized for classification. In this article, we propose a novel cross-modal method for long document classification, in which multiple granularity feature shifting networks integrate the multi-scale text and visual features of long documents adaptively. Additionally, a multi-modal collaborative pooling block is proposed to eliminate redundant fine-grained text features and simultaneously reduce the computational complexity. To verify the effectiveness of the proposed model, we conduct experiments on the Food101 dataset and two constructed multi-modal long document datasets. The experimental results show that the proposed cross-modal method outperforms single-modal text methods and surpasses the state-of-the-art related multi-modal baselines.
Pages: 24
Related papers
50 in total
  • [1] Hierarchical Multiple Granularity Attention Network for Long Document Classification
    Hu, Yongli
    Ding, Wen
    Liu, Tengfei
    Gao, Junbin
    Sun, Yanfeng
    Yin, Baocai
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [2] Cross-modal evidential fusion network for social media classification
    Yu, Chen
    Wang, Zhiguo
    COMPUTER SPEECH AND LANGUAGE, 2025, 92
  • [3] Cross-modal complementary network with hierarchical fusion for multimodal sentiment classification
    Peng, Cheng
    Zhang, Chunxia
    Xue, Xiaojun
    Gao, Jiameng
    Liang, Hongjian
    Niu, Zhengdong
    TSINGHUA SCIENCE AND TECHNOLOGY, 2022, 27 (04) : 664 - 679
  • [4] Cross-Modal Complementary Network with Hierarchical Fusion for Multimodal Sentiment Classification
    Cheng Peng
    Chunxia Zhang
    Xiaojun Xue
    Jiameng Gao
    Hongjian Liang
    Zhengdong Niu
    Tsinghua Science and Technology, 2022, 27 (04) : 664 - 679
  • [5] Multi-Granularity Interactive Transformer Hashing for Cross-modal Retrieval
    Liu, Yishu
    Wu, Qingpeng
    Zhang, Zheng
    Zhang, Jingyi
    Lu, Guangming
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 893 - 902
  • [6] CROSS-MODAL DEEP NETWORKS FOR DOCUMENT IMAGE CLASSIFICATION
    Bakkali, Souhail
    Ming, Zuheng
    Coustaty, Mickael
    Rusinol, Marcal
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2556 - 2560
  • [7] MGPACNet: A Multiscale Geometric Prior Aware Cross-Modal Network for Images Fusion Classification
    Song, Xue
    Jiao, Licheng
    Li, Lingling
    Liu, Fang
    Liu, Xu
    Yang, Shuyuan
    Hou, Biao
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [8] Supervised cross-modal factor analysis for multiple modal data classification
    Wang, Jingbin
    Zhou, Yihua
    Duan, Kanghong
    Wang, Jim Jing-Yan
    Bensmail, Halima
    2015 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2015): BIG DATA ANALYTICS FOR HUMAN-CENTRIC SYSTEMS, 2015, : 1882 - 1888
  • [9] Heterogeneous Interactive Learning Network for Unsupervised Cross-Modal Retrieval
    Zheng, Yuanchao
    Zhang, Xiaowei
    COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 692 - 707
  • [10] A Short Video Classification Framework Based on Cross-Modal Fusion
    Pang, Nuo
    Guo, Songlin
    Yan, Ming
    Chan, Chien Aun
    SENSORS, 2023, 23 (20)