ZoomNet for Topic-Oriented Fragment Recognition in Long Documents

Authors
Yan, Yukun [1,2]
Zheng, Daqi [3]
Lu, Zhengdong [3]
Song, Sen [1,2]
Affiliations
[1] Tsinghua Univ, Lab Brain & Intelligence, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Biomed Engn, Beijing 100084, Peoples R China
[3] Deeplycurious AI, Res Dept, Beijing 100085, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Labeling; Decoding; Context modeling; Encoding; Information retrieval; Computational modeling; Information extraction; Neural network; Long documents; Reinforcement learning; Term dependencies
DOI
10.1109/ACCESS.2022.3166235
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This work introduces a new information extraction task called Topic-Oriented Fragment Recognition (TOFR), whose goal is to recognize information related to a specific topic in long documents from professional fields. In this paper, we introduce two TOFR datasets to study the problems of processing long documents. We propose a novel neural framework named Zooming Network (ZoomNet), which overcomes the challenge of combining information over long distances with limited computing resources by flexibly switching between skimming and intensive reading when processing long documents. In general, ZoomNet first establishes a hierarchical representation aligned to the text structure, which relieves the conflict between local information and extensive contextual information. Then, it synthesizes different levels of information to assign tags via multi-scale actions. We combine supervised and reinforcement learning methods to train our model. Experiments show that the proposed model outperforms several state-of-the-art sequence labeling models, including BiLSTM-CRF, BERT, XLNet, RoBERTa, and ELECTRA, on both TOFR datasets by large margins.
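The abstract does not spell out how the multi-scale actions operate, so the following is only a toy sketch of the general idea of switching between skimming and intensive reading over a hierarchy aligned to the text structure. Everything in it is an assumption made for illustration: the `Node` class, the `ZOOM_IN` action name, the `O`/`TOPIC` tag set, and above all the keyword rule, which stands in for the learned neural policy that the paper trains with supervised and reinforcement learning.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    """Hypothetical text unit: a document, paragraph, or sentence."""
    text: str = ""
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List["Node"]:
        """Return the finest-grained units under this node, in reading order."""
        if not self.children:
            return [self]
        out: List["Node"] = []
        for child in self.children:
            out.extend(child.leaves())
        return out

    def full_text(self) -> str:
        """Concatenate the text of all leaves under this node."""
        return " ".join(leaf.text for leaf in self.leaves())


def tag_document(node: Node, policy: Callable[[Node], str],
                 tags: List[Tuple[str, str]] = None) -> List[Tuple[str, str]]:
    """Assign one tag per leaf using multi-scale actions.

    The policy returns either "ZOOM_IN" (read the children one by one)
    or a tag that is applied to every leaf under the node at once (skimming).
    """
    if tags is None:
        tags = []
    action = policy(node)
    if action == "ZOOM_IN" and node.children:
        for child in node.children:
            tag_document(child, policy, tags)
    else:
        # A leaf cannot be zoomed into; fall back to the "outside" tag.
        label = "O" if action == "ZOOM_IN" else action
        tags.extend((leaf.text, label) for leaf in node.leaves())
    return tags


def keyword_policy(keyword: str) -> Callable[[Node], str]:
    """Toy stand-in for a learned policy (assumption, not the paper's model)."""
    def policy(node: Node) -> str:
        if keyword not in node.full_text().lower():
            return "O"          # skim: the whole unit is off-topic
        if not node.children:
            return "TOPIC"      # intensive reading reached an on-topic leaf
        return "ZOOM_IN"        # mixed content: look at finer units
    return policy


doc = Node(children=[
    Node(children=[Node("The lease starts in May."),
                   Node("Rent is due monthly.")]),
    Node(children=[Node("The weather was mild."),
                   Node("Nothing else happened.")]),
])
tags = tag_document(doc, keyword_policy("rent"))
labels = [label for _, label in tags]
```

In this toy run the second paragraph is tagged in a single coarse action, while the first is zoomed into sentence by sentence, which is the skim-versus-read trade-off the abstract describes in words.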
Pages: 39545-39554
Number of pages: 10