Katibeh: A Persian news summarizer using the novel semi-supervised approach

Cited by: 1
Authors
Farzi, Saeed [1]
Kianian, Sahar [2]
Affiliations
[1] KN Toosi Univ Technol, Fac Comp Engn, Tehran, Iran
[2] Shahid Rajaee Teacher Training Univ, Fac Comp Engn, Tehran, Iran
DOI
10.1093/llc/fqy034
Chinese Library Classification (CLC) number
C [Social Sciences, General];
Discipline classification code
03; 0303;
Abstract
Text summarization is currently one of the most important active research fields in information retrieval. Most supervised extractive summarization systems use learning-to-rank methods to score sentences according to their importance. They require a comprehensive, high-quality summarization corpus labeled manually by human experts. Unfortunately, such a corpus is not available for most low-resource languages, including Persian. This study first introduces a comprehensive human-labeled summarization corpus, called Bistoon, collected through crowdsourcing. It then presents a Persian summarizer based on a novel semi-supervised summarization approach, a combination of co-training and self-training, designed to overcome the lack of sufficient data. In an iterative process, the proposed system is trained on the Bistoon corpus and applied to unlabeled texts to generate summaries, of which the most confident are added to Bistoon for further iterations. Over the iterations, the training corpus grows and the quality of the summarizer improves at the same time. The proposed system was compared with other well-known Persian summarizers on the Pasokh and Bistoon standard test data sets. The evaluation results show the superiority of the proposed methods in terms of precision, F-measure, ROUGE metrics, and human judgments.
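The iterative process described in the abstract resembles a generic self-training loop, which can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' system: the abstract does not specify the model, the features, or how co-training is combined with self-training. The nearest-centroid classifier, the confidence formula, the `threshold` value, and every function name below are hypothetical stand-ins. Each example is a numeric feature vector for a sentence, labeled 1 if the sentence belongs in the summary and 0 otherwise.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
Example = Tuple[Vector, int]  # (sentence features, label); label is 0 or 1


def centroid(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def train(labeled: List[Example]) -> Dict[int, Vector]:
    """Toy stand-in model (an assumption): one centroid per class.

    Both classes must be present in `labeled`.
    """
    return {c: centroid([v for v, y in labeled if y == c]) for c in (0, 1)}


def predict(model: Dict[int, Vector], v: Vector) -> Tuple[int, float]:
    """Return (label, confidence); confidence is a relative margin in [0, 1]."""
    d0, d1 = distance(v, model[0]), distance(v, model[1])
    total = d0 + d1
    confidence = 0.0 if total == 0 else abs(d0 - d1) / total
    return (0 if d0 <= d1 else 1), confidence


def self_train(
    labeled: List[Example],
    unlabeled: List[Vector],
    threshold: float = 0.6,  # hypothetical confidence cut-off
    max_iters: int = 5,
) -> Tuple[List[Example], List[Vector]]:
    """Grow the labeled set with the model's most confident predictions."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_iters):
        model = train(labeled)  # retrain on the current (grown) corpus
        confident, rest = [], []
        for v in pool:
            label, conf = predict(model, v)
            if conf >= threshold:
                confident.append((v, label))  # accept as a pseudo-label
            else:
                rest.append(v)  # keep for a later iteration
        if not confident:
            break  # nothing confident enough; stop early
        labeled.extend(confident)
        pool = rest
    return labeled, pool
```

Calling `self_train` with a small labeled seed and a pool of unlabeled vectors returns the enlarged labeled set together with the items that never passed the threshold. A real system would replace `train` and `predict` with a learned sentence ranker and would need safeguards against reinforcing its own early mistakes.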
Pages: 277-289
Page count: 13