Katibeh: A Persian news summarizer using the novel semi-supervised approach

Cited by: 1
Authors
Farzi, Saeed [1 ]
Kianian, Sahar [2 ]
Affiliations
[1] KN Toosi Univ Technol, Fac Comp Engn, Tehran, Iran
[2] Shahid Rajaee Teacher Training Univ, Fac Comp Engn, Tehran, Iran
DOI
10.1093/llc/fqy034
Chinese Library Classification number
C [Social Sciences, General];
Discipline classification code
03 ; 0303 ;
Abstract
Text summarization is one of the most active research fields in information retrieval. Most supervised extractive summarization systems use learning-to-rank methods to score sentences according to their importance. These methods need a high-quality, comprehensive summarization corpus labeled manually by human experts. Unfortunately, such a corpus is not available for most low-resource languages, including Persian. This study first introduces Bistoon, a comprehensive human-labeled summarization corpus collected through crowdsourcing. It then presents a Persian summarizer based on a novel semi-supervised summarization approach, a combination of co-training and self-training, to overcome the lack of sufficient data. In an iterative process, the proposed system is trained on the Bistoon corpus and applied to unlabeled texts to generate summaries, of which the most confident are added to Bistoon for further iterations. Over the iterations, the training corpus grows and the quality of the summarizer improves at the same time. The proposed system has been compared with other well-known Persian summarizers on the Pasokh and Bistoon standard test data sets. The evaluation results show the superiority of the proposed methods in terms of precision, F-measure, ROUGE metrics, and human judgments.
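The iterative process described in the abstract can be illustrated with a minimal self-training sketch. This is not the authors' implementation: it covers only the self-training half (no co-training), it swaps in a toy nearest-centroid scorer where the paper uses a learning-to-rank model, and the names `train_centroid_model`, `self_train`, `rounds`, and `threshold` are hypothetical. Sentences are assumed to be already converted to numeric feature vectors, and the seed data is assumed to contain both labels.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Labeled = Tuple[Features, int]  # (sentence features, 1 = belongs in the summary)


def train_centroid_model(data: List[Labeled]) -> Callable[[Features], float]:
    """Toy stand-in learner: returns a confidence in [0, 1] that a
    sentence belongs in the summary. Assumes both labels occur in data."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, y in data if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos, neg = centroid(1), centroid(0)

    def score(x: Features) -> float:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos)) ** 0.5
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg)) ** 0.5
        total = d_pos + d_neg
        # Closer to the positive centroid means a higher confidence.
        return 0.5 if total == 0 else d_neg / total

    return score


def self_train(
    labeled: List[Labeled],
    unlabeled: List[Features],
    rounds: int = 5,
    threshold: float = 0.8,
) -> Tuple[Callable[[Features], float], List[Labeled]]:
    """Grow the labeled set with the model's most confident predictions."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train_centroid_model(labeled)
        confident, remaining = [], []
        for x in pool:
            p = model(x)
            if p >= threshold:
                confident.append((x, 1))   # confidently summary-worthy
            elif p <= 1 - threshold:
                confident.append((x, 0))   # confidently not summary-worthy
            else:
                remaining.append(x)        # too uncertain, keep unlabeled
        if not confident:
            break                          # nothing new to learn from
        labeled.extend(confident)
        pool = remaining
    # Retrain once more so the returned model reflects the grown corpus.
    return train_centroid_model(labeled), labeled
```

The confidence threshold is the main design choice here: a high value adds fewer but cleaner pseudo-labels per round, which limits the risk of the model reinforcing its own mistakes.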
Pages: 277-289
Number of pages: 13