Extractive Text Summarization Models for Urdu Language

Cited by: 21
Authors
Nawaz, Ali [1]
Bakhtyar, Maheen [1]
Baber, Junaid [1]
Ullah, Ihsan [1]
Noor, Waheed [1]
Basit, Abdul [1]
Affiliations
[1] Univ Balochistan Quetta, Quetta, Pakistan
Keywords
Natural Language Processing; Sentence Weight Algorithm; Text Summarization; Urdu Language; Weighted Term Frequency
DOI
10.1016/j.ipm.2020.102383
Chinese Library Classification (CLC)
TP [Automation and computer technology]
Subject classification code
0812
Abstract
In recent years, considerable progress has been made in Urdu linguistics. Many portals and news websites generate a huge amount of Urdu text every day. However, there is still no publicly available dataset, nor any framework, for automatic Urdu extractive summary generation. In automatic extractive summarization, the sentences with the highest weights are selected for inclusion in the summary, and the weight of a sentence is computed as the sum of the weights of the words it contains. There are two well-known approaches to computing word weights for the English language: the local weights (LW) approach and the global weights (GW) approach. With LW, the weights depend on the contents of the text, so the same word may have different weights in different articles. With GW, the word weights are computed from an independent dataset, so the weight of every word remains the same across articles. In the proposed framework, LW and GW based approaches are modeled for the Urdu language. The sentence weight method and the weighted term-frequency method are LW based approaches that compute the weight of a sentence as the sum of its important words and as the sum of the frequencies of its important words, respectively. The vector space model (VSM) is a GW based approach that computes word weights from an independent dataset, after which they remain the same for all texts; GW is widely used for English in applications such as information retrieval and text classification. Extractive summaries generated by the LW and GW based approaches are evaluated against ground-truth summaries obtained from experts, with the VSM serving as the baseline framework for sentence weighting. Experiments show that the LW based approaches are better for extractive summary generation: the F-scores of the sentence weight method and the weighted term-frequency method are 80% and 76%, respectively, whereas the VSM achieves only 62% accuracy on the same dataset. Both the dataset with ground-truth summaries and the code are made publicly available for researchers.
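The local-weight scoring described in the abstract (a sentence's weight is the sum of word weights computed from the article itself) can be sketched briefly. The snippet below is a minimal illustration of the weighted term-frequency idea only; the function name, whitespace tokenization, and caller-supplied stop-word list are assumptions for illustration and this is not the authors' released code.

```python
from collections import Counter

def weighted_term_frequency_summary(sentences, stop_words, top_k=3):
    """Minimal sketch (not the paper's implementation): score each sentence
    by the summed in-document frequencies of its non-stop words (local
    weights) and return the top_k sentences in their original order."""
    # Local weights: term frequencies are computed from this document only,
    # so the same word can receive a different weight in another article.
    tokens = [w for s in sentences for w in s.split() if w not in stop_words]
    tf = Counter(tokens)

    scores = []
    for idx, sentence in enumerate(sentences):
        words = [w for w in sentence.split() if w not in stop_words]
        scores.append((sum(tf[w] for w in words), idx))

    # Keep the highest-scoring sentences, then restore document order.
    chosen = sorted(sorted(scores, reverse=True)[:top_k], key=lambda t: t[1])
    return [sentences[i] for _, i in chosen]
```

For the sentence weight method, a sentence would instead be scored by the number of important words it contains rather than their summed frequencies; the GW/VSM baseline would take word weights (e.g., TF-IDF style) from an independent corpus so that they stay fixed across articles.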
Pages: 14