Extractive summarization using complex networks and syntactic dependency

Cited by: 44
Authors
Amancio, Diego R. [1]
Nunes, Maria G. V. [2]
Oliveira, Osvaldo N., Jr. [1]
Costa, Luciano da F. [1]
Affiliations
[1] Univ Sao Paulo, Inst Fis Sao Carlos, Sao Carlos, SP, Brazil
[2] Univ Sao Paulo, Inst Ciencias Matemat & Comp, BR-13560970 Sao Carlos, SP, Brazil
Funding
São Paulo Research Foundation, Brazil;
Keywords
Summarization; Complex networks; Diversity metrics; Entropy; Syntactical dependency; COMMUNITY STRUCTURE; CENTRALITY; WORLD;
DOI
10.1016/j.physa.2011.10.015
Chinese Library Classification number
O4 [Physics];
Discipline classification code
0702;
Abstract
The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used and therefore there is ample opportunity to enhance the statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, a better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method the Rouge score (an automatic evaluation method used in summarization) was 0.5089, which is the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable to produce good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the Rouge score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and treating text in general. (C) 2011 Elsevier B.V. All rights reserved.
Pages: 1855-1864
Number of pages: 10