Research on multi-document summarization merging sentential semantic features

Cited by: 0
Authors
Luo S.-L. [1]
Bai J.-M. [1]
Pan L.-M. [1]
Han L. [1]
Meng Q. [1]
Affiliation
[1] School of Information and Electronics, Beijing Institute of Technology, Beijing
Source
Transactions of Beijing Institute of Technology, 2016, vol. 36, no. 10
Keywords
Multi-document summarization; Natural language processing; Sentential semantic feature; Sentential semantic model
DOI
10.15918/j.tbit1001-0645.2016.10.014
Abstract
Multi-document summarization (MDS) is a key problem in natural language processing. To extract concise sentences that more accurately reflect the theme of a document set, a new sentence extraction method was proposed. First, sentential semantic features (SSF), such as topic and predicate, were extracted using a sentential semantic model (SSM). Next, the weight of each sentence was calculated from a feature vector that merges statistical features with the SSF. Finally, summary sentences were selected according to these feature weights combined with maximal marginal relevance (MMR). Experiments show that the method is effective: at a summary compression ratio of 15%, the average precision reaches 66.7% and the average recall reaches 65.5%. The results indicate that SSF effectively improve MDS performance.
© 2016, Editorial Department of Transactions of Beijing Institute of Technology. All rights reserved.
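The selection pipeline described in the abstract (score each sentence from a merged feature vector, then pick sentences with MMR at a fixed compression ratio) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the individual statistical features, their weights, the redundancy measure, or the MMR trade-off parameter, so the feature keys in `WEIGHTS`, the Jaccard word-overlap similarity, `lam=0.7`, and all helper names are hypothetical. Feature values are assumed to be pre-normalized to [0, 1]. Precision and recall are computed over sentence indices against a hypothetical reference extract, which may differ from the paper's evaluation protocol.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical weights for a merged feature vector: two statistical features
# and two sentential semantic features (SSF). Not taken from the paper.
WEIGHTS: Dict[str, float] = {
    "tf_isf": 0.3,     # statistical: term-frequency salience (assumed)
    "position": 0.2,   # statistical: position in the source document (assumed)
    "topic": 0.3,      # SSF: match between sentence topic and theme (assumed)
    "predicate": 0.2,  # SSF: salience of the sentence predicate (assumed)
}


@dataclass
class Sentence:
    text: str
    features: Dict[str, float]  # feature values, assumed normalized to [0, 1]
    terms: Set[str]             # word set used only for the redundancy check


def sentence_weight(s: Sentence, weights: Dict[str, float] = WEIGHTS) -> float:
    """Linear combination of the merged statistical + semantic feature vector."""
    return sum(w * s.features.get(name, 0.0) for name, w in weights.items())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity, a stand-in for the paper's unspecified measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(sentences: List[Sentence], k: int, lam: float = 0.7) -> List[int]:
    """Greedy MMR: trade sentence weight against redundancy with picked sentences."""
    scores = [sentence_weight(s) for s in sentences]
    selected: List[int] = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i: int) -> float:
            # Highest similarity to any already-selected sentence (0 if none yet).
            redundancy = max(
                (jaccard(sentences[i].terms, sentences[j].terms) for j in selected),
                default=0.0,
            )
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


def summary_size(n_sentences: int, ratio: float = 0.15) -> int:
    """Number of sentences to extract for a given compression ratio (at least 1)."""
    return max(1, round(n_sentences * ratio))


def precision_recall(extracted: List[int], reference: List[int]) -> Tuple[float, float]:
    """Sentence-level precision and recall against a reference extract."""
    hits = len(set(extracted) & set(reference))
    return hits / len(extracted), hits / len(reference)


def uniform(v: float) -> Dict[str, float]:
    """Toy helper: give every feature in WEIGHTS the same value."""
    return {name: v for name in WEIGHTS}


# Toy document set: sentence 1 repeats the words of sentence 0.
EXAMPLE = [
    Sentence("s0", uniform(1.0), {"a", "b", "c"}),
    Sentence("s1", uniform(0.9), {"a", "b", "c"}),
    Sentence("s2", uniform(0.6), {"x", "y"}),
]

if __name__ == "__main__":
    picked = mmr_select(EXAMPLE, k=2)
    print(picked)                            # [0, 2]
    print(precision_recall(picked, [0, 1]))  # (0.5, 0.5)
```

In the toy example, sentence 1 outscores sentence 2 on feature weight alone, but it duplicates sentence 0's words, so MMR picks sentence 2 instead. Lowering `lam` strengthens the redundancy penalty relative to the feature weight.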
Pages: 1059-1064