Research on Cross Language Text Keyword Extraction Based on Information Entropy and TextRank

Cited by: 0
Authors
Zhang, Xiaoyu [1]
Wang, Yongbin [1]
Wu, Lin [1]
Affiliations
[1] Commun Univ China, Internet Informat Res Inst, Beijing 100024, Peoples R China
Keywords
component; information entropy; TextRank; keyword extraction; cross-language keyword extraction
DOI
10.1109/itnec.2019.8728993
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
To extract keywords from cross-language documents as accurately as possible, especially for languages whose keyword extraction technology is immature, a keyword extraction method based on information entropy and TextRank is proposed that extracts keywords from the documents after they have been translated into Chinese. The method first determines the base importance of each word from its information entropy, and then uses these entropy values in the iterative voting of the TextRank algorithm. This addresses the tendency of TextRank to select frequent non-key words as keywords. Experimental results show that the proposed method extracts keywords more accurately than TextRank when processing translated bilingual cross-language documents.
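The two-stage idea in the abstract (an information-based base weight per word, followed by TextRank-style iterative voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the entropy is computed or how it enters the voting. The sketch assumes, purely for illustration, that the base weight is the word's self-information -log2 p(w) estimated from hypothetical background counts, and that the weights act as the teleport prior of a PageRank-style iteration over a co-occurrence graph. The names `information_weights`, `entropy_textrank`, and `background` are all made up for this example, and the input is assumed to be already segmented into tokens.

```python
import math
from collections import Counter, defaultdict


def information_weights(tokens, background_counts):
    """Base weight per word: self-information -log2 p(w).

    Assumption: p(w) is an add-one smoothed relative frequency taken from
    background_counts, a hypothetical reference corpus. Frequent words
    therefore receive low weights.
    """
    vocab = set(tokens) | set(background_counts)
    denom = sum(background_counts.values()) + len(vocab)
    return {
        w: -math.log2((background_counts.get(w, 0) + 1) / denom)
        for w in set(tokens)
    }


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking words that occur within `window` positions."""
    graph = defaultdict(Counter)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:
                graph[w][v] += 1
                graph[v][w] += 1
    return graph


def entropy_textrank(tokens, weights, d=0.85, iters=50, window=2):
    """TextRank-style voting whose teleport prior follows the base weights."""
    graph = cooccurrence_graph(tokens, window)
    nodes = sorted(set(tokens))
    total = sum(weights[w] for w in nodes)
    if total > 0:
        prior = {w: weights[w] / total for w in nodes}
    else:  # degenerate case: fall back to a uniform prior
        prior = {w: 1.0 / len(nodes) for w in nodes}
    score = dict(prior)
    for _ in range(iters):
        new_score = {}
        for w in nodes:
            # Each neighbour v passes on its score in proportion to edge weight.
            vote = sum(
                score[v] * c / sum(graph[v].values())
                for v, c in graph[w].items()
            )
            new_score[w] = (1 - d) * prior[w] + d * vote
        score = new_score
    return score


# Toy usage with invented background counts (illustrative values only).
tokens = "the model uses the entropy of the word and the graph of the word".split()
background = {"the": 1000, "of": 500, "and": 400, "uses": 50,
              "word": 20, "model": 5, "graph": 3, "entropy": 2}
weights = information_weights(tokens, background)
scores = entropy_textrank(tokens, weights)
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word}\t{scores[word]:.4f}")
```

Using the weights as the teleport prior is one simple way to let an information measure bias the vote; scaling edge weights by the same values would be another option under the same assumptions.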
Pages: 16-19
Number of pages: 4