Comparing keyword extraction techniques for WEBSOM text archives

Times cited: 0
Authors
Azcarraga, AP [1 ]
Yap, TN [1 ]
Affiliation
[1] Natl Univ Singapore, Sch Comp, PRIS Grp, Singapore 117543, Singapore
DOI
10.1109/ICTAI.2001.974464
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The WEBSOM methodology for building very large text archives has a very slow method for extracting meaningful unit labels. This is because the method computes the relative frequencies of all the words of all the documents associated with each unit and then compares them with the relative frequencies of all the words of all the other units of the map. Since maps may have more than 100,000 units and the archive may contain up to 7 million documents, the existing WEBSOM method is not practical at that scale. A fast alternative method is instead based on the distribution of weights in the weight vectors of the trained map, plus a simple manipulation of the random projection matrix used to compress the input data. Comparisons made using a WEBSOM archive of the Reuters text collection reveal that a high percentage of the keywords extracted with this method match the keywords extracted for the same map units with the original WEBSOM method.
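To make the contrast between the two labeling strategies concrete, here is a minimal Python sketch on a toy two-unit map. Everything in it is an assumption for illustration, not the authors' implementation: the toy corpus, all function names (`frequency_labels`, `weight_labels`, `random_projection`, and so on), the keyword score F_j(w)^2 / sum_i F_i(w) used to compare a unit's word frequencies with those of the other units, a sparse random projection with three 1s per word column, and the use of each unit's mean compressed document vector as a stand-in for a trained SOM weight vector (no map is actually trained here). The fast variant reads the abstract's "simple manipulation of the random projection matrix" as multiplying each weight vector by the transpose R^T to recover a per-word profile; the paper may do something different.

```python
import random
from collections import Counter

# Toy archive (hypothetical): each map unit holds the tokenized documents mapped to it.
UNIT_DOCS = [
    [["oil", "crude", "price", "barrel"], ["crude", "oil", "opec", "price"]],
    [["bank", "rate", "interest", "price"], ["interest", "rate", "bank", "loan"]],
]


def relative_frequencies(docs):
    """Relative frequency of every word over all documents of one unit."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def top_k(profiles, k):
    """Pick k keywords per unit.

    ASSUMPTION: score(w, j) = F_j(w)**2 / sum_i F_i(w), i.e. a word is a good
    label if it is frequent in unit j and rare in the other units.
    Ties are broken alphabetically so the output is deterministic.
    """
    totals = {}
    for prof in profiles:
        for w, f in prof.items():
            totals[w] = totals.get(w, 0.0) + f
    labels = []
    for prof in profiles:
        scores = {w: f * f / totals[w] for w, f in prof.items() if totals[w] > 0}
        labels.append(sorted(scores, key=lambda w: (-scores[w], w))[:k])
    return labels


def frequency_labels(unit_docs, k=3):
    """Slow baseline: must revisit every document of every unit."""
    return top_k([relative_frequencies(docs) for docs in unit_docs], k)


def random_projection(vocab, dim, ones=3, seed=0):
    """ASSUMPTION: sparse projection R; column of word w has 1s at `ones` random rows."""
    rng = random.Random(seed)
    return {w: rng.sample(range(dim), ones) for w in sorted(vocab)}


def project(doc, proj, dim):
    """Compress one document (R @ x): add its words' columns, scaled by 1/len(doc)."""
    vec = [0.0] * dim
    for w in doc:
        for row in proj[w]:
            vec[row] += 1.0 / len(doc)
    return vec


def unit_weights(unit_docs, proj, dim):
    """ASSUMPTION: stand-in for trained SOM weights, the mean compressed vector per unit."""
    weights = []
    for docs in unit_docs:
        vecs = [project(d, proj, dim) for d in docs]
        weights.append([sum(col) / len(vecs) for col in zip(*vecs)])
    return weights


def weight_labels(weights, proj, k=3):
    """Fast variant (hypothetical): back-project each weight vector with R^T.

    Uses only the weight vectors and the projection matrix, never the documents.
    """
    profiles = [{w: sum(m[r] for r in rows) for w, rows in proj.items()} for m in weights]
    return top_k(profiles, k)


def match_percentage(labels_a, labels_b):
    """Percentage of keywords in labels_a that also appear in labels_b for the same unit."""
    hits = sum(len(set(a) & set(b)) for a, b in zip(labels_a, labels_b))
    total = sum(len(a) for a in labels_a)
    return 100.0 * hits / total if total else 0.0


if __name__ == "__main__":
    vocab = {w for docs in UNIT_DOCS for d in docs for w in d}
    dim = 32
    proj = random_projection(vocab, dim)
    slow = frequency_labels(UNIT_DOCS)
    fast = weight_labels(unit_weights(UNIT_DOCS, proj, dim), proj)
    print("slow:", slow)
    print("fast:", fast)
    print(f"match: {match_percentage(fast, slow):.1f}%")
```

The slow baseline has to revisit every document of every unit, while `weight_labels` touches only the weight vectors and the projection matrix, so its cost grows with units × vocabulary × ones per column rather than with the size of the archive. With an identity projection (one row per word, so no collisions) and equal-length documents, both functions return the same labels on this toy data; collisions in a real random projection are what make the fast labels an approximation, which is why the sketch reports a match percentage in the spirit of the comparison described above.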
Pages: 187-194
Number of pages: 8
Related papers
50 in total
  • [41] SEMANTIC KEYWORD EXTRACTION VIA ADAPTIVE TEXT BINARIZATION OF UNSTRUCTURED UNSOURCED VIDEO
    Merler, Michele
    Kender, John R.
    2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 261 - 264
  • [42] Summary Extraction from Chinese Text for Data Archives of Online News
    Mikami, Nozomi
    Pichl, Lukas
    DATABASES IN NETWORKED INFORMATION SYSTEMS, 2011, 7108 : 190 - 202
  • [43] Keyword Extraction from Educational Video Transcripts Using NLP techniques
    Shukla, Himani
    Kakkar, Misha
    2016 6th International Conference - Cloud System and Big Data Engineering (Confluence), 2016, : 105 - 108
  • [44] Information extraction techniques for multilevel text matching
    Di Tomaso, V
    D'Angelo, G
    ADVANCES IN INTELLIGENT SYSTEMS: CONCEPTS, TOOLS AND APPLICATIONS, 1999, 21 : 167 - 178
  • [45] A survey on different dimensions for graphical keyword extraction techniques Issues and Challenges
    Garg, Muskan
    ARTIFICIAL INTELLIGENCE REVIEW, 2021, 54 (06) : 4731 - 4770
  • [46] Preprocessing Techniques for High Quality Text Extraction from Text Images
    Koshy, Alan
    Balakumar, Niranj M. J.
    Shyna, A.
    John, Ansamma
    PROCEEDINGS OF 2019 1ST INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION AND COMMUNICATION TECHNOLOGY (ICIICT 2019), 2019,
  • [47] Text Categorization Using Hyper Rectangular Keyword Extraction: Application to News Articles Classification
    Hassaine, Abdelaali
    Mecheter, Souad
    Jaoua, Ali
    RELATIONAL AND ALGEBRAIC METHODS IN COMPUTER SCIENCE (RAMICS 2015), 2015, 9348 : 312 - 325
  • [48] SIFRANK Algorithm for Chinese Text Keyword Extraction Based on Dependent Semantic Feature Constraints
    Zhang, Qian
    Wang, Tiancheng
    Zhu, Mengyuan
    Shen, Tao
    Zhao, Yilin
    Zhang, Yunwei
    2022 IEEE 17TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2022, : 1652 - 1657
  • [49] Inside Importance Factors of Graph-Based Keyword Extraction on Chinese Short Text
    Chen, Junjie
    Hou, Hongxu
    Gao, Jing
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2020, 19 (05)
  • [50] Language-independent extractive automatic text summarization based on automatic keyword extraction
    Hernandez-Castaneda, Angel
    Arnulfo Garcia-Hernandez, Rene
    Ledeneva, Yulia
    Eduardo Millan-Hernandez, Christian
    COMPUTER SPEECH AND LANGUAGE, 2022, 71