Comparing keyword extraction techniques for WEBSOM text archives

Times cited: 0
Authors
Azcarraga, AP [1]
Yap, TN [1]
Affiliation
[1] Natl Univ Singapore, Sch Comp, PRIS Grp, Singapore 117543, Singapore
DOI
10.1109/ICTAI.2001.974464
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The WEBSOM methodology for building very large text archives has a very slow method for extracting meaningful unit labels. This is because the method computes the relative frequencies of all the words of all the documents associated with each unit and then compares these to the relative frequencies of all the words of all the other units of the map. Since maps may have more than 100,000 units and the archive may contain up to 7 million documents, the existing WEBSOM method is impractical at that scale. A fast alternative method is based on the distribution of weights in the weight vectors of the trained map, plus a simple manipulation of the random projection matrix used for input data compression. Comparisons made using a WEBSOM archive of the Reuters text collection reveal that a high percentage of the keywords extracted using this method match the keywords extracted for the same map units using the original WEBSOM method.
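As a rough illustration of the two labeling strategies contrasted in the abstract, here is a minimal Python sketch. The abstract gives no formulas, so several things below are assumptions rather than the authors' method: the frequency-contrast score f_unit^2 / (f_unit + f_rest), a sparse random projection with `nnz` ones per word column, the use of the mean projected document vector as a stand-in for a trained SOM weight vector, and the reading of the "simple manipulation of the random projection matrix" as back-projecting each weight vector with the transpose of the projection matrix. All function names (`frequency_labels`, `weight_vector_labels`, `projection_matrix`, `project`, `match_rate`) are hypothetical.

```python
import random
from collections import Counter


def frequency_labels(unit_docs, top_k=3):
    """Frequency-contrast labels per map unit (hypothetical scoring).

    unit_docs maps a unit id to its documents (lists of tokens). Every
    token of every document is counted, which is what makes this style
    of labeling expensive on very large maps.
    """
    counts = {u: Counter(t for doc in docs for t in doc) for u, docs in unit_docs.items()}
    grand = Counter()
    for c in counts.values():
        grand.update(c)
    grand_total = sum(grand.values())
    labels = {}
    for u, c in counts.items():
        unit_total = sum(c.values())
        rest_total = grand_total - unit_total
        scores = {}
        for w, n in c.items():
            f_unit = n / unit_total
            f_rest = (grand[w] - n) / rest_total if rest_total else 0.0
            # Assumed score: favor words frequent here and rare elsewhere.
            scores[w] = f_unit * f_unit / (f_unit + f_rest)
        # Ties are broken alphabetically so results are deterministic.
        labels[u] = sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
    return labels


def projection_matrix(vocab_size, dim, nnz=3, seed=0):
    """Sparse random projection R (dim x vocab_size); nnz ones per column is an assumption."""
    rng = random.Random(seed)
    R = [[0.0] * vocab_size for _ in range(dim)]
    for j in range(vocab_size):
        for i in rng.sample(range(dim), nnz):
            R[i][j] = 1.0
    return R


def project(doc, R, index):
    """Compress a document's word histogram n into x = R n (unknown tokens are skipped)."""
    x = [0.0] * len(R)
    for t in doc:
        j = index.get(t)
        if j is not None:
            for i in range(len(R)):
                x[i] += R[i][j]
    return x


def weight_vector_labels(weights, R, vocab, top_k=3):
    """Labels from weight vectors only, via the assumed back-projection R^T m.

    No document is revisited: each unit's weight vector m lives in the
    compressed space and is mapped back to word space with R's transpose.
    """
    labels = {}
    for u, m in weights.items():
        scores = [sum(R[i][j] * m[i] for i in range(len(m))) for j in range(len(vocab))]
        order = sorted(range(len(vocab)), key=lambda j: (-scores[j], vocab[j]))
        labels[u] = [vocab[j] for j in order[:top_k]]
    return labels


def match_rate(a, b):
    """Fraction of the keywords in a that also appear in b."""
    return len(set(a) & set(b)) / len(a) if a else 0.0


if __name__ == "__main__":
    docs = {"A": [["oil", "price", "oil"], ["oil", "crude"]],
            "B": [["bank", "rate"], ["bank", "loan", "rate"]]}
    vocab = sorted({t for ds in docs.values() for d in ds for t in d})
    index = {w: j for j, w in enumerate(vocab)}
    R = projection_matrix(len(vocab), dim=4)
    # Stand-in for trained SOM weights: mean projected vector of each unit's documents.
    weights = {u: [sum(col) / len(ds) for col in zip(*(project(d, R, index) for d in ds))]
               for u, ds in docs.items()}
    slow = frequency_labels(docs, top_k=2)
    fast = weight_vector_labels(weights, R, vocab, top_k=2)
    for u in docs:
        print(u, slow[u], fast[u], match_rate(slow[u], fast[u]))
```

The contrast the sketch is meant to show is one of cost: `frequency_labels` touches every token of every document, whereas `weight_vector_labels` reads only the weight vectors and the projection matrix, so its running time does not grow with the number of documents in the archive. On a toy map with such a small projection dimension, collisions in R mean the two label sets need not agree.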
Pages: 187-194
Number of pages: 8