Comparing keyword extraction techniques for WEBSOM text archives

Authors
Azcarraga, AP [1]
Yap, TN [1]
Affiliations
[1] Natl Univ Singapore, Sch Comp, PRIS Grp, Singapore 117543, Singapore
DOI
10.1109/ICTAI.2001.974464
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The WEBSOM methodology for building very large text archives has a very slow method for extracting meaningful unit labels. This is because the method computes the relative frequencies of all the words of all the documents associated with each unit and then compares these to the relative frequencies of all the words of all the other units of the map. Since maps may have more than 100,000 units and the archive may contain up to 7 million documents, the existing WEBSOM method is not practical. A fast alternative method is based on the distribution of weights in the weight vectors of the trained map, plus a simple manipulation of the random projection matrix used for input data compression. Comparisons made using a WEBSOM archive of the Reuters text collection reveal that a high percentage of keywords extracted using this method match the keywords extracted for the same map units using the original WEBSOM method.
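As an illustration only, the two labeling strategies contrasted in the abstract can be sketched in Python. The abstract gives neither the scoring formula nor the exact manipulation of the projection matrix, so the score in `frequency_labels`, the back-projection in `weight_labels`, and every name and toy value below are assumptions, not the authors' implementation.

```python
from collections import Counter


def relative_frequencies(docs):
    """Relative frequency of each word over a list of tokenized documents."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def frequency_labels(unit_docs, top_k=2):
    """Slow baseline: compare each unit's word frequencies with all units.

    Assumed score: freq_in_unit * (freq_in_unit / sum of freq over all units).
    The cost grows with units x vocabulary, which is the bottleneck
    the abstract describes.
    """
    freqs = {u: relative_frequencies(d) for u, d in unit_docs.items()}
    labels = {}
    for unit, f in freqs.items():
        scores = {}
        for word, p in f.items():
            total = sum(other.get(word, 0.0) for other in freqs.values())
            scores[word] = p * p / total
        # Sort by descending score, then alphabetically to break ties.
        labels[unit] = sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
    return labels


def weight_labels(weight_vector, projection, vocab, top_k=2):
    """Fast sketch: read keywords off one unit's weight vector.

    Assumes projection[i][j] maps word j to compressed dimension i, so
    multiplying by the transpose gives an approximate per-word score.
    """
    dims = range(len(weight_vector))
    scores = [sum(projection[i][j] * weight_vector[i] for i in dims)
              for j in range(len(vocab))]
    ranked = sorted(range(len(vocab)), key=lambda j: (-scores[j], vocab[j]))
    return [vocab[j] for j in ranked[:top_k]]


def match_rate(reference, candidate):
    """Fraction of reference keywords that also appear in candidate."""
    return len(set(reference) & set(candidate)) / len(reference) if reference else 0.0


# Toy data (hypothetical): two map units and a tiny 2 x 4 projection matrix.
unit_docs = {
    "a": [["oil", "price", "oil"], ["oil", "market"]],
    "b": [["bank", "rate", "bank"], ["market", "rate"]],
}
vocab = ["oil", "price", "bank", "rate"]
projection = [[1, 1, 0, 0], [0, 0, 1, 1]]

baseline = frequency_labels(unit_docs)
fast = weight_labels([0.9, 0.1], projection, vocab)
print(baseline["a"], fast, match_rate(baseline["a"], fast))
```

The fast variant touches only one weight vector and the projection matrix per unit, so it never revisits the documents. The match rate mirrors the kind of keyword-overlap comparison the abstract reports, here on toy data rather than the Reuters collection.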
Pages: 187-194
Page count: 8