Comparing keyword extraction techniques for WEBSOM text archives

Cited by: 0
Authors
Azcarraga, AP [1]
Yap, TN [1]
Affiliation
[1] Natl Univ Singapore, Sch Comp, PRIS Grp, Singapore 117543, Singapore
DOI
10.1109/ICTAI.2001.974464
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The WEBSOM methodology for building very large text archives has a very slow method for extracting meaningful unit labels. This is because the method computes the relative frequencies of all the words of all the documents associated with each unit and then compares these to the relative frequencies of all the words of all the other units of the map. Since maps may have more than 100,000 units and the archive may contain up to 7 million documents, the existing WEBSOM method is not practical. A fast alternative method is based on the distribution of weights in the weight vectors of the trained map, plus a simple manipulation of the random projection matrix used for input data compression. Comparisons made using a WEBSOM archive of the Reuters text collection reveal that a high percentage of the keywords extracted using this method match the keywords extracted for the same map units using the original WEBSOM method.
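To make the frequency-based labeling step concrete, here is a minimal sketch of the kind of computation the abstract attributes to the original WEBSOM method: per-unit relative word frequencies contrasted against those of all other units, plus a match percentage for comparing two keyword lists. The function names (`relative_frequencies`, `unit_keywords`, `match_percentage`), the scoring formula (in-unit relative frequency multiplied by the unit's share of that word's summed relative frequency over all units) and the toy data are hypothetical illustrations, not the exact WEBSOM formula or the authors' code.

```python
from collections import Counter


def relative_frequencies(docs):
    """Relative frequency of each word over all documents mapped to one unit."""
    counts = Counter(word for doc in docs for word in doc.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def unit_keywords(units, top_k=3):
    """Pick top_k label words per unit; units maps unit_id -> list of document strings.

    Hypothetical contrast score: p * (p / sum of the word's relative frequency
    over ALL units), i.e. frequent in this unit and rare in the other units.
    """
    freqs = {u: relative_frequencies(docs) for u, docs in units.items()}
    # Sum each word's relative frequency over every unit of the map.
    totals = Counter()
    for f in freqs.values():
        totals.update(f)
    keywords = {}
    for u, f in freqs.items():
        scores = {w: p * p / totals[w] for w, p in f.items()}
        # Highest score first; ties broken by in-unit frequency, then alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], -f[w], w))
        keywords[u] = ranked[:top_k]
    return keywords


def match_percentage(a, b):
    """Percentage of keywords in list a that also appear in list b."""
    return 100.0 * len(set(a) & set(b)) / len(a) if a else 0.0


if __name__ == "__main__":
    units = {
        "u1": ["oil crude oil prices", "oil opec"],
        "u2": ["bank rates interest", "bank prices"],
    }
    kw = unit_keywords(units, top_k=2)
    print(kw)  # {'u1': ['oil', 'crude'], 'u2': ['bank', 'interest']}
    print(match_percentage(kw["u1"], ["oil", "prices"]))  # 50.0
```

Even with the per-word totals precomputed, this approach needs the word counts of every document mapped to every unit, which is the cost the abstract identifies as impractical for maps with over 100,000 units. The `match_percentage` helper mirrors the kind of comparison the abstract reports between two keyword-extraction methods.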
Pages: 187-194
Page count: 8
Related papers
50 in total
  • [1] Evaluating keyword selection methods for WEBSOM text archives
    Azcarraga, AP
    Yap, TN
    Tan, J
    Chua, TS
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (03) : 380 - 383
  • [2] Keyword extraction for text categorization
    An, JY
    Chen, YPP
    PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON ACTIVE MEDIA TECHNOLOGY (AMT 2005), 2005, : 556 - 561
  • [3] Text Keyword Extraction Based on GPT
    He, Pinyao
    Huang, Jingyue
    Li, Ming
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 1394 - 1398
  • [4] Automatic Keyword Extraction From Dialogue Text
    Sali, Yusuf
    Erden, Mustafa
    2022 30TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2022,
  • [5] Keyword extraction for social media short text
    Zhao, Dexin
    Du, Nana
    Chang, Zhi
    Li, Yukun
    2017 14TH WEB INFORMATION SYSTEMS AND APPLICATIONS CONFERENCE (WISA 2017), 2017, : 251 - 256
  • [6] Performance Evaluation of Keyword Extraction Techniques and Stop Word Lists on Speech-To-Text Corpus
    Guda, Blessed
    Agajo, James
    Nuhu, Bello Kontagora
    Aliyu, Ibrahim
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2023, 20 (01) : 134 - 140
  • [7] An Improved Focused Crawler Based on Text Keyword Extraction
    Zheng, Zhang
    Qian, Du
    PROCEEDINGS OF 2016 5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), 2016, : 386 - 390
  • [8] Chinese Automatic Text Summarization Based on Keyword Extraction
    Jiang Xiao-yu
    FIRST INTERNATIONAL WORKSHOP ON DATABASE TECHNOLOGY AND APPLICATIONS, PROCEEDINGS, 2009, : 225 - 228
  • [9] Text Reuse Detection by Keyword Extraction for Telegram Channels
    Saki, Misam
    Faili, Heshaam
    Asadpour, Masoud
    2017 25TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2017, : 1481 - 1484
  • [10] Analysis of Text Collections for the Purposes of Keyword Extraction Task
    Vanyushkin, Alexander
    Graschenko, Leonid
    JOURNAL OF INFORMATION AND ORGANIZATIONAL SCIENCES, 2020, 44 (01) : 171 - 184