Text mining in the SOMLib Digital Library System: The representation of topics and genres

Cited by: 9
Authors
Rauber, A. [1]
Merkl, D. [1]
Affiliation
[1] Vienna Univ Technol, Dept Software Technol, A-1040 Vienna, Austria
Keywords
document clustering; Self-Organizing Map (SOM); genre analysis; metaphor graphics; digital libraries
DOI
10.1023/A:1023297920966
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As the amount of textual information available in electronic form grows, more powerful methods for exploring, searching, and organizing it are needed. This paper presents the SOMLib digital library system, which is built on neural networks to provide text mining capabilities. At its foundation, the Self-Organizing Map (SOM) provides content-based clustering of documents. An extended model, the Growing Hierarchical Self-Organizing Map, further detects subject hierarchies in a document collection: the neural network adapts its size and structure automatically during unsupervised training to reflect the topical hierarchy. By mining the weight vector structure of the trained maps, the system selects keywords that describe the various topical clusters. Text mining, however, has to incorporate more than the mere analysis of content: structural and genre information is key to organizing and locating information. Using color-coding techniques, we integrate a structural analysis of documents based on Self-Organizing Maps into the subject-based clustering, relying on metaphor graphics for intuitive visualization. We demonstrate the capabilities of the SOMLib system using collections of articles from various newspapers and magazines.
Pages: 271-293
Page count: 23
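
The abstract describes two steps that a short example may make more concrete: clustering documents by content with a Self-Organizing Map, and describing each cluster with keywords read off the trained weight vectors. The following is a minimal sketch under stated assumptions, not the SOMLib implementation. It uses a plain fixed-size SOM rather than the Growing Hierarchical variant, and it labels a unit with its highest-weighted terms, which is a simplified stand-in for the paper's keyword selection (the abstract does not give its details). The vocabulary, the toy documents, and all function names are hypothetical.

```python
import math
import random

# Hypothetical toy vocabulary and term-frequency vectors (not from the paper).
VOCAB = ["election", "parliament", "goal", "match", "stock", "market"]
DOCS = {
    "politics_1": [3, 2, 0, 0, 0, 1],
    "politics_2": [2, 3, 0, 0, 0, 0],
    "sports_1": [0, 0, 3, 2, 0, 0],
    "sports_2": [0, 0, 2, 3, 0, 0],
    "finance_1": [0, 0, 0, 0, 3, 2],
    "finance_2": [1, 0, 0, 0, 2, 3],
}


def normalize(vec):
    """Scale a term vector to unit length so document length does not dominate."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)),
    )


def train_som(vectors, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)    # shrinking neighborhood
        for x in rng.sample(vectors, len(vectors)):  # shuffled presentation
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                grid_dist2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_dist2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])   # pull unit toward x
    return weights


def label_unit(weight_vector, vocab, k=2):
    """Simplified labeling: the k terms with the largest weights in a unit."""
    ranked = sorted(range(len(vocab)), key=lambda i: weight_vector[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]


vectors = {name: normalize(v) for name, v in DOCS.items()}
som = train_som(list(vectors.values()), rows=2, cols=2)
assignments = {name: best_matching_unit(som, v) for name, v in vectors.items()}
labels = {unit: label_unit(w, VOCAB) for unit, w in som.items()}

for name, unit in assignments.items():
    print(name, "->", unit, labels[unit])
```

Each document is assigned to the map unit whose weight vector it matches best, so documents with similar term profiles tend to land on the same or neighboring units. Because a unit's weight vector lives in the same term space as the documents, its largest components can be read as rough keywords for that region of the map.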