Text mining in the SOMLib Digital Library System: The representation of topics and genres

Cited by: 9
Authors
Rauber, A [1]
Merkl, D [1]
Affiliations
[1] Vienna Univ Technol, Dept Software Technol, A-1040 Vienna, Austria
Keywords
document clustering; Self-Organizing Map (SOM); genre analysis; metaphor graphics; digital libraries
DOI
10.1023/A:1023297920966
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With the increasing amount of textual information available in electronic form, more powerful methods for exploring, searching, and organizing the available mass of information are needed. This paper presents the SOMLib digital library system, built on neural networks to provide text mining capabilities. At its foundation we use the Self-Organizing Map to provide content-based clustering of documents. By using an extended model, i.e. the Growing Hierarchical Self-Organizing Map, we can further detect subject hierarchies in a document collection, with the neural network adapting its size and structure automatically during its unsupervised training process to reflect the topical hierarchy. By mining the weight vector structure of the trained maps, our system is able to select keywords describing the various topical clusters. Text mining has to incorporate more than the mere analysis of content. Structural and genre information are key in organizing and locating information. Using color-coding techniques, we can integrate a structural analysis of documents based on Self-Organizing Maps into the subject-based clustering, relying on metaphor graphics for intuitive visualization. We demonstrate the capabilities of the SOMLib system using collections of articles from various newspapers and magazines.
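The content-based clustering and keyword selection described in the abstract can be illustrated with a minimal sketch. It is not the SOMLib implementation: the toy vocabulary, the term counts, the 2 x 2 map size, the linear decay schedules, and the function names train_som, map_documents, and label_unit are all assumptions made for illustration. In particular, label_unit simply picks the terms with the largest weights in a unit's weight vector, which is an assumed simplification of "mining the weight vector structure" and may differ from the labeling method the system actually uses.

```python
import numpy as np

def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    n_docs, dim = data.shape
    # Grid position of every unit, needed for the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = rng.random((rows * cols, dim))
    sigma0 = max(rows, cols) / 2.0          # assumed initial neighborhood radius
    total_steps = epochs * n_docs
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n_docs):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3      # neighborhood shrinks over time
            # Best-matching unit: the unit whose weight vector is closest to the document.
            bmu = np.argmin(((weights - data[i]) ** 2).sum(axis=1))
            grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood on the grid
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights

def map_documents(data, weights):
    """Return the index of the best-matching unit for every document."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def label_unit(weights, unit, vocab, top_k=2):
    """Assumed heuristic: describe a unit by its top_k highest-weighted terms."""
    order = np.argsort(weights[unit])[::-1][:top_k]
    return [vocab[j] for j in order]

# Hypothetical toy collection: three sports-like and three finance-like documents.
vocab = ["goal", "match", "team", "stock", "market", "bank"]
counts = np.array([
    [3, 2, 2, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [2, 2, 3, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 0, 2, 2, 3],
], dtype=float)
docs = counts / np.linalg.norm(counts, axis=1, keepdims=True)  # unit-length vectors

weights = train_som(docs, rows=2, cols=2)
units = map_documents(docs, weights)
for u in sorted(set(units.tolist())):
    members = np.flatnonzero(units == u).tolist()
    print(f"unit {u}: documents {members}, labels {label_unit(weights, u, vocab)}")
```

Documents that share vocabulary should land on the same or neighboring map units, and each occupied unit is described by the terms that dominate its weight vector. The hierarchical variant mentioned in the abstract would additionally grow the map and expand units into sub-maps during training; that step is not shown here.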
Pages: 271-293
Page count: 23
Related papers
50 in total
  • [1] Text Mining in the SOMLib Digital Library System: The Representation of Topics and Genres
    Andreas Rauber
    Dieter Merkl
    Applied Intelligence, 2003, 18 : 271 - 293
  • [2] The SOMLib digital library system
    Rauber, A
    Merkl, D
    RESEARCH AND ADVANCED TECHNOLOGY FOR DIGITAL LIBRARIES, PROCEEDINGS, 1999, 1696 : 323 - 342
  • [3] Adding SOMLib capabilities to the Greenstone digital library system
    Mayer, Rudolf
    Rauber, Andreas
    DIGITAL LIBRARIES: ACHIEVEMENTS, CHALLENGES AND OPPORTUNITIES, PROCEEDINGS, 2006, 4312 : 486 - +
  • [4] Exploring Topics and Genres in Storytime Books: A Text Mining Approach
    Joo, Soohyung
    Ingram, Erin
    Cahill, Maria
    EVIDENCE BASED LIBRARY AND INFORMATION PRACTICE, 2021, 16 (04) : 41 - 62
  • [5] Text mining in a digital library
    Witten I.H.
    Don K.J.
    Dewsnip M.
    Tablan V.
    International Journal on Digital Libraries, 2004, 4 (1) : 56 - 59
  • [6] Topics Discovery in Text Mining
    Correia, Anacleto
    Goncalves, Antonio
    RECENT ADVANCES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 1, 2017, 569 : 251 - 256
  • [7] Integrating Data and Text Mining Processes for Digital Library Applications
    Sanderson, Robert
    Watry, Paul
    PROCEEDINGS OF THE 7TH ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES: BUILDING & SUSTAINING THE DIGITAL ENVIRONMENT, 2007 : 73 - +
  • [8] Analysis of content topics, user engagement and library factors in public library social media based on text mining
    Joo, Soohyung
    Lu, Kun
    Lee, Taehun
    ONLINE INFORMATION REVIEW, 2020, 44 (01) : 258 - 277
  • [9] text2arff: A Text Representation Library
    Can, Ender
    Amasyali, Mehmet Fatih
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016 : 197 - 200
  • [10] Interacting with digital text: User perception of document genres
    Alberts, I
    CANADIAN JOURNAL OF INFORMATION AND LIBRARY SCIENCE-REVUE CANADIENNE DES SCIENCES DE L INFORMATION ET DE BIBLIOTHECONOMIE, 2005, 29 (03) : 371 - 372