Text mining in the SOMLib Digital Library System: The representation of topics and genres

Cited by: 9
Authors
Rauber, A [1]
Merkl, D [1]
Affiliations
[1] Vienna Univ Technol, Dept Software Technol, A-1040 Vienna, Austria
Keywords
document clustering; Self-Organizing Map (SOM); genre analysis; metaphor graphics; digital libraries
DOI
10.1023/A:1023297920966
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the increasing amount of textual information available in electronic form, more powerful methods for exploring, searching, and organizing the available mass of information are needed to cope with this situation. This paper presents the SOMLib digital library system, built on neural networks to provide text mining capabilities. At its foundation we use the Self-Organizing Map to provide content-based clustering of documents. By using an extended model, i.e. the Growing Hierarchical Self-Organizing Map, we can further detect subject hierarchies in a document collection, with the neural network adapting its size and structure automatically during its unsupervised training process to reflect the topical hierarchy. By mining the weight vector structure of the trained maps our system is able to select keywords describing the various topical clusters. Text mining has to incorporate more than the mere analysis of content. Structural and genre information are key in organizing and locating information. Using color-coding techniques we can integrate a structural analysis of documents based on Self-Organizing Maps into the subject-based clustering relying on metaphor graphics for intuitive visualization. We demonstrate the capabilities of the SOMLib system using collections of articles from various newspapers and magazines.
Pages: 271 - 293
Page count: 23
Related papers
50 in total
  • [31] Research topics and trends of endangered species using text mining in Korea
    Do, Min Seock
    Choi, Green
    Hwang, Jae-Woong
    Lee, Ji-Yeon
    Hur, Wee-Haeng
    Choi, Yu-Seong
    Son, Seok-Jun
    Kwon, In-Ki
    Yoo, Sung-Yeon
    Nam, Hyung-Kyu
    JOURNAL OF ASIA-PACIFIC BIODIVERSITY, 2020, 13 (04) : 518 - 523
  • [32] Application of Data Mining Technology in Digital Library
    Zhang, Mei
    JOURNAL OF COMPUTERS, 2011, 6 (04) : 761 - 768
  • [33] An Ontological Representation of the Digital Library Evaluation Domain
    Tsakonas, Giannis
    Papatheodorou, Christos
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2011, 62 (08) : 1577 - 1593
  • [34] Personalization of Search Results Representation of a Digital Library
    Paskali, Ljubomir
    Ivanovic, Lidija
    Kapitsaki, Georgia
    Ivanovic, Dragan
    Surla, Bojana Dimic
    Surla, Dusan
    INFORMATION TECHNOLOGY AND LIBRARIES, 2021, 40 (01)
  • [35] Study on applications of Web Mining to digital library
    Wang, BJ
    Xu, RQ
    Zhu, JN
    Luo, QS
    Cheng, GM
    Yang, WL
    Xin, ZH
    Wang, LY
    Liu, QS
    Artificial Intelligence Applications and Innovations II, 2005, 187 : 777 - 787
  • [36] Data Mining: Competitive Tool to Digital Library
    Lone, Tariq Ahmad
    Khan, Rafi Ahmad
    DESIDOC JOURNAL OF LIBRARY & INFORMATION TECHNOLOGY, 2014, 34 (05) : 401 - 406
  • [37] Text2MARK: A text mining tool in the aid of knowledge representation
    da Silva, Clay Palmeira
    de Morais, Jefferson Magalhaes
    Monteiro, Dionne Cavalcante
    2013 13TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS (ISDA), 2013 : 236 - 241
  • [38] Towards a full-text historical digital library
    Allen, Robert B.
    Chu, Yoonmi
    Allen, Robert B., 1600, Springer Verlag (8839) : 218 - 226
  • [39] Towards a Full-Text Historical Digital Library
    Allen, Robert B.
    Chu, Yoonmi
    EMERGENCE OF DIGITAL LIBRARIES - RESEARCH AND PRACTICES, 2014, 8839 : 218 - 226
  • [40] Cats' and dogs' welfare: text mining and topics modeling analysis of the scientific literature
    Adamakopoulou, Chrysa
    Benedetti, Beatrice
    Zappaterra, Martina
    Felici, Martina
    Masebo, Naod Thomas
    Previti, Annalisa
    Passantino, Annamaria
    Padalino, Barbara
    FRONTIERS IN VETERINARY SCIENCE, 2023, 10