Text mining in the SOMLib Digital Library System: The representation of topics and genres

Cited by: 9
Authors
Rauber, A. [1]
Merkl, D. [1]
Affiliations
[1] Vienna Univ Technol, Dept Software Technol, A-1040 Vienna, Austria
Keywords
document clustering; Self-Organizing Map (SOM); genre analysis; metaphor graphics; digital libraries
DOI
10.1023/A:1023297920966
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As the amount of textual information available in electronic form grows, more powerful methods for exploring, searching, and organizing it are needed. This paper presents the SOMLib digital library system, which builds on neural networks to provide text mining capabilities. At its foundation, the Self-Organizing Map (SOM) provides content-based clustering of documents. An extended model, the Growing Hierarchical Self-Organizing Map, further detects subject hierarchies in a document collection: the neural network adapts its size and structure automatically during unsupervised training to reflect the topical hierarchy. By mining the weight vector structure of the trained maps, the system selects keywords that describe the various topical clusters. Text mining, however, has to incorporate more than the mere analysis of content: structural and genre information is key to organizing and locating information. Using color-coding techniques, we integrate a structural analysis of documents based on Self-Organizing Maps into the subject-based clustering, relying on metaphor graphics for intuitive visualization. We demonstrate the capabilities of the SOMLib system using collections of articles from various newspapers and magazines.
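To make the clustering idea in the abstract concrete, the following is a minimal sketch of a flat Self-Organizing Map trained on term-weight vectors, assuming NumPy is available. It is not the SOMLib implementation: the function names (`train_som`, `best_matching_unit`, `unit_keywords`), the toy vocabulary, the document vectors, and the linear decay schedules are all assumptions made for illustration. The keyword step simply picks the highest-weighted terms of a unit, which is a simplified stand-in for the keyword selection the abstract describes. The Growing Hierarchical SOM is not covered.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the (row, col) index of the unit whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.random((rows, cols, dim))
    # Grid coordinates of every unit, used for neighborhood distances.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    sigma0 = sigma0 or max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                     # assumed linear decay
            sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # radius shrinks toward 0.5
            bmu = best_matching_unit(weights, x)
            d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))        # Gaussian neighborhood
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def unit_keywords(weights, unit, vocab, k=3):
    """Simplified labeling (an assumption): the k highest-weighted terms of a unit."""
    top = np.argsort(weights[unit])[::-1][:k]
    return [vocab[i] for i in top]

# Hypothetical toy collection: rows are documents, columns are term weights.
vocab = ["goal", "match", "team", "vote", "party", "election"]
docs = np.array([
    [1.0, 0.9, 0.8, 0.0, 0.1, 0.0],   # sports-like documents
    [0.9, 1.0, 0.9, 0.1, 0.0, 0.0],
    [0.8, 0.9, 1.0, 0.0, 0.0, 0.1],
    [0.0, 0.1, 0.0, 1.0, 0.9, 0.8],   # politics-like documents
    [0.1, 0.0, 0.0, 0.9, 1.0, 0.9],
    [0.0, 0.0, 0.1, 0.8, 0.9, 1.0],
])

weights = train_som(docs, rows=2, cols=2)
sports_unit = best_matching_unit(weights, docs[0])
politics_unit = best_matching_unit(weights, docs[3])
sports_labels = unit_keywords(weights, sports_unit, vocab)
politics_labels = unit_keywords(weights, politics_unit, vocab)
```

After training, documents with similar term profiles map to the same or neighboring units, and the labels of a unit give a rough description of the cluster it represents.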
Pages: 271-293
Page count: 23