Text mining in the SOMLib Digital Library System: The representation of topics and genres

Cited by: 9
Authors
Rauber, A [1 ]
Merkl, D [1 ]
Affiliation
[1] Vienna Univ Technol, Dept Software Technol, A-1040 Vienna, Austria
Keywords
document clustering; Self-Organizing Map (SOM); genre analysis; metaphor graphics; digital libraries;
DOI
10.1023/A:1023297920966
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the increasing amount of textual information available in electronic form, more powerful methods for exploring, searching, and organizing this information are needed. This paper presents the SOMLib digital library system, which builds on neural networks to provide text mining capabilities. At its foundation, we use the Self-Organizing Map (SOM) to provide content-based clustering of documents. Using an extended model, the Growing Hierarchical Self-Organizing Map, we can further detect subject hierarchies in a document collection: the neural network adapts its size and structure automatically during unsupervised training to reflect the topical hierarchy. By mining the weight vector structure of the trained maps, our system selects keywords that describe the various topical clusters. Text mining, however, has to incorporate more than the mere analysis of content: structural and genre information are key to organizing and locating information. Using color-coding techniques, we integrate a structural analysis of documents based on Self-Organizing Maps into the subject-based clustering, relying on metaphor graphics for intuitive visualization. We demonstrate the capabilities of the SOMLib system using collections of articles from various newspapers and magazines.
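The content-based clustering step described in the abstract can be illustrated with a minimal sketch of a plain Self-Organizing Map. This is not the SOMLib implementation: the function names (`best_matching_unit`, `train_som`, `top_terms`), the toy vocabulary, the document vectors, and all parameter values are assumptions made for illustration. The keyword step simply picks the highest-weighted terms of a unit, a crude stand-in for the weight-vector mining the abstract mentions. The Growing Hierarchical SOM and the genre color-coding are not covered.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the unit whose weight vector is closest to x."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i]))


def train_som(docs, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns a flat list of weight vectors."""
    rng = random.Random(seed)
    dim = len(docs[0])
    # Random initial weights in [0, 1), one vector per map unit.
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood
        for x in rng.sample(docs, len(docs)):    # shuffled presentation order
            b = best_matching_unit(weights, x)
            br, bc = divmod(b, cols)
            for i, w in enumerate(weights):
                r, c = divmod(i, cols)
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                # Gaussian neighborhood around the winning unit.
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


def top_terms(weight, vocab, n=2):
    """Pick the n terms with the largest weight components (assumed heuristic)."""
    order = sorted(range(len(vocab)), key=lambda k: weight[k], reverse=True)
    return [vocab[k] for k in order[:n]]


# Hypothetical toy data: scaled term frequencies over a four-word vocabulary.
vocab = ["election", "vote", "goal", "match"]
docs = [
    [1.0, 0.8, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.8, 1.0],
]

weights = train_som(docs, rows=2, cols=2)
for d in docs:
    unit = best_matching_unit(weights, d)
    print(unit, top_terms(weights[unit], vocab))
```

Each document is assigned to the map unit with the nearest weight vector, so documents with similar term profiles tend to land on the same or neighboring units. Real collections would use much larger vocabularies and maps.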
Pages: 271-293
Page count: 23