An Ontology-based and Domain Specific Clustering Methodology for Financial Documents

Cited by: 0
Authors
Kulathunga, Chalitha [1]
Karunaratne, D. D. [1]
Affiliations
[1] Univ Colombo, Sch Comp, Colombo, Sri Lanka
Keywords
Financial document clustering; WordNet based clustering; Resnik similarity; Word sense disambiguation; Semantic similarity; WordNet
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Financial documents play an important role in modern financial analysis and information retrieval. To meet their varied investigative needs, financial organizations continuously search for accurate and meaningful unsupervised document classification techniques. Unsupervised document categorization, or document clustering, nevertheless remains a challenging and widely studied problem. Incorporating semantic knowledge from an ontology into document clustering has been studied extensively and has improved clustering performance. The incorporated semantic knowledge is generally used to identify the correct meanings of ambiguous words in the documents. Most proposed methodologies were evaluated on general document datasets, and the few available domain-specific clustering studies were mostly constrained to domains for which complete domain ontologies are available. Although the financial domain has several domain ontologies, none of them is complete and suitable for semantic document clustering. In this context, our study proposes a document clustering methodology for financial documents that adapts the WordNet ontology to the financial domain to serve as an external knowledge source. The study shows empirically that nouns are relatively prevalent and more important for document clustering than other terms in a document. A subset of nouns is then identified as most important for clustering, based on their frequency distribution within the main noun list. We developed a word sense disambiguation technique that uses ontological knowledge for noun disambiguation. Finally, the nouns in each document are disambiguated with the proposed technique, weighted with tf-idf, and clustered. The empirical results indicate that the proposed methodology can significantly enhance clustering performance compared with both no disambiguation and pure WordNet-based disambiguation.
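The pipeline described in the abstract (disambiguate nouns against an ontology, then weight the resulting senses with tf-idf) can be illustrated with a minimal sketch. This is not the authors' technique: the abstract does not specify how WordNet is adapted or how senses are scored, so the toy taxonomy `PARENT`, the counts in `LEAF_COUNTS`, the sense inventory `SENSES`, and every function name below are hypothetical stand-ins. The only parts taken from standard definitions are Resnik similarity (the information content of the most informative common subsumer) and tf-idf weighting.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy (child -> parent), standing in for WordNet hypernyms.
PARENT = {
    "finance_concept": "entity",
    "natural_object": "entity",
    "bank#finance": "finance_concept",
    "loan": "finance_concept",
    "interest": "finance_concept",
    "bank#river": "natural_object",
    "water": "natural_object",
}
# Hypothetical corpus counts for leaf concepts (made up for illustration).
LEAF_COUNTS = {"bank#finance": 4, "loan": 3, "interest": 3,
               "bank#river": 2, "water": 8}
# Candidate senses per noun; only "bank" is ambiguous in this toy example.
SENSES = {"bank": ["bank#finance", "bank#river"], "loan": ["loan"],
          "interest": ["interest"], "water": ["water"]}


def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def information_content():
    """IC(c) = -log p(c), where p(c) counts c and all of its descendants."""
    totals = Counter()
    for leaf, n in LEAF_COUNTS.items():
        for c in ancestors(leaf):
            totals[c] += n
    root_total = totals["entity"]
    return {c: -math.log(n / root_total) for c, n in totals.items()}


IC = information_content()


def resnik(a, b):
    """Resnik similarity: IC of the most informative common subsumer."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(IC[c] for c in common)


def disambiguate(noun, context_nouns):
    """Pick the sense with the highest total Resnik similarity to the context."""
    def score(sense):
        return sum(max(resnik(sense, s) for s in SENSES[c])
                   for c in context_nouns if c != noun)
    return max(SENSES[noun], key=score)


def tfidf(docs):
    """Weight each sense-tagged document with tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
            for doc in docs]


# Each document is reduced to its list of nouns before disambiguation.
docs = [["bank", "loan", "interest"], ["bank", "water"], ["loan", "interest"]]
tagged = [[disambiguate(w, doc) for w in doc] for doc in docs]
weights = tfidf(tagged)
```

In this toy setup the first two documents share the surface noun "bank", but after disambiguation they map to different senses and no longer share any term, so a clustering step run on `weights` would not be pulled toward grouping them. A real implementation would replace the toy taxonomy with WordNet and feed the weighted vectors to a clustering algorithm of choice.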
Pages: 209-216
Page count: 8