Nested Dolls: Towards Unsupervised Clustering of Web Tables

Authors
Khan, Rituparna [1 ]
Gubanov, Michael [1 ]
Affiliation
[1] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
Funding
National Science Foundation (USA);
Keywords
Web-search; Large-scale Data Management; Big Data; Data Fusion; Data Integration; Data Cleaning; Summarization; Human-Computer Interaction; Machine Learning; Natural Language Processing (NLP);
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We discuss our initial efforts towards unsupervised clustering of a large-scale Web tables dataset. We improve on our previous weakly-supervised clustering approach, in which an operator provides a few descriptive keywords to generate an entity-identifying classifier, which is then applied to the corpus to form a cohesive entity-centric cluster [1]. Here, we take a next step towards a fully unsupervised algorithm by generating these descriptive keywords automatically. The keywords can then be used to generate high-precision training data and to train a classifier that forms a cluster. We describe and evaluate this new unsupervised keyword generation algorithm and apply it to a large-scale Web tables corpus to form initial small, high-precision clusters.
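The abstract does not specify how the keywords are generated, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes each Web table is flattened to a string of header terms, uses plain TF-IDF scoring against a seed table to pick keywords, and labels tables that match enough keywords as high-precision positives. The function names, the seed-table choice, and the `min_hits` threshold are all hypothetical; training the classifier on the resulting labels is left out.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def generate_keywords(tables, seed_index, k=3):
    """Return the k terms of the seed table with the highest TF-IDF score.

    Assumption: TF-IDF stands in for the (unspecified) keyword generator.
    """
    docs = [set(tokenize(t)) for t in tables]
    n = len(docs)
    # Document frequency: number of tables in which each term occurs.
    df = Counter(term for doc in docs for term in doc)
    tf = Counter(tokenize(tables[seed_index]))
    scores = {term: count * math.log(n / df[term]) for term, count in tf.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def label_by_keywords(tables, keywords, min_hits=2):
    """Mark a table as a positive example if it contains >= min_hits keywords."""
    kw = set(keywords)
    return [len(kw & set(tokenize(t))) >= min_hits for t in tables]


# Toy corpus of flattened table headers (invented for illustration).
tables = [
    "song artist album year",
    "song artist album genre",
    "city country population area",
    "city country population mayor",
    "player team goals season",
]
keywords = generate_keywords(tables, seed_index=0, k=3)
labels = label_by_keywords(tables, keywords, min_hits=2)
```

Raising `min_hits` trades recall for precision, which matches the abstract's emphasis on small, high-precision initial clusters; the positives could then serve as training data for any standard classifier.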
Pages: 5357-5359
Page count: 3