Nested Dolls: Towards Unsupervised Clustering of Web Tables

Times cited: 0
Authors
Khan, Rituparna [1]
Gubanov, Michael [1]
Affiliation
[1] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
Web-search; Large-scale Data Management; Big Data; Data Fusion; Data Integration; Data Cleaning; Summarization; Human-Computer Interaction; Machine Learning; Natural Language Processing (NLP)
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Here we discuss our initial efforts towards unsupervised clustering of a large-scale Web tables dataset. We improve on our previous weakly-supervised clustering approach, in which an operator provided a few descriptive keywords that were used to generate an entity-identifying classifier; the classifier was then applied to the corpora to form a cohesive, entity-centric cluster [1]. Here, we take the next step towards a fully unsupervised algorithm by generating these descriptive keywords automatically. The keywords can then be used to generate high-precision training data and to train a classifier that forms a cluster. We describe and evaluate this new unsupervised keyword-generation algorithm and apply it to a large-scale Web tables corpus to form small, initial high-precision clusters.
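The keyword-driven pipeline summarized in the abstract can be illustrated with a minimal sketch. The record does not describe the authors' actual keyword-generation algorithm, so the sketch substitutes a simple, assumed heuristic: the keywords are the header tokens that appear in the most tables. The table representation (a dict with a `header` list), the function names `generate_keywords` and `high_precision_cluster`, and the `min_hits` threshold are all hypothetical. Requiring several keyword matches per table shows one way precision can be favored over recall when forming an initial cluster; training a classifier on the resulting positives is omitted.

```python
import re
from collections import Counter

# Hypothetical table representation: {"header": [column names], "rows": [...]}.
# Only the header is used in this sketch.
TOKEN_RE = re.compile(r"[a-z]+")
STOPWORDS = frozenset({"the", "of", "and", "a", "in"})


def header_tokens(table):
    """Return the set of lowercased alphabetic tokens in a table's header."""
    return {
        tok
        for cell in table["header"]
        for tok in TOKEN_RE.findall(cell.lower())
    } - STOPWORDS


def generate_keywords(tables, top_k=3):
    """Assumed heuristic, NOT the paper's algorithm: return the top_k header
    tokens ranked by how many tables contain them (document frequency)."""
    doc_freq = Counter()
    for table in tables:
        doc_freq.update(header_tokens(table))
    return [tok for tok, _ in doc_freq.most_common(top_k)]


def high_precision_cluster(tables, keywords, min_hits=2):
    """Keep only tables whose header matches at least min_hits keywords.
    A stricter threshold trades recall for precision (hypothetical parameter)."""
    keyword_set = set(keywords)
    return [t for t in tables if len(header_tokens(t) & keyword_set) >= min_hits]


if __name__ == "__main__":
    corpus = [
        {"header": ["Movie Title", "Director", "Year"], "rows": []},
        {"header": ["Title", "Director", "Release Year"], "rows": []},
        {"header": ["Film Title", "Director", "Year"], "rows": []},
        {"header": ["Book Title", "Author"], "rows": []},
        {"header": ["City", "Population"], "rows": []},
    ]
    keywords = generate_keywords(corpus)
    cluster = high_precision_cluster(corpus, keywords)
    print(sorted(keywords))
    for table in cluster:
        print(table["header"])
```

On this toy corpus the heuristic selects `title`, `director`, and `year` as keywords; the three movie tables form the cluster, while the book table, which matches only `title`, is excluded.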
Pages: 5357-5359
Number of pages: 3