Nested Dolls: Towards Unsupervised Clustering of Web Tables

Cited by: 0
Authors
Khan, Rituparna [1]
Gubanov, Michael [1]
Affiliations
[1] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
Funding
National Science Foundation (USA)
Keywords
Web-search; Large-scale Data Management; Big Data; Data Fusion; Data Integration; Data Cleaning; Summarization; Human-Computer Interaction; Machine Learning; Natural Language Processing (NLP)
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We discuss our initial efforts towards unsupervised clustering of a large-scale Web tables dataset. We improve on our previous weakly-supervised clustering approach, in which an operator provided a few descriptive keywords to generate an entity-identifying classifier, which was then applied to the corpus to form a cohesive entity-centric cluster [1]. Here, we take a step towards a fully unsupervised algorithm by generating these descriptive keywords automatically. The keywords can then be used to generate high-precision training data and train a classifier that forms a cluster. We describe and evaluate this new unsupervised keyword generation algorithm and apply it to a large-scale Web tables corpus to form initial small, high-precision clusters.
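The abstract does not say how the keywords are generated or how the classifier is trained. The sketch below is only an illustration of the general idea of deriving keywords automatically and using them to pick a small, high-precision seed set. It is not the authors' algorithm. It assumes each table is represented by its list of header terms, and the functions `generate_keywords` and `seed_cluster` as well as the parameters `top_k` and `min_hits` are hypothetical.

```python
from collections import Counter


def generate_keywords(tables, top_k=3):
    """Hypothetical keyword generator: pick the header terms found in the most tables."""
    counts = Counter()
    for header in tables:
        # Count each term once per table so long headers do not dominate.
        counts.update({term.lower() for term in header})
    return [term for term, _ in counts.most_common(top_k)]


def seed_cluster(tables, keywords, min_hits=2):
    """Return indices of tables whose header matches at least `min_hits` keywords.

    A strict threshold favors precision over recall, which is what a seed
    training set for a later classifier would need.
    """
    kw = set(keywords)
    return [
        i
        for i, header in enumerate(tables)
        if len(kw & {term.lower() for term in header}) >= min_hits
    ]


# Toy data (assumed): each table is reduced to its header row.
tables = [
    ["Title", "Author", "Publisher"],
    ["title", "author", "ISBN"],
    ["Team", "Wins", "Losses"],
    ["Author", "Title", "Year"],
]

keywords = generate_keywords(tables, top_k=2)
seeds = seed_cluster(tables, keywords, min_hits=2)
print(sorted(keywords), seeds)
```

In this toy setup the seed indices would serve as positive training examples for a classifier; the training step itself is omitted because the abstract gives no details about it.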
Pages: 5357-5359
Number of pages: 3