Nested Dolls: Towards Unsupervised Clustering of Web Tables

Cited by: 0
Authors
Khan, Rituparna [1 ]
Gubanov, Michael [1 ]
Affiliations
[1] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
Funding
National Science Foundation (United States)
Keywords
Web-search; Large-scale Data Management; Big Data; Data Fusion; Data Integration; Data Cleaning; Summarization; Human-Computer Interaction; Machine Learning; Natural Language Processing (NLP);
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Here we discuss our initial efforts towards unsupervised clustering of a large-scale Web tables dataset. We improve on our previous weakly-supervised clustering approach, in which an operator provides a few descriptive keywords to generate an entity-identifying classifier, which is then applied to the corpus to form a cohesive entity-centric cluster [1]. We take the next step towards a fully unsupervised algorithm by generating these descriptive keywords automatically. The keywords can then be used to generate high-precision training data and to train a classifier that forms a cluster. We describe and evaluate this new unsupervised keyword generation algorithm and apply it to a large-scale Web tables corpus to form initial small, high-precision clusters.
Pages: 5357-5359
Page count: 3
Related papers
50 in total
  • [21] Branching synchronization grammars with nested tables
    Drewes, F
    Engelfriet, J
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2004, 68 (03) : 611 - 656
  • [22] Towards Large-Scale Unsupervised Relation Extraction from the Web
    Min, Bonan
    Shi, Shuming
    Grishman, Ralph
    Lin, Chin-Yew
    INTERNATIONAL JOURNAL ON SEMANTIC WEB AND INFORMATION SYSTEMS, 2012, 8 (03) : 1 - 23
  • [23] Towards Unsupervised Sudden Data Drift Detection in Federated Learning with Fuzzy Clustering
    Stallmann, Morris
    Wilbik, Anna
    Weiss, Gerhard
    2024 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, FUZZ-IEEE 2024, 2024,
  • [24] Unsupervised Clustering of PH Using Circulating miRNA - Towards Molecular Classification of PH?
    Errington, N.
    Kariotis, S.
    Jammeh, E.
    Fong, Y.
    Lihan, Z.
    Chen, H.
    Jatkoe, T.
    Bridges, C.
    Vener, T.
    Wharton, J.
    Thompson, R.
    Toshner, M.
    Howard, L. S.
    Rhodes, C. J.
    Wilkins, M.
    Wang, D.
    Lawrie, A.
    AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE, 2022, 205
  • [25] Towards Unlocking Web Video: Automatic People Tracking and Clustering
    Holub, Alex
    Moreels, Pierre
    Islam, Atiq
    Makhanov, Andrei
    Yang, Rui
    2009 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPR WORKSHOPS 2009), VOLS 1 AND 2, 2009, : 433 - 440
  • [26] Unsupervised clustering of shapes
    Daliri, Mohammad Reza
    Torre, Vincent
    ADVANCES IN VISUAL COMPUTING, PT 1, 2006, 4291 : 712 - +
  • [27] Unsupervised Web Name Disambiguation Using Semantic Similarity and Single-Pass Clustering
    Iosif, Elias
    ARTIFICIAL INTELLIGENCE: THEORIES, MODELS AND APPLICATIONS, PROCEEDINGS, 2010, 6040 : 133 - 141
  • [28] Unsupervised fuzzy clustering
    Zahid, N
    Abouelala, O
    Limouri, M
    Essaid, A
    PATTERN RECOGNITION LETTERS, 1999, 20 (02) : 123 - 129
  • [29] Unsupervised distributed clustering
    Tasoulis, DK
    Vrahatis, MN
    PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING AND NETWORKS, 2004, : 347 - 351
  • [30] Unsupervised possibilistic clustering
    Yang, MS
    Wu, KL
    PATTERN RECOGNITION, 2006, 39 (01) : 5 - 21