Semi-Supervised Classification of Network Data Using Very Few Labels

Cited by: 67
Authors
Lin, Frank [1]
Cohen, William W. [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
DOI
10.1109/ASONAM.2010.19
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The goal of semi-supervised learning (SSL) methods is to reduce the amount of labeled training data required by learning from both labeled and unlabeled instances. Macskassy and Provost [1] proposed the weighted-vote relational neighbor classifier (wvRN) as a simple yet effective baseline for semi-supervised learning on network data. wvRN is similar to many recent graph-based SSL methods (e.g., [2], [3]), has been shown to be essentially the same as the Gaussian-field classifier proposed by Zhu et al. [4], and proves to be very effective on some benchmark network datasets. We describe another simple and intuitive semi-supervised learning method, based on random graph walks, that outperforms wvRN by a large margin on several benchmark datasets when very few labels are available. Additionally, we show that using authoritative instances as training seeds (instances that arguably cost much less to label) dramatically reduces the amount of labeled data required to achieve the same classification accuracy. For some existing state-of-the-art semi-supervised learning methods, the labeled data needed is reduced by a factor of 50.
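The abstract does not spell out the random-walk method, so the following is only a generic sketch of how a random-walk-based classifier with a few labeled seeds could work, not the authors' algorithm. It assumes one random walk with restart per class, restarting at that class's seed nodes, and assigns each node to the class whose walk visits it most. The function name `random_walk_classify`, the `damping` and `iters` parameters, and the toy graph are all hypothetical.

```python
def random_walk_classify(adj, seeds, damping=0.85, iters=100):
    """Classify nodes with one random walk with restart per class.

    adj:   dict mapping node -> list of neighbours (undirected graph)
    seeds: dict mapping class label -> list of labeled seed nodes
    Returns a dict mapping node -> predicted class label.
    (Illustrative assumption, not the method from the paper.)
    """
    nodes = list(adj)
    scores = {}
    for label, seed_nodes in seeds.items():
        # Restart distribution: uniform over this class's seeds.
        restart = {n: 0.0 for n in nodes}
        for s in seed_nodes:
            restart[s] = 1.0 / len(seed_nodes)
        rank = dict(restart)
        for _ in range(iters):
            # With probability (1 - damping) the walker jumps back to a seed.
            nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
            for n in nodes:
                if adj[n]:
                    # Otherwise it moves to a uniformly chosen neighbour.
                    share = damping * rank[n] / len(adj[n])
                    for m in adj[n]:
                        nxt[m] += share
                # Nodes without neighbours simply lose their walk mass here.
            rank = nxt
        scores[label] = rank
    # Each node takes the class whose walk gives it the highest score.
    return {n: max(scores, key=lambda c: scores[c][n]) for n in nodes}


# Toy graph (hypothetical): two triangles joined by the edge c-d,
# with a single labeled seed per class.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
labels = random_walk_classify(graph, {"A": ["a"], "B": ["f"]})
print(labels)
```

In this toy example each triangle should end up with the class of its own seed, because the walk restarting inside a triangle spends more of its time there than the walk restarting on the other side of the bridge edge.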
Pages: 192-199
Page count: 8
Related papers
50 in total
  • [21] Semi-supervised Fuzzy Min–Max Neural Network for Data Classification
    Jinhai Liu
    Yanjuan Ma
    Fuming Qu
    Dong Zang
    Neural Processing Letters, 2020, 51 : 1445 - 1464
  • [22] Semi-supervised classification using bridging
    Chan, Jason
    Koprinska, Irena
    Poon, Josiah
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2008, 17 (03) : 415 - 431
  • [23] Semi-supervised and compound classification of network traffic
    Zhang, J.
    Inderscience Enterprises Ltd., (07)
  • [24] Semi-Supervised Medical Image Classification with Pseudo Labels Using Coalition Similarity Training
    Liu, Kun
    Ling, Shuyi
    Liu, Sidong
    MATHEMATICS, 2024, 12 (10)
  • [25] Enhancing Classification of Energy Meters with Limited Labels using a Semi-Supervised Generative Model
    Fu, Chun
    Kazmi, Hussain
    Quintana, Matias
    Miller, Clayton
    PROCEEDINGS OF THE 10TH ACM INTERNATIONAL CONFERENCE ON SYSTEMS FOR ENERGY-EFFICIENT BUILDINGS, CITIES, AND TRANSPORTATION, BUILDSYS 2023, 2023, : 450 - 453
  • [26] The game theoretic p-Laplacian and semi-supervised learning with few labels
    Calder, Jeff
    NONLINEARITY, 2019, 32 (01) : 301 - 330
  • [27] GraFN: Semi-Supervised Node Classification on Graph with Few Labels via Non-Parametric Distribution Assignment
    Lee, Junseok
    Oh, Yunhak
    In, Yeonjun
    Lee, Namkyeong
    Hyun, Dongmin
    Park, Chanyoung
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 2243 - 2248
  • [28] ClusterClean: a Weak Semi-Supervised Approach for Cleaning Data Labels
    Dimitriadou, Kyriaki
    Manghwani, Rahul
    Hoad, Timothy C.
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 4589 - 4595
  • [29] A Semi-Supervised Learning Algorithm for Data Classification
    Kuo, Cheng-Chien
    Shieh, Horng-Lin
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2015, 29 (05)
  • [30] Data preprocessing in semi-supervised SVM classification
    Astorino, A.
    Gorgone, E.
    Gaudioso, M.
    Pallaschke, D.
    OPTIMIZATION, 2011, 60 (1-2) : 143 - 151