Tri-Training for Authorship Attribution with Limited Training Data

Authors
Qian, Tieyun [1]
Liu, Bing [2]
Chen, Li [1]
Peng, Zhiyong [3]
Affiliations
[1] Wuhan Univ, State Key Lab Software Engn, Wuhan 430072, Hubei, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
[3] Wuhan Univ, Comp Sch, Wuhan 430072, Hubei, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Authorship attribution (AA) aims to identify the authors of a set of documents. Traditional studies in this area often assume that a large set of labeled documents is available for training. In real life, however, collecting a large set of labeled data is often difficult or expensive. For example, in the online review domain, most reviewers (authors) write only a few reviews, which are not enough to serve as training data for accurate classification. In this paper, we present a novel three-view tri-training method that iteratively identifies the authors of unlabeled documents to augment the training set. The key idea is to first represent each document in three distinct views, and then perform tri-training to exploit the large number of unlabeled documents. Starting from 10 training documents per author, we systematically evaluate the effectiveness of the proposed tri-training method for AA. Experimental results show that the proposed approach outperforms the state-of-the-art semi-supervised method CNG+SVM and other baselines.
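
The three-view idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three views (character 3-grams, word counts, and a crude word-length/punctuation "style" view), the nearest-centroid cosine classifier, and the labeling rule, which follows the general tri-training scheme of adding an unlabeled document to one classifier's training set when the other two classifiers agree on its author. The names `VIEWS`, `CentroidClassifier`, `tri_train`, and `predict` are hypothetical.

```python
from collections import Counter
import math
import re

# Three illustrative views of a document (assumed, not the paper's feature sets).
def char_view(text, n=3):
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def word_view(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))

def style_view(text):
    feats = Counter(f"len={len(w)}" for w in re.findall(r"[A-Za-z']+", text))
    feats.update(f"punct={c}" for c in text if c in ",.;:!?-")
    return feats

VIEWS = [char_view, word_view, style_view]

def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidClassifier:
    """Nearest-centroid classifier over one view (a stand-in for any learner)."""
    def __init__(self, view):
        self.view = view
        self.centroids = {}

    def fit(self, docs, labels):
        self.centroids = {}
        for doc, label in zip(docs, labels):
            self.centroids.setdefault(label, Counter()).update(self.view(doc))
        return self

    def predict(self, doc):
        feats = self.view(doc)
        # sorted() makes tie-breaking deterministic.
        return max(sorted(self.centroids),
                   key=lambda lab: cosine(feats, self.centroids[lab]))

def fit_all(train_sets):
    return [CentroidClassifier(v).fit(*zip(*ts)) for v, ts in zip(VIEWS, train_sets)]

def tri_train(labeled, unlabeled, rounds=3):
    """labeled: list of (document, author); unlabeled: list of documents."""
    train_sets = [list(labeled) for _ in VIEWS]
    pool = list(unlabeled)
    for _ in range(rounds):
        clfs = fit_all(train_sets)
        remaining = []
        for doc in pool:
            preds = [c.predict(doc) for c in clfs]
            used = False
            for i in range(3):
                j, k = [x for x in range(3) if x != i]
                if preds[j] == preds[k]:
                    # The two other views agree: teach classifier i this label.
                    train_sets[i].append((doc, preds[j]))
                    used = True
            if not used:
                remaining.append(doc)
        if len(remaining) == len(pool):
            break  # nothing new was labeled in this round
        pool = remaining
    return fit_all(train_sets)

def predict(clfs, doc):
    """Majority vote over the three view-specific classifiers."""
    votes = Counter(c.predict(doc) for c in clfs)
    return votes.most_common(1)[0][0]
```

A call such as `clfs = tri_train(labeled_pairs, unlabeled_docs)` followed by `predict(clfs, new_doc)` returns the majority-vote author. In practice each view would be paired with a stronger learner, and a confidence threshold on the agreeing predictions would help limit label noise.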
Pages: 345-351
Page count: 7