Tri-Training for Authorship Attribution with Limited Training Data

Authors
Qian, Tieyun [1]
Liu, Bing [2]
Chen, Li [1]
Peng, Zhiyong [3]
Affiliations
[1] Wuhan Univ, State Key Lab Software Engn, Wuhan 430072, Hubei, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
[3] Wuhan Univ, Comp Sch, Wuhan 430072, Hubei, Peoples R China
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Authorship attribution (AA) aims to identify the authors of a set of documents. Traditional studies in this area often assume that a large set of labeled documents is available for training. In real life, however, it is often difficult or expensive to collect a large set of labeled data. For example, in the online review domain, most reviewers (authors) write only a few reviews, which are not enough to serve as training data for accurate classification. In this paper, we present a novel three-view tri-training method that iteratively identifies the authors of unlabeled documents to augment the training set. The key idea is to first represent each document in three distinct views, and then perform tri-training to exploit the large amount of unlabeled documents. Starting from 10 training documents per author, we systematically evaluate the effectiveness of the proposed tri-training method for AA. Experimental results show that the proposed approach outperforms the state-of-the-art semi-supervised method CNG+SVM and other baselines.
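The key idea in the abstract (three views per document, then tri-training over unlabeled documents) can be illustrated with a minimal sketch. This is not the authors' implementation. The three views used here (character trigrams, word tokens, and a word-length/punctuation "shape" view), the nearest-centroid classifier, the agreement rule (a classifier receives a pseudo-labeled document when the other two classifiers agree on its author), and all function names are assumptions chosen for illustration only.

```python
import math
from collections import Counter

# Three hypothetical views of a document (assumed, not taken from the paper).
def char_view(text):
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def word_view(text):
    return Counter(text.lower().split())

def shape_view(text):
    feats = Counter("len%d" % len(w) for w in text.split())
    feats.update(c for c in text if c in ".,;:!?")
    return feats

VIEWS = [char_view, word_view, shape_view]

def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidClassifier:
    """Stand-in classifier: one summed feature vector per author."""
    def fit(self, feats, labels):
        self.centroids = {}
        for f, y in zip(feats, labels):
            self.centroids.setdefault(y, Counter()).update(f)
        return self

    def predict(self, f):
        # sorted() makes tie-breaking deterministic
        return max(sorted(self.centroids),
                   key=lambda y: cosine(f, self.centroids[y]))

def tri_train(labeled, unlabeled, rounds=3):
    """labeled: list of (text, author); unlabeled: list of text."""
    docs, labels = zip(*labeled)
    clfs = [CentroidClassifier().fit([v(d) for d in docs], labels)
            for v in VIEWS]
    for _ in range(rounds):
        # Predictions of each current classifier on every unlabeled document.
        preds = [[c.predict(v(d)) for d in unlabeled]
                 for c, v in zip(clfs, VIEWS)]
        new_clfs = []
        for i, view in enumerate(VIEWS):
            j, k = [m for m in range(3) if m != i]
            # Pseudo-label documents on which the other two views agree.
            extra = [(d, preds[j][n]) for n, d in enumerate(unlabeled)
                     if preds[j][n] == preds[k][n]]
            train = list(labeled) + extra
            new_clfs.append(CentroidClassifier().fit(
                [view(d) for d, _ in train], [y for _, y in train]))
        clfs = new_clfs
    return clfs

def predict(clfs, text):
    # Final decision: majority vote over the three view-specific classifiers.
    votes = Counter(c.predict(v(text)) for c, v in zip(clfs, VIEWS))
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    labeled = [("the cat sat quietly on the mat.", "A"),
               ("Wow!!! Totally amazing, fantastic product!!!", "B")]
    unlabeled = ["the dog sat quietly on the rug.",
                 "Wow!!! Absolutely amazing purchase!!!"]
    models = tri_train(labeled, unlabeled)
    print(predict(models, "the bird sat on the mat."))
```

In this sketch each classifier is retrained from the original labeled set plus the current round's pseudo-labels, so an early labeling mistake is not carried forward permanently. A practical system would use a stronger classifier per view and a stricter rule for accepting pseudo-labels.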
Pages: 345-351
Number of pages: 7