Tri-Training for Authorship Attribution with Limited Training Data

Cited by: 0
Authors
Qian, Tieyun [1]
Liu, Bing [2]
Chen, Li [1]
Peng, Zhiyong [3]
Affiliations
[1] Wuhan Univ, State Key Lab Software Engn, Wuhan 430072, Hubei, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
[3] Wuhan Univ, Comp Sch, Wuhan 430072, Hubei, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification codes
081203; 0835
Abstract
Authorship attribution (AA) aims to identify the authors of a set of documents. Traditional studies in this area often assume that a large set of labeled documents is available for training. In real life, however, it is often difficult or expensive to collect a large set of labeled data. For example, in the online review domain, most reviewers (authors) write only a few reviews, which are not enough to serve as training data for accurate classification. In this paper, we present a novel three-view tri-training method that iteratively identifies the authors of unlabeled documents to augment the training set. The key idea is to first represent each document in three distinct views and then perform tri-training to exploit the large number of unlabeled documents. Starting from 10 training documents per author, we systematically evaluate the effectiveness of the proposed tri-training method for AA. Experimental results show that the proposed approach outperforms the state-of-the-art semi-supervised method CNG+SVM and other baselines.
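The three-view idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the three views (`char_view`, `word_view`, `shape_view`), the nearest-centroid classifier, and the plain "the other two views agree" labeling rule are all assumptions chosen to keep the example short and dependency-free. The abstract does not say which views or base learners the paper uses.

```python
from collections import Counter
import math

# Three hypothetical views of one document (assumptions, not the paper's views).
def char_view(text):
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))  # character trigrams

def word_view(text):
    return Counter(text.lower().split())  # whitespace-separated tokens

def shape_view(text):
    feats = Counter("len%d" % len(w) for w in text.split())  # word lengths
    feats.update(c for c in text if c in ".,;:!?")            # punctuation marks
    return feats

VIEWS = (char_view, word_view, shape_view)

def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(view, labeled):
    # Stand-in classifier: one summed feature vector (centroid) per author.
    centroids = {}
    for text, author in labeled:
        centroids.setdefault(author, Counter()).update(view(text))
    return centroids

def predict(view, centroids, text):
    feats = view(text)
    # sorted() makes tie-breaking deterministic (alphabetical by author).
    return max(sorted(centroids), key=lambda a: cosine(feats, centroids[a]))

def tri_train(labeled, unlabeled, rounds=3):
    train_sets = [list(labeled) for _ in VIEWS]
    models = [train(v, s) for v, s in zip(VIEWS, train_sets)]
    for _ in range(rounds):
        preds = [[predict(v, m, t) for t in unlabeled]
                 for v, m in zip(VIEWS, models)]
        new_sets = []
        for i in range(3):
            j, k = [x for x in range(3) if x != i]
            # View i learns from documents on which the other two views agree.
            extra = [(t, preds[j][n]) for n, t in enumerate(unlabeled)
                     if preds[j][n] == preds[k][n]]
            new_sets.append(list(labeled) + extra)
        if new_sets == train_sets:  # no view gained or changed a pseudo-label
            break
        train_sets = new_sets
        models = [train(v, s) for v, s in zip(VIEWS, train_sets)]
    return models

def vote(models, text):
    # Majority vote over the three views; a three-way tie falls to the first view.
    votes = Counter(predict(v, m, text) for v, m in zip(VIEWS, models))
    return votes.most_common(1)[0][0]
```

Calling `tri_train(labeled, unlabeled)` with `labeled` as a list of `(text, author)` pairs and `unlabeled` as a list of texts returns one model per view, and `vote(models, text)` combines them. A fuller version would likely add an error estimate before accepting pseudo-labels, as standard tri-training does.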
Pages: 345-351
Page count: 7