Tri-Training for Authorship Attribution with Limited Training Data

Authors
Qian, Tieyun [1]
Liu, Bing [2]
Chen, Li [1]
Peng, Zhiyong [3]
Affiliations
[1] Wuhan Univ, State Key Lab Software Engn, Wuhan 430072, Hubei, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
[3] Wuhan Univ, Comp Sch, Wuhan 430072, Hubei, Peoples R China
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Authorship attribution (AA) aims to identify the authors of a set of documents. Traditional studies in this area often assume that a large set of labeled documents is available for training. However, in real life, it is often difficult or expensive to collect a large set of labeled data. For example, in the online review domain, most reviewers (authors) write only a few reviews, which are not enough to serve as training data for accurate classification. In this paper, we present a novel three-view tri-training method that iteratively identifies the authors of unlabeled documents to augment the training set. The key idea is to first represent each document in three distinct views and then perform tri-training to exploit the large amount of unlabeled documents. Starting from 10 training documents per author, we systematically evaluate the effectiveness of the proposed tri-training method for AA. Experimental results show that the proposed approach outperforms the state-of-the-art semi-supervised method CNG+SVM and other baselines.
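The key idea in the abstract (three views per document, then tri-training over unlabeled documents) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the three view functions (`char_view`, `word_view`, `shape_view`), the nearest-centroid learner `CentroidClassifier`, and the simplified update rule, in which a classifier receives an unlabeled document whenever the other two classifiers agree on its author. The error-rate checks of the original tri-training algorithm are omitted.

```python
from collections import Counter
import math

# Three hypothetical views of a document (assumptions, not the paper's features).
def char_view(text, n=3):
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))  # character n-grams

def word_view(text):
    return Counter(text.lower().split())  # whitespace-delimited tokens

def shape_view(text):
    feats = Counter("len%d" % min(len(w), 10) for w in text.split())  # token lengths
    feats.update(c for c in text if c in ",.;:!?")  # punctuation marks
    return feats

VIEWS = (char_view, word_view, shape_view)

def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidClassifier:
    """Nearest-centroid learner over one view; a stand-in for any classifier."""
    def __init__(self, view):
        self.view = view
        self.centroids = {}

    def fit(self, pairs):
        self.centroids = {}
        for doc, author in pairs:
            self.centroids.setdefault(author, Counter()).update(self.view(doc))
        return self

    def predict(self, doc):
        feats = self.view(doc)
        # sorted() makes tie-breaking deterministic
        return max(sorted(self.centroids),
                   key=lambda y: cosine(feats, self.centroids[y]))

def tri_train(labeled, unlabeled, rounds=5):
    """labeled: list of (text, author) pairs; unlabeled: list of texts."""
    train_sets = [list(labeled) for _ in VIEWS]
    clfs = [CentroidClassifier(v) for v in VIEWS]
    for _ in range(rounds):
        for clf, ts in zip(clfs, train_sets):
            clf.fit(ts)
        preds = [[clf.predict(d) for d in unlabeled] for clf in clfs]
        new_sets = []
        for i in range(3):
            j, k = [x for x in range(3) if x != i]
            # Classifier i learns from documents on which the other two agree.
            extra = [(d, preds[j][n]) for n, d in enumerate(unlabeled)
                     if preds[j][n] == preds[k][n]]
            new_sets.append(list(labeled) + extra)
        if new_sets == train_sets:  # nothing changed, so stop early
            break
        train_sets = new_sets
    for clf, ts in zip(clfs, train_sets):
        clf.fit(ts)
    return clfs

def predict_majority(clfs, doc):
    votes = Counter(clf.predict(doc) for clf in clfs)
    return votes.most_common(1)[0][0]

# Toy usage with made-up documents and author labels.
labeled = [("wow! great! love it!", "A"),
           ("the apparatus; moreover, functioned adequately", "B")]
unlabeled = ["great! wow! nice!", "moreover; the mechanism functioned"]
clfs = tri_train(labeled, unlabeled)
print(predict_majority(clfs, "love it! wow!"))
```

In this sketch each classifier is retrained on the original labeled set plus the documents pseudo-labeled in the current round, so a wrong pseudo-label from an earlier round can be dropped later. A real system would replace the centroid learner and the toy views with stronger models and features.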
Pages: 345-351
Number of pages: 7