An improved algorithm for unsupervised decomposition of a multi-author document

Cited by: 3
Author
Giannella, Chris [1]
Affiliation
[1] Mitre Corp, Human Language Technol Dept, 7515 Colshire Dr, Mclean, VA 22102 USA
Keywords
natural language processing; machine learning
DOI
10.1002/asi.23375
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This article addresses the problem of unsupervised decomposition of a multi-author text document: identifying the sentences written by each author assuming the number of authors is unknown. An approach, BayesAD, is developed for solving this problem: apply a Bayesian segmentation algorithm, followed by a segment clustering algorithm. Results are presented from an empirical comparison between BayesAD and AK, a modified version of an approach published by Akiva and Koppel in 2013. BayesAD exhibited greater accuracy than AK in all experiments. However, BayesAD has a parameter that needs to be set and which had a nontrivial impact on accuracy. Developing an effective method for eliminating this need would be a fruitful direction for future work. When controlling for topic, the accuracy levels of BayesAD and AK were, in all but one case, worse than a baseline approach wherein one author was assumed to write all sentences in the input text document. Hence, room for improved solutions exists.
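The single-author baseline mentioned in the abstract can be illustrated with a short sketch. This record does not state how accuracy was measured, so the sketch assumes accuracy is the fraction of sentences attributed to the correct author under the best one-to-one mapping of predicted clusters to true authors. The function names are hypothetical, and this is not the BayesAD or AK implementation.

```python
from collections import Counter
from itertools import permutations


def single_author_baseline_accuracy(true_authors):
    """Accuracy of assigning every sentence to a single author.

    The best such assignment picks the most frequent true author.
    """
    if not true_authors:
        raise ValueError("need at least one sentence")
    most_common_count = Counter(true_authors).most_common(1)[0][1]
    return most_common_count / len(true_authors)


def clustering_accuracy(true_authors, predicted_clusters):
    """Accuracy under the best one-to-one cluster-to-author mapping.

    Assumed metric (not taken from the article). Brute force over
    mappings, so only suitable for a handful of authors and clusters.
    """
    if len(true_authors) != len(predicted_clusters):
        raise ValueError("inputs must have the same length")
    if not true_authors:
        raise ValueError("need at least one sentence")

    authors = list(dict.fromkeys(true_authors))
    clusters = list(dict.fromkeys(predicted_clusters))

    # Pad with a sentinel so surplus clusters can stay unmatched.
    unmatched = object()
    padded = authors + [unmatched] * max(0, len(clusters) - len(authors))

    # How many sentences fall into each (cluster, author) pair.
    pair_counts = Counter(zip(predicted_clusters, true_authors))

    best = 0
    for assignment in permutations(padded, len(clusters)):
        correct = sum(
            pair_counts[(cluster, author)]
            for cluster, author in zip(clusters, assignment)
            if author is not unmatched
        )
        best = max(best, correct)
    return best / len(true_authors)
```

With this assumed metric, a method that places every sentence in one cluster scores exactly the baseline, so a decomposition only beats the baseline if splitting the text gains more correct sentences than it loses.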
Pages: 400-411
Number of pages: 12