An improved algorithm for unsupervised decomposition of a multi-author document

Cited by: 3
Author
Giannella, Chris [1]
Affiliation
[1] Mitre Corp, Human Language Technol Dept, 7515 Colshire Dr, Mclean, VA 22102 USA
Keywords
natural language processing; machine learning
DOI
10.1002/asi.23375
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This article addresses the problem of unsupervised decomposition of a multi-author text document: identifying the sentences written by each author assuming the number of authors is unknown. An approach, BayesAD, is developed for solving this problem: apply a Bayesian segmentation algorithm, followed by a segment clustering algorithm. Results are presented from an empirical comparison between BayesAD and AK, a modified version of an approach published by Akiva and Koppel in 2013. BayesAD exhibited greater accuracy than AK in all experiments. However, BayesAD has a parameter that needs to be set and which had a nontrivial impact on accuracy. Developing an effective method for eliminating this need would be a fruitful direction for future work. When controlling for topic, the accuracy levels of BayesAD and AK were, in all but one case, worse than a baseline approach wherein one author was assumed to write all sentences in the input text document. Hence, room for improved solutions exists.
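The single-author baseline mentioned in the abstract can be illustrated with a minimal sketch. This record does not say how accuracy was scored, so the code assumes the simplest reading: accuracy is the fraction of sentences labeled correctly when the one predicted author is matched to the most frequent true author. The function name and the sample labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def single_author_baseline_accuracy(true_authors):
    """Accuracy when every sentence is attributed to a single author.

    Assumption (not stated in the record): the single predicted label is
    matched to the most frequent true author, so the score equals that
    author's share of the sentences.
    """
    if not true_authors:
        raise ValueError("need at least one sentence label")
    most_common_count = Counter(true_authors).most_common(1)[0][1]
    return most_common_count / len(true_authors)


# Hypothetical per-sentence author labels for an 8-sentence document.
labels = ["A", "A", "B", "A", "B", "B", "B", "C"]
baseline = single_author_baseline_accuracy(labels)
print(baseline)
```

Under this reading, a decomposition method is only useful when its accuracy exceeds the share of sentences written by the dominant author, which is the comparison the abstract reports for the topic-controlled experiments.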
Pages: 400 - 411
Page count: 12
Related papers
50 in total
  • [1] Unsupervised Multi-Author Document Decomposition Based on Hidden Markov Model
    Aldebei, Khaled
    He, Xiangjian
    Jia, Wenjing
    Yang, Jie
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 706 - 714
  • [2] Unsupervised Decomposition of a Multi-Author Document Based on Naive-Bayesian Model
    Aldebei, Khaled
    He, Xiangjian
    Yang, Jie
    PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL) AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (IJCNLP), VOL 2, 2015, : 501 - 505
  • [3] SUDMAD: Sequential and Unsupervised Decomposition of a Multi-Author Document Based on a Hidden Markov Model
    Aldebei, Khaled
    He, Xiangjian
    Jia, Wenjing
    Yeh, Weichang
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2018, 69 (02) : 201 - 214
  • [4] A Generic Unsupervised Method for Decomposing Multi-Author Documents
    Akiva, Navot
    Koppel, Moshe
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2013, 64 (11) : 2256 - 2264
  • [5] MULTI-AUTHOR BOOKS
    SMITH, KGV
    BIOLOGIST, 1984, 31 (02) : 69 - 69
  • [6] MULTI-AUTHOR BOOKS
    BUSVINE, JR
    BIOLOGIST, 1983, 30 (03) : 123 - 123
  • [7] Writing multi-author documents
    McCubbin, N.
    Pulp and Paper Canada, 2001, 102 (07):
  • [8] Writing multi-author documents
    McCubbin, N
    PULP & PAPER-CANADA, 2001, 102 (07) : 58 - 58
  • [9] Conducting the multi-author choir
    Leeming, Jack
    NATURE, 2019, 575 (7783) : S36 - S37
  • [10] Author ranking in multi-author collaborative networks
    Sastry, Chandramouli Shama
    Jagaluru, Darshan S.
    Mahesh, Kavi
    COLLNET JOURNAL OF SCIENTOMETRICS AND INFORMATION MANAGEMENT, 2016, 10 (01) : 21 - 40