An improved algorithm for unsupervised decomposition of a multi-author document

Cited by: 3
Author
Giannella, Chris [1]
Affiliation
[1] MITRE Corp, Human Language Technology Dept, 7515 Colshire Dr, McLean, VA 22102, USA
Keywords
natural language processing; machine learning
DOI
10.1002/asi.23375
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This article addresses the problem of unsupervised decomposition of a multi-author text document: identifying the sentences written by each author when the number of authors is unknown. An approach, BayesAD, is developed for solving this problem: apply a Bayesian segmentation algorithm, followed by a segment clustering algorithm. Results are presented from an empirical comparison between BayesAD and AK, a modified version of an approach published by Akiva and Koppel in 2013. BayesAD exhibited greater accuracy than AK in all experiments. However, BayesAD has a parameter that must be set, and its value had a nontrivial impact on accuracy. Developing an effective method for eliminating the need to set it would be a fruitful direction for future work. When controlling for topic, the accuracy levels of BayesAD and AK were, in all but one case, worse than those of a baseline approach in which one author is assumed to have written all sentences in the input document. Hence, there is room for improved solutions.
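The two-stage structure described in the abstract (segment the text, then cluster the segments without knowing how many authors there are) can be illustrated with a toy sketch. This is not BayesAD: the record gives no details of its model, so the sketch swaps in an assumed scheme in which each segment's words follow a Dirichlet-multinomial distribution, segmentation is a maximum-scoring dynamic program with a fixed per-segment penalty, and clustering greedily merges segments for as long as a merge raises the marginal likelihood, so the number of clusters is inferred rather than given. The function names and the `alpha` and `segment_penalty` parameters are hypothetical.

```python
# Toy two-stage author decomposition: segment, then cluster the segments.
# Hypothetical sketch under stated assumptions, NOT the BayesAD implementation.
from collections import Counter
from math import lgamma


def log_marginal(counts, vocab_size, alpha):
    """Dirichlet-multinomial log marginal likelihood of a bag of words.

    Words with zero count contribute nothing, so only observed words are summed.
    """
    n = sum(counts.values())
    score = lgamma(vocab_size * alpha) - lgamma(n + vocab_size * alpha)
    for c in counts.values():
        score += lgamma(c + alpha) - lgamma(alpha)
    return score


def segment(sentences, vocab_size, alpha=0.5, segment_penalty=2.0):
    """Best-scoring split of the sentence list into contiguous segments.

    Dynamic program: best[j] is the top score for sentences[:j]; each segment
    earns its log marginal likelihood minus a fixed penalty (an assumed prior
    favouring fewer segments).
    """
    n = len(sentences)
    best = [0.0] + [float("-inf")] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        counts = Counter()
        for i in range(j - 1, -1, -1):  # grow segment sentences[i:j] leftwards
            counts.update(sentences[i])
            cand = best[i] + log_marginal(counts, vocab_size, alpha) - segment_penalty
            if cand > best[j]:
                best[j], back[j] = cand, i
    bounds, j = [], n
    while j > 0:  # follow back-pointers to recover (start, end) pairs
        bounds.append((back[j], j))
        j = back[j]
    return bounds[::-1]


def cluster(sentences, bounds, vocab_size, alpha=0.5):
    """Greedily merge segments while a merge raises the marginal likelihood.

    No target cluster count is supplied, so the number of authors is inferred.
    Returns one cluster label per sentence.
    """
    members = [[k] for k in range(len(bounds))]
    counts = [Counter(w for s in sentences[a:b] for w in s) for a, b in bounds]
    while len(members) > 1:
        best_gain, best_pair = 0.0, None
        for x in range(len(members)):
            for y in range(x + 1, len(members)):
                gain = (log_marginal(counts[x] + counts[y], vocab_size, alpha)
                        - log_marginal(counts[x], vocab_size, alpha)
                        - log_marginal(counts[y], vocab_size, alpha))
                if gain > best_gain:
                    best_gain, best_pair = gain, (x, y)
        if best_pair is None:  # every possible merge would lower the likelihood
            break
        x, y = best_pair  # y > x, so popping y leaves index x valid
        members[x] += members.pop(y)
        counts[x] += counts.pop(y)
    labels = [0] * len(sentences)
    for label, segs in enumerate(members):
        for k in segs:
            a, b = bounds[k]
            for s in range(a, b):
                labels[s] = label
    return labels


def decompose(sentences, alpha=0.5, segment_penalty=2.0):
    """Label each tokenized sentence with an inferred author index."""
    vocab_size = len({w for s in sentences for w in s})
    bounds = segment(sentences, vocab_size, alpha, segment_penalty)
    return cluster(sentences, bounds, vocab_size, alpha)


if __name__ == "__main__":
    A = ["a", "b"] * 5  # synthetic "author A" sentence
    B = ["c", "d"] * 5  # synthetic "author B" sentence
    print(decompose([A, A, A, B, B, B, A, A]))  # prints [0, 0, 0, 1, 1, 1, 0, 0]
```

On this synthetic input the sketch recovers the segments (0, 3), (3, 6) and (6, 8) and assigns the first and last to the same inferred author. Real authorship features would be far richer than raw word counts, and the hypothetical `segment_penalty` changes how finely the text is split; the record does not say whether BayesAD's tunable parameter plays a similar role.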
Pages: 400-411
Number of pages: 12
Related papers (50 in total)
  • [31] The order in the lists of authors in multi-author papers revisited
    Kosmulski, Marek
    JOURNAL OF INFORMETRICS, 2012, 6 (04) : 639 - 644
  • [32] Introduction to the multi-author review on methylation in cellular physiology
    David Shechter
    Cellular and Molecular Life Sciences, 2019, 76 : 2871 - 2872
  • [33] Multi-author Review: From peptidoglycan biosynthesis to antibiotic resistance
    J.-M. Frère
    Cellular and Molecular Life Sciences CMLS, 1998, 54 : 299 - 299
  • [34] A hierarchical feature decomposition clustering algorithm for unsupervised classification of document image types
    Curtis, Dean
    Kubushyn, Vitaliy
    Yfantis, E. A.
    Rogers, Michael
    ICMLA 2007: SIXTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2007 : 423 - 428
  • [35] The need to quantify authors' relative intellectual contributions in a multi-author paper
    Rahman, Mohammad Tariqur
    Mac Regenstein, Joe
    Abu Kassim, Noor Lide
    Haque, Nazmul
    JOURNAL OF INFORMETRICS, 2017, 11 (01) : 275 - 281
  • [36] The image of Siberia in the multi-author poetry collection Siberian Motifs (1886)
    V. Smolianinov, Artem
    TOMSK STATE UNIVERSITY JOURNAL, 2024, (503)
  • [37] Multi-author Review: Epigenetic control of transcription. Introduction: the genetics of epigenetics
    S. M. Gasser
    R. Paro
    F. Stewart
    R. Aasland
    Cellular and Molecular Life Sciences CMLS, 1998, 54 (1) : 1 - 5
  • [38] Unsupervised extractive multi-document text summarization using a Genetic Algorithm
    Neri-Mendoza, Veronica
    Ledeneva, Yulia
    Garcia-Hernandez, Rene Arnulfo
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (02) : 2397 - 2408
  • [39] Temporal evolution of multi-author papers in basic sciences from 1960 to 2010
    Huang, Ding-wei
    SCIENTOMETRICS, 2015, 105 (03) : 2137 - 2147