Unsupervised identification of text reuse in early Chinese literature

Cited by: 13
Author
Sturgeon, Donald [1]
Affiliation
[1] Harvard Univ, Fairbank Ctr Chinese Studies, Room S126,CGIS South Bldg,1730 Cambridge St, Cambridge, MA 02138 USA
DOI
10.1093/llc/fqx024
Chinese Library Classification number
C [Social Sciences, General]
Discipline classification code
03; 0303
Abstract
Text reuse in early Chinese transmitted texts is extensive and widespread, often reflecting complex textual histories involving repeated transcription, compilation, and editing spanning many centuries and involving the work of multiple authors and editors. In this study, a fully automated method of identifying and representing complex text reuse patterns is presented, and the results evaluated by comparison to a manually compiled reference work. The resultant data are integrated into a widely used and publicly available online database system with browse, search, and visualization functionality. These same results are then aggregated to create a model of text reuse relationships at a corpus level, revealing patterns of systematic reuse among groups of texts. Lastly, the large number of reuse instances identified make possible the analysis of frequently observed string substitutions, which are observed to be strongly indicative of partial synonymy between strings.
Pages: 670-684
Page count: 15