Digital Approaches to Text Reuse in the Early Chinese Corpus

Cited by: 8
Author
Sturgeon, Donald [1]
Affiliation
[1] Harvard University, Department of East Asian Languages and Civilizations, Cambridge, MA 02138, USA
Keywords
text reuse; citation; quotation; similarity; classical Chinese
DOI
10.1215/23290048-7256963
Chinese Library Classification number
C [Social Sciences, General]
Discipline classification code
03; 0303
Abstract
Observed textual similarities between different pieces of writing are frequently cited by textual scholars as grounds for interpretative stances about the meaning of a passage and its authorship, authenticity, and accuracy. Historically, identifying occurrences of such similarities has been a matter of extensive knowledge and recall of the content and locations of passages contained within certain texts, together with painstaking manual comparison by examining printed copies, use of concordances, or more recently, appropriate use of full-text searchable database systems. The development of increasingly comprehensive and accurate digital corpora of early Chinese transmitted writing raises many opportunities to study these phenomena using more systematic digital techniques. These offer the promise of not only vast savings in time and labor but also new insights made possible only through exhaustive comparisons of types that would be entirely impractical without the use of computational methods. This article investigates and contrasts unsupervised techniques for the identification of textual similarities in premodern Chinese works in general, and the classical corpus in particular, taking the text of the Mozi as a concrete example. While specific examples are presented in detail to concretely demonstrate the utility and potential of the techniques discussed, all of the methods described are generally applicable to a wide range of materials. With this in mind, this article also introduces an open-access platform designed to help researchers quickly and easily explore these phenomena within those materials most relevant to their own work.
Pages: 186-213
Page count: 28