Digital Approaches to Text Reuse in the Early Chinese Corpus

Cited by: 8
Author
Sturgeon, Donald [1]
Affiliation
[1] Harvard Univ, Dept East Asian Languages & Civilizat, Cambridge, MA 02138 USA
Keywords
text reuse; citation; quotation; similarity; classical Chinese
DOI
10.1215/23290048-7256963
Chinese Library Classification
C [Social Sciences, General]
Discipline classification code
03; 0303
Abstract
Observed textual similarities between different pieces of writing are frequently cited by textual scholars as grounds for interpretative stances about the meaning of a passage and its authorship, authenticity, and accuracy. Historically, identifying occurrences of such similarities has been a matter of extensive knowledge and recall of the content and locations of passages contained within certain texts, together with painstaking manual comparison by examining printed copies, use of concordances, or more recently, appropriate use of full-text searchable database systems. The development of increasingly comprehensive and accurate digital corpora of early Chinese transmitted writing raises many opportunities to study these phenomena using more systematic digital techniques. These offer the promise of not only vast savings in time and labor but also new insights made possible only through exhaustive comparisons of types that would be entirely impractical without the use of computational methods. This article investigates and contrasts unsupervised techniques for the identification of textual similarities in premodern Chinese works in general, and the classical corpus in particular, taking the text of the Mozi as a concrete example. While specific examples are presented in detail to concretely demonstrate the utility and potential of the techniques discussed, all of the methods described are generally applicable to a wide range of materials. With this in mind, this article also introduces an open-access platform designed to help researchers quickly and easily explore these phenomena within those materials most relevant to their own work.
Pages: 186-213
Page count: 28