Digital Approaches to Text Reuse in the Early Chinese Corpus

Times cited: 8
Authors
Sturgeon, Donald [1]
Affiliations
[1] Harvard Univ, Dept East Asian Languages & Civilizat, Cambridge, MA 02138 USA
Keywords
text reuse; citation; quotation; similarity; classical Chinese
DOI
10.1215/23290048-7256963
Chinese Library Classification number
C [Social Sciences, General]
Discipline classification codes
03; 0303
Abstract
Observed textual similarities between different pieces of writing are frequently cited by textual scholars as grounds for interpretative stances about the meaning of a passage and its authorship, authenticity, and accuracy. Historically, identifying occurrences of such similarities has been a matter of extensive knowledge and recall of the content and locations of passages contained within certain texts, together with painstaking manual comparison by examining printed copies, use of concordances, or more recently, appropriate use of full-text searchable database systems. The development of increasingly comprehensive and accurate digital corpora of early Chinese transmitted writing raises many opportunities to study these phenomena using more systematic digital techniques. These offer the promise of not only vast savings in time and labor but also new insights made possible only through exhaustive comparisons of types that would be entirely impractical without the use of computational methods. This article investigates and contrasts unsupervised techniques for the identification of textual similarities in premodern Chinese works in general, and the classical corpus in particular, taking the text of the Mozi as a concrete example. While specific examples are presented in detail to concretely demonstrate the utility and potential of the techniques discussed, all of the methods described are generally applicable to a wide range of materials. With this in mind, this article also introduces an open-access platform designed to help researchers quickly and easily explore these phenomena within those materials most relevant to their own work.
Pages: 186-213
Page count: 28