Unsupervised identification of text reuse in early Chinese literature

Times cited: 13
Author
Sturgeon, Donald [1]
Affiliation
[1] Harvard Univ, Fairbank Ctr Chinese Studies, Room S126,CGIS South Bldg,1730 Cambridge St, Cambridge, MA 02138 USA
DOI
10.1093/llc/fqx024
Chinese Library Classification (CLC) number
C [Social sciences (general)]
Discipline classification codes
03; 0303
Abstract
Text reuse in early Chinese transmitted texts is extensive and widespread, often reflecting complex textual histories involving repeated transcription, compilation, and editing spanning many centuries and involving the work of multiple authors and editors. In this study, a fully automated method of identifying and representing complex text reuse patterns is presented, and the results evaluated by comparison to a manually compiled reference work. The resultant data are integrated into a widely used and publicly available online database system with browse, search, and visualization functionality. These same results are then aggregated to create a model of text reuse relationships at a corpus level, revealing patterns of systematic reuse among groups of texts. Lastly, the large number of reuse instances identified makes possible the analysis of frequently observed string substitutions, which are observed to be strongly indicative of partial synonymy between strings.
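The last point of the abstract concerns string substitutions between parallel (reused) passages. As a rough illustration of what tallying such substitutions could involve, the sketch below aligns pairs of parallel passages with the Python standard library's `difflib.SequenceMatcher` and counts the spans it labels as replacements. This is an assumption, not the alignment method used in the paper: the helper name `count_substitutions` is hypothetical, and the sample strings are invented toy variants, not data from the study.

```python
from collections import Counter
from difflib import SequenceMatcher


def count_substitutions(pairs):
    """Tally substituted strings across aligned pairs of parallel passages.

    Hypothetical helper: each item in `pairs` is (passage_a, passage_b),
    assumed to be two versions of the same reused passage. This uses a
    generic character-level diff for illustration only.
    """
    counts = Counter()
    for a, b in pairs:
        # autojunk=False keeps frequent characters from being ignored
        # in longer passages.
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace":
                # Sort the two strings so that X->Y and Y->X are counted
                # as the same substitution pair.
                key = tuple(sorted((a[i1:i2], b[j1:j2])))
                counts[key] += 1
    return counts


if __name__ == "__main__":
    # Invented toy variants, not examples taken from the paper.
    toy_pairs = [
        ("吾日三省吾身", "吾日三省我身"),
        ("吾與汝弗如也", "我與女弗如也"),
    ]
    for (x, y), n in count_substitutions(toy_pairs).most_common():
        print(f"{x} <-> {y}: {n}")
```

On these toy pairs the sketch counts 吾/我 twice and 女/汝 once. Aggregated over many genuine reuse instances, frequently recurring pairs of this kind are what the abstract reports as indicative of partial synonymy.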
Pages: 670-684
Page count: 15