Unsupervised identification of text reuse in early Chinese literature

Cited by: 13

Authors:

Sturgeon, Donald [1]

Affiliations:

[1] Harvard Univ, Fairbank Ctr Chinese Studies, Room S126, CGIS South Bldg, 1730 Cambridge St, Cambridge, MA 02138 USA

Source:

DIGITAL SCHOLARSHIP IN THE HUMANITIES | 2018 / Vol. 33 / No. 03

Keywords:

DOI:

10.1093/llc/fqx024

Chinese Library Classification (CLC) number:

C [Social Sciences, General];

Discipline classification code:

03 ; 0303 ;

Abstract:

Text reuse in early Chinese transmitted texts is extensive and widespread, often reflecting complex textual histories involving repeated transcription, compilation, and editing spanning many centuries and involving the work of multiple authors and editors. In this study, a fully automated method of identifying and representing complex text reuse patterns is presented, and the results are evaluated by comparison to a manually compiled reference work. The resultant data are integrated into a widely used and publicly available online database system with browse, search, and visualization functionality. These same results are then aggregated to create a model of text reuse relationships at a corpus level, revealing patterns of systematic reuse among groups of texts. Lastly, the large number of reuse instances identified makes possible the analysis of frequently observed string substitutions, which are found to be strongly indicative of partial synonymy between strings.

Pages: 670-684

Page count: 15

