Identifying Styles of Cross-Language Classics with Pre-Trained Models

Times cited: 0
Authors
Zhang Y. [1]
Deng S. [1]
Hu H. [1]
Wang D. [2]
Affiliations
[1] School of Information Management, Nanjing University, Nanjing
[2] School of Information Management, Nanjing Agricultural University, Nanjing
Keywords
Canonical Texts; Digital Humanities; Language Style; Pre-Trained Language Models
DOI
10.11925/infotech.2096-3467.2022.0926
Abstract
[Objective] This paper uses pre-trained language models to explore the linguistic style of canonical texts, aiming to improve the quality of their connotations. [Methods] On a cross-lingual corpus of canonical texts in ancient Chinese and English, we compared five pre-trained language models with the deep learning model Bi-LSTM-CRF. The selected works are the Analects of Confucius, the Tao Te Ching, the Book of Rites, the Shangshu (Book of Documents), and the Strategies of the Warring States (Zhan Guo Ce). We also examined the language style of the classics at the lexical level. [Results] The SikuBERT pre-trained language model achieved 91.29% precision, 91.76% recall, and a 91.52% F1 score (the harmonic mean of precision and recall) in recognizing words in the canonical texts. Compared with the words of the original classics, the modern Chinese translations showed deeper semantic meaning, clearer ideographic referents, and more vivid and flexible word combinations. [Limitations] This study only covered selected pre-Qin classics and their translations. More research is needed to examine the models' performance in other domains. [Conclusions] The pre-trained language model SikuBERT can effectively analyze differences in language style across cross-lingual canonical texts, which helps promote the dissemination of classic Chinese works. © 2023 Chinese Academy of Sciences. All rights reserved.
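The abstract reports word-recognition precision, recall, and F1 but does not spell out the evaluation protocol. The sketch below shows one common way such scores are computed for Chinese word segmentation, by comparing predicted and gold words as character-offset spans. This span-based scheme is an assumption, not necessarily the authors' exact method, and the helper names (`word_spans`, `prf`, `f1_from_pr`) and the example segmentation are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def word_spans(words: List[str]) -> Set[Span]:
    """Turn a segmented sentence into a set of character-offset spans."""
    spans: Set[Span] = set()
    start = 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged word-level precision, recall and F1 over aligned sentences."""
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        # Both segmentations must cover exactly the same characters.
        if "".join(g) != "".join(p):
            raise ValueError("gold and predicted sentences differ in text")
        gs, ps = word_spans(g), word_spans(p)
        correct += len(gs & ps)  # a word counts only if both boundaries match
        n_gold += len(gs)
        n_pred += len(ps)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall, f1_from_pr(precision, recall)


if __name__ == "__main__":
    # Hypothetical gold vs. predicted segmentation of 学而时习之 (Analects 1.1).
    gold = [["学", "而", "时", "习", "之"]]
    pred = [["学而", "时", "习", "之"]]
    print(prf(gold, pred))
    # Sanity check: the reported P and R reproduce the reported F1.
    print(round(f1_from_pr(0.9129, 0.9176), 4))
```

In the hypothetical example, three of the four predicted words match gold spans, so precision is 3/4 and recall is 3/5. The final line checks that 91.29% precision and 91.76% recall give an F1 of about 91.52%, consistent with the figures reported above.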
Pages: 50-62
Number of pages: 12