Measuring Chinese-English Cross-Lingual Word Similarity with HowNet and Parallel Corpus

Authors
Xia, Yunqing [1]
Zhao, Taotao [1, 2]
Yao, Jianmin [2]
Jin, Peng [3]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Suzhou Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
[3] Leshan Normal Univ, Lab Intelligent Informat Processing & Applicat, Leshan 614004, Peoples R China
Keywords
Cross-lingual word similarity; cross-lingual information access; HowNet; parallel corpus; semantic similarity
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-lingual word similarity (CLWS) is a basic component of cross-lingual information access systems. Designing a CLWS measure faces three challenges: (i) cross-lingual knowledge bases are rare; (ii) cross-lingual corpora are limited; and (iii) no benchmark cross-lingual dataset is available for CLWS evaluation. This paper presents several Chinese-English CLWS measures that adopt HowNet as the cross-lingual knowledge base and a sentence-level parallel corpus as development data. To evaluate these measures, a Chinese-English cross-lingual benchmark dataset is compiled based on the Miller-Charles dataset. Two conclusions are drawn from the experimental results. First, HowNet is a promising knowledge base for CLWS measures. Second, a parallel corpus is promising for fine-tuning word similarity measures using cross-lingual co-occurrence statistics.
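The abstract does not state which co-occurrence statistic the authors use or how it is combined with the HowNet-based score. As a rough illustration only, the sketch below assumes a Dice coefficient computed over sentence-aligned pairs and a simple linear interpolation with a knowledge-base score. The function names (`cooccurrence_dice`, `combine`), the weight `alpha`, and the toy corpus are all hypothetical and are not taken from the paper.

```python
def cooccurrence_dice(parallel_pairs, zh_word, en_word):
    """Dice coefficient over aligned sentence pairs (an assumed statistic).

    parallel_pairs: list of (chinese_tokens, english_tokens) tuples.
    Returns 2 * n(zh, en) / (n(zh) + n(en)), or 0.0 if neither word occurs.
    """
    n_zh = n_en = n_both = 0
    for zh_tokens, en_tokens in parallel_pairs:
        has_zh = zh_word in zh_tokens
        has_en = en_word in en_tokens
        n_zh += int(has_zh)
        n_en += int(has_en)
        n_both += int(has_zh and has_en)
    if n_zh + n_en == 0:
        return 0.0
    return 2.0 * n_both / (n_zh + n_en)


def combine(kb_score, corpus_score, alpha=0.7):
    """Linear interpolation of a knowledge-base score and a corpus score.

    alpha is a hypothetical weight that would be tuned on development data.
    """
    return alpha * kb_score + (1.0 - alpha) * corpus_score


# Toy sentence-aligned corpus, already tokenized (illustrative data only).
toy_corpus = [
    (["汽车", "很", "快"], ["the", "car", "is", "fast"]),
    (["我", "买", "汽车"], ["i", "bought", "a", "car"]),
    (["他", "喜欢", "音乐"], ["he", "likes", "music"]),
]

if __name__ == "__main__":
    print(cooccurrence_dice(toy_corpus, "汽车", "car"))
    print(cooccurrence_dice(toy_corpus, "汽车", "music"))
```

On the toy corpus, "汽车" and "car" appear together in both of the sentence pairs that contain either word, so their Dice score is at its maximum, while "汽车" and "music" never co-occur and score zero. A real setup would need word segmentation for the Chinese side and a much larger corpus.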
Pages: 221 / +
Number of pages: 3