EduNER: a Chinese named entity recognition dataset for education research

Cited by: 5
Authors
Li, Xu [1]
Wei, Chengkun [1]
Jiang, Zhuoren [2]
Meng, Wenlong [1]
Ouyang, Fan [3]
Zhang, Zihui [4]
Chen, Wenzhi [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, 38 Zheda Rd, Hangzhou 310027, Zhejiang, Peoples R China
[2] Zhejiang Univ, Sch Publ Affairs, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, Peoples R China
[3] Zhejiang Univ, Coll Educ, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, Peoples R China
[4] Zhejiang Univ, Informat Technol Ctr, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, Peoples R China
Source
NEURAL COMPUTING & APPLICATIONS | 2023, Vol. 35, Issue 24
Funding
National Natural Science Foundation of China
Keywords
Chinese named entity recognition; Dataset; Benchmark; Education; AGREEMENT;
DOI
10.1007/s00521-023-08635-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A high-quality domain-oriented dataset is crucial for the domain-specific named entity recognition (NER) task. In this study, we introduce a novel education-oriented Chinese NER dataset (EduNER). To provide representative and diverse training data, we collect data from multiple sources, including textbooks, academic papers, and education-related web pages. The collected documents span ten years (2012-2021). A team of domain experts is invited to accomplish the education NER schema definition, and a group of trained annotators is hired to complete the annotation. A collaborative labeling platform is built for accelerating human annotation. The constructed EduNER dataset includes 16 entity types, 11k+ sentences, and 35,731 entities. We conduct a thorough statistical analysis of EduNER and summarize its distinctive characteristics by comparing it with eight open-domain or domain-specific NER datasets. Sixteen state-of-the-art models are further utilized for NER tasks validation. The experimental results can enlighten further exploration. To the best of our knowledge, EduNER is the first publicly available dataset for NER task in the education domain, which may promote the development of education-oriented NER models.
Pages: 17717-17731
Page count: 15