EduNER: a Chinese named entity recognition dataset for education research

Cited by: 5
Authors
Li, Xu [1]
Wei, Chengkun [1]
Jiang, Zhuoren [2]
Meng, Wenlong [1]
Ouyang, Fan [3]
Zhang, Zihui [4]
Chen, Wenzhi [1]
Affiliations
[1] Zhejiang University, College of Computer Science and Technology, 38 Zheda Rd, Hangzhou 310027, Zhejiang, People's Republic of China
[2] Zhejiang University, School of Public Affairs, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, People's Republic of China
[3] Zhejiang University, College of Education, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, People's Republic of China
[4] Zhejiang University, Information Technology Center, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, People's Republic of China
Source
Neural Computing and Applications, 2023, Vol. 35, No. 24
Funding
National Natural Science Foundation of China
Keywords
Chinese named entity recognition; Dataset; Benchmark; Education; AGREEMENT
DOI
10.1007/s00521-023-08635-5
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
A high-quality, domain-oriented dataset is crucial for domain-specific named entity recognition (NER). In this study, we introduce EduNER, a novel education-oriented Chinese NER dataset. To provide representative and diverse training data, we collect documents from multiple sources, including textbooks, academic papers, and education-related web pages, spanning ten years (2012-2021). A team of domain experts defines the education NER schema, and a group of trained annotators carries out the annotation on a collaborative labeling platform built to accelerate manual labeling. The resulting EduNER dataset contains 16 entity types, more than 11,000 sentences, and 35,731 entities. We conduct a thorough statistical analysis of EduNER and summarize its distinctive characteristics by comparing it with eight open-domain and domain-specific NER datasets. We further evaluate sixteen state-of-the-art NER models on EduNER to validate the dataset, and the experimental results may guide further exploration. To the best of our knowledge, EduNER is the first publicly available NER dataset in the education domain, and it may promote the development of education-oriented NER models.
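The abstract does not state which metric was used to evaluate the sixteen models. A common convention for NER benchmarks is strict, entity-level, micro-averaged precision, recall, and F1 over BIO-tagged sequences, where a predicted entity counts as correct only if both its type and its boundaries match the gold annotation. The sketch below assumes that convention and character-level BIO tags (typical for Chinese NER); the function names and the entity labels in the example are hypothetical placeholders, not EduNER's schema or the authors' evaluation code.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical helpers illustrating strict entity-level NER scoring (assumed metric).
# An entity span: (entity_type, start_index, end_index_exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from one sentence tagged in the BIO scheme (assumed)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes a trailing entity
        # An open entity ends at "O", at a new "B-", or at an "I-" of a different type.
        if start is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != etype):
            spans.add((etype, start, i))
            start, etype = None, None
        # Open a new entity at "B-"; a stray "I-" is leniently treated as a beginning.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 with exact type-and-boundary matching."""
    tp = n_pred = n_gold = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # spans are compared per sentence
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # One hypothetical sentence with character-level tags; labels are placeholders.
    gold = [["B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "B-PER", "I-PER"]]
    pred = [["B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "B-LOC", "I-LOC"]]
    print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

In the example, the second entity has the right boundaries but the wrong type, so under strict matching it counts as both a false positive and a false negative. Lenient or token-level variants would score it differently, which is why the matching rule should be stated alongside any reported F1.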
Pages: 17717-17731
Number of pages: 15
Related papers (50 in total)
  • [1] SciCN: A Scientific Dataset for Chinese Named Entity Recognition
    Yang, Jing
    Ji, Bin
    Li, Shasha
    Ma, Jun
    Yu, Jie
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 78 (03): 4303-4315
  • [2] Survey of Chinese Named Entity Recognition Research
    Zhao, Jigui
    Qian, Yurong
    Wang, Kui
    Hou, Shuxiang
    Chen, Jiaying
    Computer Engineering and Applications, 2024, 60 (01): 15-27
  • [3] A Named Entity Recognition Dataset for Turkish
    Kucuk, Dilek
    Kucuk, Dogan
    Arici, Nursal
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016: 329-332
  • [4] Research on Chinese Named Entity Recognition in the Marine Field
    Cao, Xiaojuan
    Yang, Yongquan
    2018 INTERNATIONAL CONFERENCE ON ALGORITHMS, COMPUTING AND ARTIFICIAL INTELLIGENCE (ACAI 2018), 2018
  • [5] Research on Chinese Named Entity Recognition Based on Ontology
    Chang, Weili
    Luo, Fang
    Qian, Jilai
    MECHANICAL ENGINEERING AND INTELLIGENT SYSTEMS, PTS 1 AND 2, 2012, 195-196: 1180-1185
  • [6] Research of Chinese Named Entity Recognition Using GATE
    Cheng, Chen
    Cheng, Xianyi
    Hua, Jin
    BIOTECHNOLOGY, CHEMICAL AND MATERIALS ENGINEERING, PTS 1-3, 2012, 393-395: 262-264
  • [7] Research on College Academic Text Named Entity Recognition and Dataset Construction
    He, Chen
    Yuan, Yingchun
    Wang, Kejian
    Tao, Jia
    Computer Engineering and Applications, 2023, 59 (22): 322-328
  • [8] KazNERD: Kazakh Named Entity Recognition Dataset
    Yeshpanov, Rustem
    Khassanov, Yerbolat
    Varol, Huseyin Atakan
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022: 417-426
  • [9] DroNER: Dataset for drone named entity recognition
    Silalahi, Swardiantara
    Ahmad, Tohari
    Studiawan, Hudan
    DATA IN BRIEF, 2023, 48