EduNER: a Chinese named entity recognition dataset for education research

Cited by: 5
Authors
Li, Xu [1]
Wei, Chengkun [1]
Jiang, Zhuoren [2]
Meng, Wenlong [1]
Ouyang, Fan [3]
Zhang, Zihui [4]
Chen, Wenzhi [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, 38 Zheda Rd, Hangzhou 310027, Zhejiang, Peoples R China
[2] Zhejiang Univ, Sch Publ Affairs, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, Peoples R China
[3] Zhejiang Univ, Coll Educ, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, Peoples R China
[4] Zhejiang Univ, Informat Technol Ctr, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, Peoples R China
Source
NEURAL COMPUTING & APPLICATIONS | 2023, Vol. 35, Issue 24
Funding
National Natural Science Foundation of China
Keywords
Chinese named entity recognition; Dataset; Benchmark; Education; AGREEMENT;
DOI
10.1007/s00521-023-08635-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A high-quality domain-oriented dataset is crucial for the domain-specific named entity recognition (NER) task. In this study, we introduce a novel education-oriented Chinese NER dataset (EduNER). To provide representative and diverse training data, we collect data from multiple sources, including textbooks, academic papers, and education-related web pages. The collected documents span ten years (2012-2021). A team of domain experts defined the education NER schema, and a group of trained annotators completed the annotation. A collaborative labeling platform was built to accelerate human annotation. The constructed EduNER dataset includes 16 entity types, 11k+ sentences, and 35,731 entities. We conduct a thorough statistical analysis of EduNER and summarize its distinctive characteristics by comparing it with eight open-domain or domain-specific NER datasets. We further evaluate sixteen state-of-the-art models on the NER task, and the experimental results can inform further exploration. To the best of our knowledge, EduNER is the first publicly available dataset for the NER task in the education domain, which may promote the development of education-oriented NER models.
Pages: 17717-17731
Page count: 15