Evaluating Reference String Extraction Using Line-Based Conditional Random Fields: A Case Study with German Language Publications

Cited by: 2
Authors
Koerner, Martin [1]
Ghavimi, Behnam [2]
Mayr, Philipp [2]
Hartmann, Heinrich
Staab, Steffen [1]
Affiliations
[1] Univ Koblenz Landau, Inst Web Sci & Technol, Koblenz, Germany
[2] GESIS Leibniz Inst Social Sci, Cologne, Germany
Keywords
Reference extraction; Citations; Conditional random fields; German language papers; Information extraction
DOI
10.1007/978-3-319-67162-8_15
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The extraction of individual reference strings from the reference section of scientific publications is an important step in the citation extraction pipeline. Current approaches divide this task into two steps by first detecting the reference section areas and then grouping the text lines in such areas into reference strings. We propose a classification model that considers every line in a publication as a potential part of a reference string. By applying line-based conditional random fields rather than constructing the graphical model based on individual words, dependencies and patterns that are typical in reference sections provide strong features while the overall complexity of the model is reduced. We evaluated our novel approach RefExt against various state-of-the-art tools (CERMINE, GROBID, and ParsCit) and a gold standard which consists of 100 German language full text publications from the social sciences. The evaluation demonstrates that we are able to outperform state-of-the-art tools which rely on the identification of reference section areas.
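The line-based idea in the abstract can be illustrated with a minimal sketch. This is not the RefExt implementation: the feature set, the label names (`B-REF`, `I-REF`, `O`), and both function names are assumptions made for illustration only. The sketch shows how each text line could be described by a few features, and how per-line labels (as a sequence model such as a CRF might predict them) could be merged back into reference strings.

```python
import re


def line_features(line: str) -> dict:
    """Compute a few illustrative per-line features (hypothetical feature set)."""
    stripped = line.strip()
    return {
        # A four-digit year is a common cue for a reference line.
        "has_year": bool(re.search(r"\b(19|20)\d{2}\b", stripped)),
        "starts_with_capital": stripped[:1].isupper(),
        "ends_with_period": stripped.endswith("."),
        "num_commas": stripped.count(","),
    }


def group_reference_lines(lines, labels):
    """Merge lines labeled B-REF/I-REF into reference strings.

    Assumed label scheme: B-REF starts a reference, I-REF continues it,
    and any other label (e.g. O) is treated as non-reference text.
    An I-REF with no open reference is skipped.
    """
    references, current = [], []
    for line, label in zip(lines, labels):
        if label == "B-REF":
            if current:
                references.append(" ".join(current))
            current = [line.strip()]
        elif label == "I-REF" and current:
            current.append(line.strip())
        else:
            if current:
                references.append(" ".join(current))
            current = []
    if current:
        references.append(" ".join(current))
    return references
```

In a full pipeline, the feature dictionaries for all lines of a publication would be passed to a sequence labeler, and its predicted labels would replace the hand-written ones used when calling `group_reference_lines`.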
Pages: 137-145
Number of pages: 9