ICD-10 Coding of Spanish Electronic Discharge Summaries: An Extreme Classification Problem

Cited by: 18
Authors
Almagro, Mario [1]
Martinez Unanue, Raquel [1]
Fresno, Victor [1]
Montalvo, Soto [2]
Affiliations
[1] Univ Nacl Educ Distancia, Dept Comp Languages & Syst, Madrid 28040, Spain
[2] King Juan Carlos Univ URJC, Dept Comp Sci, Madrid 28933, Spain
Keywords
Encoding; Training; Hospitals; Diseases; Proposals; Licenses; Task analysis; Extreme classification; XMTC; ICD-10; coding; text mining
DOI
10.1109/ACCESS.2020.2997241
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: Medical coding is used to identify and standardize clinical concepts in the records collected from healthcare services. The tenth revision of the International Classification of Diseases (ICD-10) is the most widely used coding system, with more than 11,000 different diagnoses, and it affects research, reporting, and funding. Unfortunately, ICD-10 code sets tend to follow biased, unbalanced, and scattered distributions. These distribution attributes, along with high lexical variability, severely restrict performance when coded clinical records are used to infer code sets for uncoded records. To improve that inference, we explore a combination of example-based methods optimized to capture codes that appear with different frequencies in data sets.
Materials and Methods: The exploration was carried out on Spanish hospital discharge reports coded by experts, excluding all sentences without any biomedical concept. Representations based on semantic and lexical features are explored, using both global and label-specific attributes. In turn, algorithms based on binary outputs, groups of subsets, and extreme classification are compared. Each method suggests a list of codes together with their confidence values (certainty probabilities).
Results: Each method shows a different spectral behavior. Binary classifiers seem to maximize the capture of more popular codes, while extreme classifiers promote infrequent ones. To exploit these differences, ensemble approaches are proposed that weight every output code according to the method, confidence value, and appearance frequency. The rule-based combination reaches a 46% Precision at 10 (P@10), which means a 15% improvement over the best individual proposal.
Conclusion: Ensemble methods that weight each code according to training frequency and performance can achieve better overall Precision scores on extreme distributions, such as ICD-10 coding.
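The abstract does not spell out the weighting rule, so the following is only a minimal Python sketch of the general idea: each method's suggested codes are re-scored by a weight that depends on the method and on whether the code is frequent or rare in the training data, and the merged ranking is evaluated with Precision at k. The function names, the weight table, the `rare_threshold` cut-off, and the example codes and frequencies are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def precision_at_k(ranked_codes, gold_codes, k=10):
    """Fraction of the top-k suggested codes that are in the expert-assigned set."""
    top_k = ranked_codes[:k]
    hits = sum(1 for code in top_k if code in gold_codes)
    return hits / k


def combine_suggestions(method_outputs, method_weights, train_freq, rare_threshold=10):
    """Merge per-method (code, confidence) lists into one ranked list.

    Hypothetical rule: a code's score is the sum over methods of
    weight(method, frequency band) * confidence.
    """
    scores = defaultdict(float)
    for method, suggestions in method_outputs.items():
        for code, confidence in suggestions:
            # Codes seen fewer than rare_threshold times in training count as rare.
            band = "rare" if train_freq.get(code, 0) < rare_threshold else "frequent"
            scores[code] += method_weights[method][band] * confidence
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda code: (-scores[code], code))


# Illustrative inputs only (assumed, not from the paper).
method_outputs = {
    "binary": [("I10", 0.9), ("E11.9", 0.6)],
    "extreme": [("Q87.1", 0.8), ("I10", 0.4)],
}
method_weights = {
    "binary": {"frequent": 1.0, "rare": 0.5},   # trusted more on popular codes
    "extreme": {"frequent": 0.5, "rare": 1.0},  # trusted more on infrequent codes
}
train_freq = {"I10": 500, "E11.9": 300, "Q87.1": 2}

ranked = combine_suggestions(method_outputs, method_weights, train_freq)
print(ranked)
print(precision_at_k(ranked, {"I10", "Q87.1"}, k=2))
```

In this toy setup the rare code suggested only by the extreme classifier moves ahead of a frequent code suggested only by the binary classifier, which mirrors the complementary behavior the Results paragraph describes.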
Pages: 100073-100083
Page count: 11