rEHR: An R package for manipulating and analysing Electronic Health Record data

Cited by: 11
Authors
Springate, David A. [1 ,2 ]
Parisi, Rosa [3 ]
Olier, Ivan [4 ]
Reeves, David [1 ,2 ]
Kontopantelis, Evangelos [1 ,5 ,6 ]
Affiliations
[1] Univ Manchester, NIHR Sch Primary Care Res, Manchester, Lancs, England
[2] Univ Manchester, Ctr Biostat, Fac Biol Med & Hlth, Manchester, Lancs, England
[3] Univ Manchester, Ctr Pharmacoepidemiol & Drug Safety, Fac Biol Med & Hlth, Manchester, Lancs, England
[4] Manchester Metropolitan Univ, Sch Comp Math & Digital Technol, Informat Res Ctr, Manchester, Lancs, England
[5] Univ Manchester, Farr Inst Hlth Informat Res, Fac Biol Med & Hlth, Manchester, Lancs, England
[6] Vaughan House,Portsmouth St, Manchester M13 9GB, Lancs, England
Source
PLOS ONE | 2017 / Volume 12 / Issue 02
Keywords
PERFORMANCE;
DOI
10.1371/journal.pone.0171784
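Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code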
07 ; 0710 ; 09 ;
Abstract
Research with structured Electronic Health Records (EHRs) is expanding as data become more accessible, analytic methods advance, and the scientific validity of such studies is increasingly accepted. However, data science methodology to enable the rapid searching, extraction, cleaning and analysis of these large, often complex, datasets is less well developed. In addition, commonly used software is inadequate, resulting in bottlenecks in research workflows and in obstacles to increased transparency and reproducibility of the research. Preparing a research-ready dataset from EHRs is a complex and time-consuming task requiring substantial data science skills, even for simple designs. Certain aspects of the workflow are also computationally intensive, for example extraction of longitudinal data and matching controls to a large cohort, which may take days or even weeks to run using standard software. The rEHR package simplifies and accelerates the process of extracting ready-for-analysis datasets from EHR databases. It has a simple import function to a database backend that greatly accelerates data access times. A set of generic query functions allows users to extract data efficiently without needing detailed knowledge of SQL queries. Longitudinal data extractions can also be made in a single command, making use of parallel processing. The package also contains functions for cutting data by time-varying covariates, matching controls to cases, unit conversion and construction of clinical code lists. There are also functions to synthesise dummy EHR data. The package has been tested with one of the largest primary care EHRs, the Clinical Practice Research Datalink (CPRD), but allows for a common interface to other EHRs. This simplified and accelerated workflow for EHR data extraction results in simpler, cleaner scripts that are more easily debugged, shared and reproduced.
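The general idea of importing records into a database backend and then querying them through a generic function can be sketched as follows. This is only an illustrative Python sketch using the standard `sqlite3` module, not the rEHR API (rEHR is an R package): the table name `clinical`, the columns `patid`, `eventdate` and `medcode`, the helper names `build_db` and `first_events`, and the toy rows are all assumptions made for the example.

```python
import sqlite3

# Hypothetical toy records: (patient id, event date, clinical code).
# These rows are invented for illustration and are not real EHR data.
ROWS = [
    (1, "2005-03-01", 100),
    (1, "2004-01-15", 100),
    (2, "2006-07-09", 100),
    (2, "2003-02-02", 200),
    (3, "2005-11-30", 200),
]


def build_db(rows):
    """Load event rows into an indexed in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE clinical (patid INTEGER, eventdate TEXT, medcode INTEGER)"
    )
    con.executemany("INSERT INTO clinical VALUES (?, ?, ?)", rows)
    # An index on the code column keeps code-list lookups fast on large tables.
    con.execute("CREATE INDEX idx_clinical_medcode ON clinical (medcode)")
    return con


def first_events(con, medcodes):
    """Return the earliest matching event date per patient for the given codes."""
    placeholders = ",".join("?" for _ in medcodes)
    sql = (
        "SELECT patid, MIN(eventdate) FROM clinical "
        f"WHERE medcode IN ({placeholders}) "
        "GROUP BY patid ORDER BY patid"
    )
    # ISO-8601 date strings sort chronologically, so MIN() gives the first event.
    return con.execute(sql, list(medcodes)).fetchall()


con = build_db(ROWS)
result = first_events(con, [100])
print(result)
```

The caller passes only a code list and never writes SQL directly, which is the kind of convenience the abstract attributes to the package's generic query functions.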
Pages: 25