Identifying low-quality patterns in accident reports from textual data

Cited by: 5
Authors
Macedo, July B. [1, 2]
Ramos, Plinio M. S. [1, 2]
Maior, Caio B. S. [1, 3]
Moura, Marcio J. C. [1, 2]
Lins, Isis D. [1, 2]
Vilela, Romulo F. T. [4]
Affiliations
[1] Univ Fed Pernambuco, CEERMA Ctr Risk Anal Reliabil Engn & Environm Mod, Recife, PE, Brazil
[2] Univ Fed Pernambuco, Dept Prod Engn, Recife, PE, Brazil
[3] Univ Fed Pernambuco, Technol Ctr, Recife, PE, Brazil
[4] Companhia Hidrelect Sao Francisco CHESF, Ico, Brazil
Keywords
occupational safety; automatic classification; natural language processing; machine learning; topic modeling; safety culture; accident analysis; support vector machines; decision-support; injury; reliability; management; system
DOI
10.1080/10803548.2022.2111847
Chinese Library Classification (CLC) number
TB18 [Ergonomics]
Discipline classification code
1201
Abstract
Accident investigation reports provide useful knowledge that helps companies propose preventive and mitigative measures. However, accident report databases are usually large and complex, contain errors, and have missing and/or redundant data. In this article, we propose using text mining and natural language processing techniques to investigate low-quality accident reports. We adopted machine learning (ML) to detect and investigate inconsistencies in accident reports. The methodology was applied to 626 documents collected from an actual hydroelectric power company. The initial ML performance indicated data divergences and concerns related to the report structure. The accident database was then restructured into a more appropriate form, confirming the supposition about the quality of the investigated reports. The proposed approach can be used as a diagnostic tool to improve the design of accident investigation reports, so that they provide a more useful source of knowledge to support safety-related decisions.
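The abstract does not say which models were used, so the following is only a sketch of one common way ML can surface inconsistent records, not the authors' pipeline. Under the assumption that each report carries a recorded category and a free-text description, a classifier trained without a given report predicts its category from the text; reports whose recorded category disagrees with the prediction are flagged for review. The multinomial Naive Bayes model, the function names (`tokenize`, `train_nb`, `predict_nb`, `flag_inconsistent`), and the toy reports and labels are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model: class counts and per-class word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the label with the highest log-posterior, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    v = len(vocab)
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokenize(doc):
            # Laplace-smoothed log likelihood; unseen words get count 0 + 1.
            score += math.log((word_counts[label][tok] + 1) / (total_words + v))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def flag_inconsistent(docs, labels):
    """Leave-one-out: retrain without report i, then flag label disagreements."""
    flagged = []
    for i, (doc, label) in enumerate(zip(docs, labels)):
        model = train_nb(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:])
        predicted = predict_nb(model, doc)
        if predicted != label:
            flagged.append((i, label, predicted))
    return flagged


# Hypothetical toy data: the last report describes a fall but is recorded as "electric".
reports = [
    "worker fell off ladder",
    "technician fell off scaffold",
    "operator fell down stairs",
    "helper fell into trench",
    "electrician got shock from cable",
    "operator got shock at panel",
    "technician got shock from cable",
    "worker got shock at breaker",
    "welder fell off roof",
]
labels = ["fall"] * 4 + ["electric"] * 5

if __name__ == "__main__":
    for index, recorded, predicted in flag_inconsistent(reports, labels):
        print(f"report {index}: recorded {recorded!r}, text suggests {predicted!r}")
```

On this toy data only report 8 is flagged (recorded "electric", text suggests "fall"). Leave-one-out is used so that no report is judged by a model that has already seen its own label; one consequence is that a category represented by a single report can never be confirmed and will always be flagged, so real use would need more data per category and a human review step.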
Pages: 1088-1100
Page count: 13
Related papers
50 records in total
  • [1] The Impacts of Low-Quality Training Data on Information Extraction from Clinical Reports
    Marcheggiani, Diego
    Sebastiani, Fabrizio
    ERCIM NEWS, 2018, (112) : 45 - 46
  • [2] Identifying low-quality preclinical studies
    Zivin, Justin A.
    STROKE, 2008, 39 (10) : 2697 - 2698
  • [3] Is Procrastination Related to Low-Quality Data?
    Voss, Nathaniel M.
    Vangsness, Lisa
    EDUCATIONAL MEASUREMENT-ISSUES AND PRACTICE, 2020, 39 (04) : 95 - 104
  • [4] Mining fuzzy association rules from low-quality data
    Palacios, A. M.
    Gacto, M. J.
    Alcala-Fdez, J.
    SOFT COMPUTING, 2012, 16 (05) : 883 - 901
  • [5] Sifting Truths from Multiple Low-Quality Data Sources
    Xie, Zizhe
    Liu, Qizhi
    Bao, Zhifeng
    WEB AND BIG DATA, APWEB-WAIM 2017, PT I, 2017, 10366 : 74 - 81
  • [6] Identifying and mitigating low-quality labels for deep learning in glaucoma
    Hsu, Joy
    Phene, Sonia
    Luo, Jieying
    Mitani, Akinori
    Hammel, Naama
    Krause, Jonathan
    Sayres, Rory
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2020, 61 (07)
  • [7] A New Method for Identifying Low-Quality Data in Perceived Usability Crowdsourcing Tests: Differences in Questionnaire Scores
    Wang, Yuhui
    Chen, Xuan
    Zhou, Xuan
    INTERNATIONAL JOURNAL OF HUMAN-COMPUTER INTERACTION, 2024, 40 (22) : 7297 - 7313
  • [8] Indexation and misorientation analysis of low-quality Laue diffraction patterns
    Gupta, Vipul K.
    Agnew, Sean R.
    JOURNAL OF APPLIED CRYSTALLOGRAPHY, 2009, 42 : 116 - 124
  • [9] Investigation of Multiple Imputation in Low-Quality Questionnaire Data
    Van Ginkel, Joost R.
    MULTIVARIATE BEHAVIORAL RESEARCH, 2010, 45 (03) : 574 - 598