Healthcare Provider Summary Data for Fraud Classification

Cited by: 1
Authors
Johnson, Justin M. [1]
Khoshgoftaar, Taghi M. [1]
Affiliations
[1] Florida Atlantic Univ, Coll Engn & Comp Sci, Boca Raton, FL 33431 USA
Keywords
Healthcare; Medicare; Medical Providers; Fraud Detection; Big Data; Machine Learning; Feature Engineering
DOI
10.1109/IRI54793.2022.00060
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Fraud, waste, and abuse are spreading throughout the healthcare industry and costing patients and taxpayers billions of dollars. Fortunately, electronic medical records and publicly available data sources like the Centers for Medicare & Medicaid Services (CMS) have enabled data mining and machine learning techniques that can help automate the detection of healthcare fraud. In this study, we explore the application of healthcare provider summary data for the purpose of fraud detection. We leverage the latest CMS Part B Summary by Provider big data sets to curate two new labeled data sets for supervised learning. The two new data sets are compared to a popular baseline data set from related works using six runs of cross validation with two popular ensemble learners, multiple complementary performance metrics, and statistical tests. Classification results show that the proposed provider summary features are good indicators of healthcare fraud. A two-way analysis of variance test and 95% confidence intervals show that the new features yield significantly better performance on the fraud detection task when used to enrich existing data sets. Finally, feature contributions are measured with Shapley values to illustrate the top 20 features that contribute to fraud estimation.
Pages: 236-242
Number of pages: 7