Healthcare Provider Summary Data for Fraud Classification

Cited by: 1
Authors
Johnson, Justin M. [1]
Khoshgoftaar, Taghi M. [1]
Affiliation
[1] Florida Atlantic Univ, Coll Engn & Comp Sci, Boca Raton, FL 33431 USA
Keywords
Healthcare; Medicare; Medical Providers; Fraud Detection; Big Data; Machine Learning; Feature Engineering
DOI
10.1109/IRI54793.2022.00060
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Fraud, waste, and abuse are spreading throughout the healthcare industry and costing patients and taxpayers billions of dollars. Fortunately, electronic medical records and publicly available data sources like the Centers for Medicare & Medicaid Services (CMS) have enabled data mining and machine learning techniques that can help automate the detection of healthcare fraud. In this study, we explore the application of healthcare provider summary data for the purpose of fraud detection. We leverage the latest CMS Part B Summary by Provider big data sets to curate two new labeled data sets for supervised learning. The two new data sets are compared to a popular baseline data set from related works using six runs of cross validation with two popular ensemble learners, multiple complementary performance metrics, and statistical tests. Classification results show that the proposed provider summary features are good indicators of healthcare fraud. A two-way analysis of variance test and 95% confidence intervals show that the new features yield significantly better performance on the fraud detection task when used to enrich existing data sets. Finally, feature contributions are measured with Shapley values to illustrate the top 20 features that contribute to fraud estimation.
Pages: 236-242
Number of pages: 7