Navigating Data-Centric Artificial Intelligence with DC-Check: Advances, Challenges, and Opportunities

Cited by: 3
Authors
Seedat N. [2]
Imrie F. [1]
Van Der Schaar M. [2, 3]
Affiliations
[1] University of California, Los Angeles
[2] University of Cambridge, Cambridge
[3] Alan Turing Institute, London
Source
Keywords
Data-centric artificial intelligence (AI); machine learning (ML) pipelines; reliable-ML
DOI
10.1109/TAI.2023.3345805
CLC number
Subject classification code
Abstract
Data-centric artificial intelligence (AI) is an emerging paradigm that emphasizes the critical role of data in real-world machine learning (ML) systems, as a complement to model development. However, data-centric AI is still in its infancy and lacks a standardized framework that outlines the necessary data-centric considerations at each stage of the ML pipeline: Data, Training, Testing, and Deployment. This lack of guidance hampers effective communication about, and design of, data-centric ML systems. To address this gap, we introduce the Data-Centric Checklist (DC-Check), an actionable checklist-style framework that encapsulates data-centric considerations for ML systems. DC-Check is aimed at both practitioners and researchers and serves as a reference guide to data-centric AI development. Around each question in DC-Check, we discuss the applicability of different approaches, survey the state of the art, and highlight specific data-centric AI challenges and research opportunities. While developing DC-Check, we also analyzed the current data-centric AI landscape. The insights from this analysis support the DC-Check framework, reinforcing its utility and relevance in a rapidly evolving field. To make DC-Check and related resources easily accessible, we provide a DC-Check companion website (https://www.vanderschaar-lab.com/dc-check/), which will serve as a living resource, updated as methods and tools evolve. © 2020 IEEE.
Pages: 2589 - 2603
Page count: 14