Navigating Data-Centric Artificial Intelligence with DC-Check: Advances, Challenges, and Opportunities

Cited by: 3
Authors
Seedat N. [2]
Imrie F. [1]
Van Der Schaar M. [2, 3]
Affiliations
[1] University of California, Los Angeles
[2] University of Cambridge, Cambridge
[3] Alan Turing Institute, London
Keywords
Data-centric artificial intelligence (AI); machine learning (ML) pipelines; reliable-ML
DOI
10.1109/TAI.2023.3345805
Abstract
Data-centric artificial intelligence (AI) is an emerging paradigm that emphasizes the critical role of data in real-world machine learning (ML) systems, as a complement to model development. However, data-centric AI is still in its infancy, lacking a standardized framework that outlines necessary data-centric considerations at various stages of the ML pipeline: Data, Training, Testing, and Deployment. This lack of guidance hampers effective communication and design of data-centric ML systems. To address this critical gap, we introduce the Data-Centric Checklist (DC-Check), an actionable checklist-style framework that encapsulates data-centric considerations for ML systems. DC-Check is aimed at both practitioners and researchers to serve as a reference guide to data-centric AI development. Around each question in DC-Check, we discuss the applicability of different approaches, survey the state of the art, and highlight specific data-centric AI challenges and research opportunities. While developing DC-Check, we also undertook an analysis of the current data-centric AI landscape. The insights obtained from this exploration support the DC-Check framework, reinforcing its utility and relevance in the rapidly evolving field. To make DC-Check and related resources easily accessible, we provide a DC-Check companion website (https://www.vanderschaar-lab.com/dc-check/), which will serve as a living resource, updated as methods and tools evolve. © 2020 IEEE.
Pages: 2589 - 2603
Number of pages: 14
Related papers
50 in total
  • [41] Artificial Intelligence for Remote Sensing Data Analysis: A Review of Challenges and Opportunities
    Zhang, Lefei
    Zhang, Liangpei
    IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2022, 10 (02) : 270 - 294
  • [42] The promise of artificial intelligence: a review of the opportunities and challenges of artificial intelligence in healthcare
    Aung, Yuri Y. M.
    Wong, David C. S.
    Ting, Daniel S. W.
    BRITISH MEDICAL BULLETIN, 2021, 139 (01) : 4 - 15
  • [43] Generative artificial intelligence in drug discovery: basic framework, recent advances, challenges, and opportunities
    Gangwal, Amit
    Ansari, Azim
    Ahmad, Iqrar
    Azad, Abul Kalam
    Kumarasamy, Vinoth
    Subramaniyan, Vetriselvan
    Wong, Ling Shing
    FRONTIERS IN PHARMACOLOGY, 2024, 15
  • [44] Data-centric automated approach to predict autism spectrum disorder based on selective features and explainable artificial intelligence
    Aldrees, Asma
    Ojo, Stephen
    Wanliss, James
    Umer, Muhammad
    Khan, Muhammad Attique
    Alabdullah, Bayan
    Alsubai, Shtwai
    Innab, Nisreen
    FRONTIERS IN COMPUTATIONAL NEUROSCIENCE, 2024, 18
  • [45] From Big Data to Big Artificial Intelligence? Algorithmic Challenges and Opportunities of Big Data
    Kristian Kersting
    Ulrich Meyer
    KI - Künstliche Intelligenz, 2018, 32 (1) : 3 - 8
  • [46] From Big Data to Big Artificial Intelligence? Algorithmic Challenges and Opportunities of Big Data
    Kersting, Kristian
    Meyer, Ulrich
    KUNSTLICHE INTELLIGENZ, 2018, 32 (01) : 3 - 8
  • [47] Data collection and quality challenges in deep learning: a data-centric AI perspective
    Steven Euijong Whang
    Yuji Roh
    Hwanjun Song
    Jae-Gil Lee
    The VLDB Journal, 2023, 32 : 791 - 813
  • [48] Data-centric artificial intelligence in oncology: a systematic review assessing data quality in machine learning models for head and neck cancer
    Adeoye, John
    Hui, Liuling
    Su, Yu-Xiong
    JOURNAL OF BIG DATA, 2023, 10 (01)
  • [49] Towards Data-centric Decision Making for Smart Infrastructure: Data and Its Challenges
    Broo, Didem Gurdur
    Schooling, Jennifer
    IFAC PAPERSONLINE, 2020, 53 (03) : 90 - 94
  • [50] Data collection and quality challenges in deep learning: a data-centric AI perspective
    Whang, Steven Euijong
    Roh, Yuji
    Song, Hwanjun
    Lee, Jae-Gil
    VLDB JOURNAL, 2023, 32 (04) : 791 - 813