Data-centric AI: Techniques and Future Perspectives

Cited by: 9
Authors
Zha, Daochen [1]
Lai, Kwei-Herng [2]
Yang, Fan [3]
Zou, Na [4]
Gao, Huiji [1]
Hu, Xia [2]
Affiliations
[1] Airbnb Inc, San Francisco, CA 94103 USA
[2] Rice Univ, Houston, TX USA
[3] Wake Forest Univ, Winston Salem, NC USA
[4] Texas A&M Univ, College Stn, TX USA
DOI
10.1145/3580305.3599553
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The role of data in AI has been significantly magnified by the emerging concept of data-centric AI. In contrast to the traditional model-centric paradigm, which focuses on developing more effective models given fixed datasets, data-centric AI emphasizes the systematic engineering of data in building AI systems. However, as a new concept, many critical aspects of data-centric AI remain ambiguous, such as its definitions, associated tasks, algorithms, challenges, and benchmarks. This tutorial aims to review and discuss this emerging field, with a particular focus on the three general data-centric AI goals: training data development, inference data development, and data maintenance. The objective of this tutorial is threefold: (1) to formally categorize the field of data-centric AI using a goal-driven taxonomy and discuss the needs and challenges of each goal, (2) to comprehensively review the state-of-the-art techniques, and (3) to discuss the future perspectives and open research directions to inspire further innovations in this field.
Pages: 5839 - 5840
Number of pages: 2
Related papers (first 10 of 50)
  • [1] Data-centric AI: Perspectives and Challenges
    Zha, Daochen
    Bhat, Zaid Pervaiz
    Lai, Kwei-Herng
    Yang, Fan
    Hu, Xia
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 945 - 948
  • [2] Data-Centric AI
    Malerba, Donato
    Pasquadibisceglie, Vincenzo
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, 62 (06) : 1493 - 1502
  • [3] The Principles of Data-Centric AI
    Jarrahi, Mohammad Hossein
    Memariani, Ali
    Guha, Shion
    COMMUNICATIONS OF THE ACM, 2023, 66 (08) : 84 - 92
  • [4] Opportunities and Challenges in Data-Centric AI
    Kumar, Sushant
    Datta, Sumit
    Singh, Vishakha
    Singh, Sanjay Kumar
    Sharma, Ritesh
    IEEE ACCESS, 2024, 12 : 33173 - 33189
  • [5] dcbench: A Benchmark for Data-Centric AI Systems
    Eyuboglu, Sabri
    Karlas, Bojan
    Re, Christopher
    Zhang, Ce
    Zou, James
    PROCEEDINGS OF THE 6TH WORKSHOP ON DATA MANAGEMENT FOR END-TO-END MACHINE LEARNING, DEEM 2022, 2022
  • [6] Potential Impact of Data-Centric AI on Society
    Kumar, Sushant
    Sharma, Ritesh
    Singh, Vishakha
    Tiwari, Shrikant
    Singh, Sanjay Kumar
    Datta, Sumit
    IEEE TECHNOLOGY AND SOCIETY MAGAZINE, 2023, 42 (03) : 98 - 107
  • [7] Data-Centric AI for Healthcare Fraud Detection
    Johnson J.M.
    Khoshgoftaar T.M.
    SN Computer Science, 4 (4)
  • [8] DataPerf: Benchmarks for Data-Centric AI Development
    Mazumder, Mark
    Banbury, Colby
    Yao, Xiaozhe
    Karlas, Bojan
    Rojas, William Gaviria
    Diamos, Sudnya
    Diamos, Greg
    He, Lynn
    Parrish, Alicia
    Kirk, Hannah Rose
    Quaye, Jessica
    Rastogi, Charvi
    Kiela, Douwe
    Jurado, David
    Kanter, David
    Mosquera, Rafael
    Ciro, Juan
    Aroyo, Lora
    Acun, Bilge
    Chen, Lingjiao
    Raje, Mehul Smriti
    Bartolo, Max
    Eyuboglu, Sabri
    Ghorbani, Amirata
    Goodman, Emmett
    Inel, Oana
    Kane, Tariq
    Kirkpatrick, Christine R.
    Kuo, Tzu-Sheng
    Mueller, Jonas
    Thrush, Tristan
    Vanschoren, Joaquin
    Warren, Margaret
    Williams, Adina
    Yeung, Serena
    Ardalani, Newsha
    Paritosh, Praveen
    Zhang, Ce
    Zou, James
    Wu, Carole-Jean
    Coleman, Cody
    Ng, Andrew
    Mattson, Peter
    Reddi, Vijay Janapa
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [9] A data-centric approach for ethical and trustworthy AI in journalism
    Dierickx, Laurence
    Opdahl, Andreas Lothe
    Khan, Sohail Ahmed
    Linden, Carl-Gustav
    Guerrero Rojas, Diana Carolina
    ETHICS AND INFORMATION TECHNOLOGY, 2024, 26 (04)
  • [10] Data-centric AI approach for automated wildflower monitoring
    Schouten, Gerard
    Michielsen, Bas S. H. T.
    Gravendeel, Barbara
    PLOS ONE, 2024, 19 (09)