SYNTHETIC DATA AND THE FUTURE OF AI

Authors
Lee, Peter [1, 2]
Affiliations
[1] Univ Calif Davis, Sch Law, Ctr Innovat Law & Soc, Law, Davis, CA 95616 USA
[2] Univ Calif Davis, Sch Law, Ctr Innovat Law & Soc, Davis, CA 95616 USA
Keywords
INTELLECTUAL PROPERTY; TRADE SECRETS; INNOVATION; COPYRIGHT; INDUSTRY; HEALTH; RIGHTS; BIAS; FIRM; LAW;
DOI
Not available
Chinese Library Classification number
D9 [Law]; DF [Law];
Discipline classification code
0301;
Abstract
The future of artificial intelligence (AI) is synthetic. Several of the most prominent technical and legal challenges of AI derive from the need to amass huge amounts of real-world data to train machine learning (ML) models. Collecting such real-world data can be highly difficult and can threaten privacy, introduce bias in automated decision making, and infringe copyrights on a massive scale. This Article explores the emergence of a seemingly paradoxical technical creation that can mitigate, though not completely eliminate, these concerns: synthetic data. Increasingly, data scientists are using simulated driving environments, fabricated medical records, fake images, and other forms of synthetic data to train ML models. Artificial data, in other words, is training artificial intelligence. Synthetic data offers a host of technical and legal benefits; it promises to radically decrease the cost of obtaining data, sidestep privacy issues, reduce automated discrimination, and avoid copyright infringement. Alongside such promises, however, synthetic data offers perils as well. Deficiencies in the development and deployment of synthetic data can exacerbate the dangers of AI and cause significant social harm. In light of the enormous value and importance of synthetic data, this Article sketches the contours of an innovation ecosystem to promote its robust and responsible development. It identifies three objectives that should guide legal and policy measures shaping the creation of synthetic data: provisioning, disclosure, and democratization. Ideally, such an ecosystem should incentivize the generation of high-quality synthetic data, encourage disclosure of both synthetic data and processes for generating it, and promote multiple sources of innovation. This Article then examines a suite of "innovation mechanisms" that can advance these objectives, ranging from open source production to proprietary approaches based on patents, trade secrets, and copyrights. Throughout, it suggests policy and doctrinal reforms to enhance innovation, transparency, and democratic access to synthetic data. Just as AI will have enormous implications for law, legal regimes can play a central role in shaping the future of AI.
Pages: 1 / 74
Page count: 74