AI as a Sport: On the Competitive Epistemologies of Benchmarking

Cited by: 2
Authors
Orr, Will [1 ]
Kang, Edward B. [2 ]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
[2] NYU, New York, NY USA
Keywords
Machine learning benchmarks; Machine learning competitions; History of benchmarking; Benchmarking for generative AI; Benchmark datasets
DOI
10.1145/3630106.3659012
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Artificial Intelligence (AI) systems are evaluated using competitive methods that rely on benchmark datasets to determine performance. These benchmark datasets, however, are often constructed through arbitrary processes that fall short of encapsulating the depth and breadth of the tasks they are intended to measure. In this paper, we interrogate the naturalization of benchmark datasets as veracious metrics by examining the historical development of benchmarking as an epistemic practice in AI research. Specifically, we highlight three key case studies that were crucial in establishing the existing reliance on benchmark datasets for evaluating the capabilities of AI systems: (1) the sharing of Highleyman's OCR dataset in the 1960s, which solidified a community of knowledge production around a shared benchmark dataset; (2) the Common Task Framework (CTF) of the 1980s, a state-led project to standardize benchmark datasets as legitimate indicators of technical progress; and (3) the Netflix Prize, which further entrenched benchmarking as a competitive goal within the ML research community. This genealogy highlights how the contemporary dynamics and limitations of benchmarking developed from a longer history of collaboration, standardization, and competition. We end with reflections on how this history informs our understanding of benchmarking in the current era of generative artificial intelligence.
Pages: 1875-1884
Page count: 10