AI as a Sport: On the Competitive Epistemologies of Benchmarking

Cited by: 2
Authors
Orr, Will [1]
Kang, Edward B. [2]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
[2] NYU, New York, NY USA
Keywords
Machine learning benchmarks; Machine learning competitions; History of benchmarking; Benchmarking for generative AI; Benchmark datasets
DOI
10.1145/3630106.3659012
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Artificial Intelligence (AI) systems are evaluated using competitive methods that rely on benchmark datasets to determine performance. These benchmark datasets, however, are often constructed through arbitrary processes that fall short in encapsulating the depth and breadth of the tasks they are intended to measure. In this paper, we interrogate the naturalization of benchmark datasets as veracious metrics by examining the historical development of benchmarking as an epistemic practice in AI research. Specifically, we highlight three key case studies that were crucial in establishing the existing reliance on benchmark datasets for evaluating the capabilities of AI systems: (1) the sharing of Highleyman's OCR dataset in the 1960s, which solidified a community of knowledge production around a shared benchmark dataset; (2) the Common Task Framework (CTF) of the 1980s, a state-led project to standardize benchmark datasets as legitimate indicators of technical progress; and (3) the Netflix Prize, which further solidified benchmarking as a competitive goal within the ML research community. This genealogy highlights how contemporary dynamics and limitations of benchmarking developed from a longer history of collaboration, standardization, and competition. We end with reflections on how this history informs our understanding of benchmarking in the current era of generative artificial intelligence.
Pages: 1875 - 1884
Page count: 10
Related papers
50 in total
  • [21] Sport in the context of competitive economy
    Pehoiu, Constantin
    Puscoci, Sica
    RECENT ADVANCES IN BUSINESS ADMINISTRATION, 2010, : 169 - 174
  • [22] Diffusion of smoking in competitive sport
    Di Cave, P.
    Appodia, M.
    Todaro, A.
    MEDICINA DELLO SPORT, 2014, 67 (04) : 593 - 601
  • [23] COMPETITIVE SPORT, WINNING AND EDUCATION
    ARNOLD, PJ
    JOURNAL OF MORAL EDUCATION, 1989, 18 (01) : 15 - 25
  • [24] A competitive index for international sport
    Mitchell, Heather
    Stewart, Mark F.
    APPLIED ECONOMICS, 2007, 39 (4-6) : 587 - 603
  • [25] Reasonable Accommodation in Competitive Sport
    Petersen, Jeffrey C.
    Ivan, Emese
    JOURNAL OF PHYSICAL EDUCATION RECREATION AND DANCE, 2007, 78 (05) : 9 - 10
  • [26] What is the most competitive sport?
    Ben-Naim, E.
    Vazquez, F.
    Redner, S.
    JOURNAL OF THE KOREAN PHYSICAL SOCIETY, 2007, 50 (01) : 124 - 126
  • [27] AI Sport Forecast Software
    Takahashi, Kiyomi Cerezo
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2008, 1 (01) : 60 - 65
  • [28] The profession of sport psychologist in competitive sport from the perspective of practitioners
    Ehrlenspiel, Felix
    Droste, Anna
    Beckmann, Juergen
    ZEITSCHRIFT FUR SPORTPSYCHOLOGIE, 2011, 18 (02) : 73 - 86
  • [29] IT-driven quality benchmarking for competitive advantage
    Srividya, A
    Metri, BA
    IETE TECHNICAL REVIEW, 2001, 18 (01) : 17 - 21
  • [30] BENCHMARKING - PERFORMANCE IMPROVEMENT TOWARD COMPETITIVE ADVANTAGE
    LEMA, NM
    PRICE, ADF
    JOURNAL OF MANAGEMENT IN ENGINEERING, 1995, 11 (01) : 28 - 37