AI as a Sport: On the Competitive Epistemologies of Benchmarking

Authors
Orr, Will [1]
Kang, Edward B. [2]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
[2] NYU, New York, NY USA
Keywords
Machine learning benchmarks; Machine learning competitions; History of benchmarking; Benchmarking for generative AI; Benchmark datasets
DOI
10.1145/3630106.3659012
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Artificial Intelligence (AI) systems are evaluated using competitive methods that rely on benchmark datasets to determine performance. These benchmark datasets, however, are often constructed through arbitrary processes that fall short in encapsulating the depth and breadth of the tasks they are intended to measure. In this paper, we interrogate the naturalization of benchmark datasets as veracious metrics by examining the historical development of benchmarking as an epistemic practice in AI research. Specifically, we highlight three key case studies that were crucial in establishing the existing reliance on benchmark datasets for evaluating the capabilities of AI systems: (1) the sharing of Highleyman's OCR dataset in the 1960s, which solidified a community of knowledge production around a shared benchmark dataset; (2) the Common Task Framework (CTF) of the 1980s, a state-led project to standardize benchmark datasets as legitimate indicators of technical progress; and (3) the Netflix Prize, which further solidified benchmarking as a competitive goal within the ML research community. This genealogy highlights how contemporary dynamics and limitations of benchmarking developed from a longer history of collaboration, standardization, and competition. We end with reflections on how this history informs our understanding of benchmarking in the current era of generative artificial intelligence.
Pages: 1875-1884
Page count: 10