PMLB v1.0: an open-source dataset collection for benchmarking machine learning methods

Cited by: 14
Authors
Romano, Joseph D. [1, 2]
Le, Trang T. [1 ]
La Cava, William [1 ]
Gregg, John T. [1 ]
Goldberg, Daniel J. [3 ]
Chakraborty, Praneel [4, 5]
Ray, Natasha L. [6 ]
Himmelstein, Daniel [7, 8]
Fu, Weixuan [1 ]
Moore, Jason H. [1 ]
Affiliations
[1] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA
[2] Univ Penn, Ctr Excellence Environm Toxicol, Philadelphia, PA 19104 USA
[3] Washington Univ, Dept Comp Sci & Engn, St Louis, MO 63130 USA
[4] Univ Penn, Sch Arts & Sci, Philadelphia, PA 19104 USA
[5] Univ Penn, Wharton Sch, Philadelphia, PA 19104 USA
[6] Princeton Day Sch, Princeton, NJ 08540 USA
[7] Related Sci, Denver, CO 80220 USA
[8] Univ Penn, Dept Syst Pharmacol & Translat Therapeut, Philadelphia, PA 19104 USA
Funding
National Institutes of Health (USA)
DOI
10.1093/bioinformatics/btab727
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows. Results: This release of PMLB (Penn Machine Learning Benchmarks) provides the largest collection of diverse, public benchmark datasets for evaluating new machine learning and data science methods aggregated in one location. v1.0 introduces a number of critical improvements developed following discussions with the open-source community.
Pages: 878-880
Number of pages: 3
Related papers
50 items in total
  • [41] ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design
    Krishnan, Srivatsan
    Yazdanbaksh, Amir
    Prakash, Shvetank
    Jabbour, Jason
    Uchendu, Ikechukwu
    Ghosh, Susobhan
    Boroujerdian, Behzad
    Richins, Daniel
    Tripathy, Devashree
    Faust, Aleksandra
    Reddi, Vijay Janapa
    PROCEEDINGS OF THE 2023 THE 50TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, ISCA 2023, 2023, : 186 - 201
  • [42] Demystifying the Impact of Open-Source Machine Learning Libraries on Software Analytics
    Zhao, Yu
    Gong, Yihui
    Gong, Lina
    Jiang, Shujuan
    Huang, Zhiqiu
    IEEE TRANSACTIONS ON RELIABILITY, 2024,
  • [43] RDKit: Open-source cheminformatics from machine learning to chemical registration
    Landrum, Gregory
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2019, 258
  • [44] kMoL: an open-source machine and federated learning library for drug discovery
    Cozac, Romeo
    Hasic, Haris
    Choong, Jun Jin
    Richard, Vincent
    Beheshti, Loic
    Froehlich, Cyrille
    Koyama, Takuto
    Matsumoto, Shigeyuki
    Kojima, Ryosuke
    Iwata, Hiroaki
    Hasegawa, Aki
    Otsuka, Takao
    Okuno, Yasushi
    JOURNAL OF CHEMINFORMATICS, 2025, 17 (01):
  • [45] Comparative analysis of real issues in open-source machine learning projects
    Lai, Tuan Dung
    Simmons, Anj
    Barnett, Scott
    Schneider, Jean-Guy
    Vasa, Rajesh
    EMPIRICAL SOFTWARE ENGINEERING, 2024, 29 (03)
  • [46] B2RL: An open-source Dataset for Building Batch Reinforcement Learning
    Liu, Hsin-Yu
    Fu, Xiaohan
    Balaji, Bharathan
    Gupta, Rajesh
    Hong, Dezhi
    PROCEEDINGS OF THE 2022 THE 9TH ACM INTERNATIONAL CONFERENCE ON SYSTEMS FOR ENERGY-EFFICIENT BUILDINGS, CITIES, AND TRANSPORTATION, BUILDSYS 2022, 2022, : 462 - 465
  • [47] MSDM v1.0: A machine learning model for precipitation nowcasting over eastern China using multisource data
    Li, Dawei
    Liu, Yudi
    Chen, Chaohui
    GEOSCIENTIFIC MODEL DEVELOPMENT, 2021, 14 (06) : 4019 - 4034
  • [48] An Exploratory Study of Dataset and Model Management in Open Source Machine Learning Applications
    Toma, Tajkia Rahman
    Bezemer, Cor-Paul
    PROCEEDINGS 2024 IEEE/ACM 3RD INTERNATIONAL CONFERENCE ON AI ENGINEERING-SOFTWARE ENGINEERING FOR AI, CAIN 2024, 2024, : 64 - 74
  • [49] Open Source Dataset and Machine Learning Techniques for Automatic Recognition of Historical Graffiti
    Gordienko, Nikita
    Gang, Peng
    Gordienko, Yuri
    Zeng, Wei
    Alienin, Oleg
    Rokovyi, Oleksandr
    Stirenko, Sergii
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT V, 2018, 11305 : 414 - 424
  • [50] PyMLDA: A Python open-source code for Machine Learning Damage Assessment
    Coelho, Jefferson da Silva
    Machado, Marcela Rodrigues
    de Sousa, Amanda Aryda S. R.
    SOFTWARE IMPACTS, 2024, 19