PMLB v1.0: an open-source dataset collection for benchmarking machine learning methods

Cited by: 14

Authors:

Romano, Joseph D. ^{[1,2]}

Le, Trang T. ^{[1]}

La Cava, William ^{[1]}

Gregg, John T. ^{[1]}

Goldberg, Daniel J. ^{[3]}

Chakraborty, Praneel ^{[4,5]}

Ray, Natasha L. ^{[6]}

Himmelstein, Daniel ^{[7,8]}

Fu, Weixuan ^{[1]}

Moore, Jason H. ^{[1]}

Affiliations:

[1] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA

[2] Univ Penn, Ctr Excellence Environm Toxicol, Philadelphia, PA 19104 USA

[3] Washington Univ, Dept Comp Sci & Engn, St Louis, MO 63130 USA

[4] Univ Penn, Sch Arts & Sci, Philadelphia, PA 19104 USA

[5] Univ Penn, Wharton Sch, Philadelphia, PA 19104 USA

[6] Princeton Day Sch, Princeton, NJ 08540 USA

[7] Related Sci, Denver, CO 80220 USA

[8] Univ Penn, Dept Syst Pharmacol & Translat Therapeut, Philadelphia, PA 19104 USA

Source:

BIOINFORMATICS | 2022 / Vol. 38 / Issue 03

Funding:

National Institutes of Health (USA);

DOI:

10.1093/bioinformatics/btab727

Chinese Library Classification (CLC):

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Motivation: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows. Results: This release of PMLB (Penn Machine Learning Benchmarks) provides the largest collection of diverse, public benchmark datasets for evaluating new machine learning and data science methods aggregated in one location. v1.0 introduces a number of critical improvements developed following discussions with the open-source community.

Pages: 878-880

Page count: 3
