MSR4ML: Reconstructing Artifact Traceability in Machine Learning Repositories

Cited by: 6
Authors
Njomou, Aquilas Tchanjou [1]
Africa, Alexandra Johanne Bifona [1]
Adams, Bram [1]
Fokaefs, Marios [1]
Affiliation
[1] Polytech Montreal, Dept Comp & Software Engn, Montreal, PQ, Canada
Keywords
Model Traceability; Machine Learning Operations; Mining Software Repositories; Model Mining; Metadata Extraction; Developer Productivity;
DOI
10.1109/SANER50967.2021.00061
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
The increasing popularity of Machine Learning (ML) is also creating challenges for developers. The multitude of programming languages, libraries and available resources allows them to easily build their own models or algorithms. However, ML models are tightly coupled to their data, which implies a different development process from other types of software. Software projects often rely on version control platforms such as GitHub, but these platforms have not yet been extended to support ML projects: there is poor support for data versioning and no link between ML and software artifacts. Thus, traceability and model evolution can become challenging for developers. While some ML-specific platforms exist, they still require considerable manual specification of ML artifacts and the links between them. In this work, we propose a framework for automatically identifying and tracing links between data, code and ML models through Mining Software Repositories (MSR) techniques. Our tool combines static code analysis with commit mining to identify ML, code and data artifacts, reconstruct the links between them and retrieve the commits that affect each end of a link. The objective is to increase productivity and developers' awareness of their project through the recovered traceability.
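To make the static-analysis half of this idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that data and model artifacts can be recognised by file extension in string literals passed to function calls; the extension sets and the names `classify` and `find_artifact_links` are hypothetical and chosen for illustration only.

```python
import ast
from pathlib import PurePosixPath

# Hypothetical extension lists (assumption); a real tool would need richer heuristics.
DATA_EXTS = {".csv", ".json", ".parquet", ".tsv"}
MODEL_EXTS = {".pkl", ".h5", ".pt", ".joblib", ".onnx"}


def classify(path):
    """Guess whether a path literal names a data file or a serialized model."""
    ext = PurePosixPath(path).suffix.lower()
    if ext in DATA_EXTS:
        return "data"
    if ext in MODEL_EXTS:
        return "model"
    return None


def find_artifact_links(source, script_name):
    """Return sorted (script, kind, artifact_path) triples.

    Only string literals passed directly as call arguments are considered;
    paths built at runtime are not resolved by this sketch.
    """
    links = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        args = list(node.args) + [kw.value for kw in node.keywords]
        for arg in args:
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                kind = classify(arg.value)
                if kind:
                    links.add((script_name, kind, arg.value))
    return sorted(links)


# The source is parsed, never executed, so undefined names are harmless here.
example = '''
import pandas as pd, joblib
df = pd.read_csv("data/train.csv")
joblib.dump(model, "models/clf.pkl")
'''
links = find_artifact_links(example, "train.py")
for link in links:
    print(link)
```

Each recovered triple links a script to a data or model file. The commit-mining half could then query version control history for both ends of a link, for example with `git log -- <path>`, to list the commits that touched the script and the artifact.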
Pages: 536-540
Page count: 5