MSR4ML: Reconstructing Artifact Traceability in Machine Learning Repositories

Cited by: 6
Authors
Njomou, Aquilas Tchanjou [1]
Africa, Alexandra Johanne Bifona [1]
Adams, Bram [1]
Fokaefs, Marios [1]
Affiliation
[1] Polytech Montreal, Dept Comp & Software Engn, Montreal, PQ, Canada
Keywords
Model Traceability; Machine Learning Operations; Mining Software Repositories; Model Mining; Metadata Extraction; Developer Productivity;
DOI
10.1109/SANER50967.2021.00061
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification codes
081202; 0835;
Abstract
The increasing popularity of Machine Learning (ML) is also creating challenges for developers. The multitude of programming languages, libraries and available resources allows them to build their own models or algorithms easily. However, ML models are tightly coupled to their data, which implies a development process different from that of other types of software. Software projects often rely on version control platforms such as GitHub, but these platforms have not yet been extended to support ML projects: data versioning is poorly supported, and there is no link between ML and software artifacts. As a result, traceability and tracking model evolution can become challenging for developers. While some ML-specific platforms exist, they still require considerable manual specification of ML artifacts and of the links between them. In this work, we propose a framework that uses Mining Software Repositories (MSR) techniques to automatically identify data, code and ML model artifacts and to recover the traceability links between them. Our tool combines static code analysis with commit-data mining to identify ML, code and data artifacts, reconstruct the links between them, and retrieve the commits that affect each end of a link. The objective is to increase developers' productivity and their awareness of their project through the recovered traceability.
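To make the combination of static code analysis and commit mining more concrete, here is a minimal, hypothetical Python sketch of the idea. It is not the MSR4ML implementation. It assumes that ML scripts read data through calls named like `read_csv` and persist models through calls named like `dump` (e.g. `joblib.dump`), and that file paths appear as string literals. It uses the standard `ast` module to find those paths, links each data file to each model file through the script that connects them, and filters a toy commit history (a stand-in for real `git log` output) for the commits touching each end of a link. The names `DATA_LOADERS`, `MODEL_SAVERS`, `extract_artifacts`, `reconstruct_links` and `commits_touching` are illustrative assumptions.

```python
import ast
from dataclasses import dataclass, field

# Hypothetical call-name heuristics; a real tool would need a much richer catalogue.
DATA_LOADERS = {"read_csv", "read_parquet", "load_dataset", "loadtxt"}
MODEL_SAVERS = {"dump", "save", "save_model"}


@dataclass
class Artifacts:
    script: str
    data: set = field(default_factory=set)
    models: set = field(default_factory=set)


def _call_name(node: ast.Call) -> str:
    """Return the final function/attribute name of a call, e.g. 'read_csv'."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return ""


def _string_args(node: ast.Call) -> list:
    """Collect string-literal positional arguments (treated as file paths)."""
    return [a.value for a in node.args
            if isinstance(a, ast.Constant) and isinstance(a.value, str)]


def extract_artifacts(script: str, source: str) -> Artifacts:
    """Statically scan one script (without running it) for data read and models written."""
    found = Artifacts(script)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _call_name(node)
        if name in DATA_LOADERS:
            found.data.update(_string_args(node))
        elif name in MODEL_SAVERS:
            found.models.update(_string_args(node))
    return found


def reconstruct_links(arts: Artifacts) -> list:
    """Link every data file to every model file via the script that connects them."""
    return [(d, arts.script, m) for d in sorted(arts.data) for m in sorted(arts.models)]


def commits_touching(path: str, history: list) -> list:
    """Return hashes of commits, given as (hash, changed_files), that modified `path`."""
    return [sha for sha, files in history if path in files]


# Toy example: the script is only parsed, so pandas/sklearn/joblib need not be installed.
EXAMPLE = '''
import pandas as pd
import joblib
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("data/train.csv")
model = LogisticRegression().fit(df[["x"]], df["y"])
joblib.dump(model, "models/clf.pkl")
'''

arts = extract_artifacts("train.py", EXAMPLE)
links = reconstruct_links(arts)
# links == [("data/train.csv", "train.py", "models/clf.pkl")]

history = [("a1", {"train.py"}), ("b2", {"data/train.csv"}), ("c3", {"README.md"})]
# commits_touching("data/train.csv", history) == ["b2"]
```

The name-based matching keeps the sketch short but will miss paths built dynamically (e.g. via `os.path.join` or config files) and can misfire on unrelated `save`/`dump` calls; a fuller approach would also track variables and data flow.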
Pages: 536-540
Number of pages: 5
Related papers
30 in total
  • [1] ML4ML: Automated Invariance Testing for Machine Learning Models
    Liao, Zukang
    Zhang, Pengfei
    Chen, Min
    2022 FOURTH IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE TESTING (AITEST 2022), 2022, : 34 - 41
  • [2] Machine Learning for Health (ML4H) 2021
    Roy, Subhrajit
    Pfohl, Stephen
    Tadesse, Girmaw Abebe
    Oala, Luis
    Falck, Fabian
    Zhou, Yuyin
    Shen, Liyue
    Zamzmi, Ghada
    Mugambi, Purity
    Zirikly, Ayah
    McDermott, Matthew B.A.
    Alsentzer, Emily
    Proceedings of Machine Learning Research, 2021, 158 : 1 - 12
  • [3] Machine Learning for Health (ML4H) 2022
    Parziale, Antonio
    Agrawal, Monica
    Tang, Shengpu
    Severson, Kristen
    Oala, Luis
    Subbaswamy, Adarsh
    Kumar, Sayantan
    Schoerverth, Elora
    Hegselmann, Stefan
    Zhou, Helen
    Zamzmi, Ghada
    Mugambi, Purity
    Sizikova, Elena
    Tadesse, Girmaw Abebe
    Zhou, Yuyin
    Killian, Taylor
    Zhang, Haoran
    Kamran, Fahad
    Hobby, Andrea
    Huang, Mars
    Alaa, Ahmed
    Singh, Harvineet
    Chen, Irene Y.
    Joshi, Shalmali
    MACHINE LEARNING FOR HEALTH, VOL 193, 2022, 193 : 1 - 11
  • [4] Machine Learning for Health (ML4H) 2021
    Roy, Subhrajit
    Pfohl, Stephen
    Tadesse, Girmaw Abebe
    Oala, Luis
    Falck, Fabian
    Zhou, Yuyin
    Shen, Liyue
    Zamzmi, Ghada
    Mugambi, Purity
    Zirikly, Ayah
    McDermott, Matthew B. A.
    Alsentzer, Emily
    MACHINE LEARNING FOR HEALTH, VOL 158, 2021, 158 : 1 - 12
  • [5] Machine Learning for Health (ML4H) 2023
    Hegselmann, Stefan
    Parziale, Antonio
    Shanmugam, Divya
    Tang, Shengpu
    Severson, Kristen
    Asiedu, Mercy Nyamewaa
    Chang, Serina
    Dossou, Bonaventure F. P.
    Huang, Qian
    Kamran, Fahad
    Zhang, Haoran
    Nagaraj, Sujay
    Oala, Luis
    Xu, Shan
    Okolo, Chinasa T.
    Zhou, Helen
    Dafflon, Jessica
    Ellington, Caleb
    Jabbour, Sarah
    Jeong, Hyewon
    Nieva, Harry Reyes
    Yang, Yuzhe
    Zamzmi, Ghada
    Mhasawade, Vishwali
    Truong, Van
    Chandak, Payal
    Lee, Matthew
    Argaw, Peniel
    Heuton, Kyle
    Singh, Harvineet
    Hartvigsen, Thomas
    MACHINE LEARNING FOR HEALTH, ML4H, VOL 225, 2023, 225 : 1 - 12
  • [6] Machine Learning for Health (ML4H) 2019: What Makes Machine Learning in Medicine Different?
    Dalca, Adrian V.
    McDermott, Matthew
    Alsentzer, Emily
    Finlayson, Sam
    Oberst, Michael
    Falck, Fabian
    Chivers, Corey
    Beam, Andrew L.
    Naumann, Tristan
    Beaulieu-Jones, Brett
    MACHINE LEARNING FOR HEALTH WORKSHOP, VOL 116, 2019, 116 : 1 - 9
  • [7] QoA4ML-A Framework for Supporting Contracts in Machine Learning Services
    Truong, Hong-Linh
    Nguyen, Tri-Minh
    2021 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, ICWS 2021, 2021, : 465 - 475
  • [8] Demonstration Paper: Monitoring Machine Learning Contracts with QoA4ML
    Minh-Tri Nguyen
    Hong-Linh Truong
    COMPANION OF THE ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING, ICPE 2021, 2021, : 169 - 170
  • [9] VIS4ML: An Ontology for Visual Analytics Assisted Machine Learning
    Sacha, Dominik
    Kraus, Matthias
    Keim, Daniel A.
    Chen, Min
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2019, 25 (01) : 385 - 395
  • [10] MLP4ML: Machine Learning Service Recommendation System using MLP
    Alghofaily, Bayan
    Ding, Chen
    2020 IEEE 13TH INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (SCC 2020), 2020, : 84 - 91