PragFormer: Data-Driven Parallel Source Code Classification with Transformers

Cited by: 0
Authors
Harel, Re'em [1, 2]
Kadosh, Tal [1, 3]
Hasabnis, Niranjan [4]
Mattson, Timothy [4]
Pinter, Yuval [1]
Oren, Gal [2, 5]
Affiliations
[1] Ben-Gurion Univ, Beer Sheva, Israel
[2] NRCN, Beer Sheva, Israel
[3] IAEC, Tel Aviv, Israel
[4] Intel Labs, Hillsboro, OR USA
[5] Technion, Haifa, Israel
Keywords
Parallel programming; Artificial intelligence; Software development; Programming assistance
DOI
10.1007/s10766-024-00778-9
Chinese Library Classification number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Multi-core shared-memory architectures have become ubiquitous in computing hardware. As a result, there is a growing need to fully utilize these architectures by introducing appropriate parallelization schemes, such as OpenMP worksharing-loop constructs, into applications. However, most developers find it hard to introduce OpenMP directives into their code because of the pervasive pitfalls of managing parallel shared memory. To assist developers in this process, many compilers and source-to-source (S2S) translation tools have been developed over the years to insert OpenMP directives into code automatically. Beyond having limited robustness to their input format, these tools still do not achieve satisfactory coverage or precision in locating parallelizable code and generating appropriate directives. Recently, many data-driven, AI-based code completion (CC) tools, such as GitHub Copilot, have been developed to ease programming and improve productivity. Leveraging insights from these existing AI-based programming-assistance tools, this work presents a novel AI model that can serve as a parallel-programming assistant. Specifically, our model, named PragFormer, is tasked with identifying for loops that can benefit from conversion to a parallel worksharing-loop construct (an OpenMP directive) and even with predicting, on the fly, the need for specific data-sharing attribute clauses. We created a unique database, named Open-OMP, specifically for this goal. Open-OMP contains over 32,000 unique code snippets from different domains, half of which contain OpenMP directives while the other half do not. We experimented with different model design parameters for these tasks and showed that our best-performing model outperforms a statistically trained baseline as well as a state-of-the-art S2S compiler. In fact, it even outperforms the popular generative AI model ChatGPT.
In the spirit of advancing research on this topic, we have released the source code for PragFormer and the Open-OMP dataset to the public. Moreover, an interactive demo of our tool and a Hugging Face webpage for experimenting with it are already available.
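To make the classification task concrete, here is an illustrative C sketch written for this summary; it is not taken from the paper or from Open-OMP, and the function names `dot`, `scale_shift`, and `prefix_sum` are hypothetical. The first two loops are the kind a model like PragFormer would be expected to flag as benefiting from a worksharing-loop directive, one needing a `reduction` clause and one a `private` clause (both OpenMP data-sharing attribute clauses). The third carries a dependence between iterations and should not receive the directive.

```c
#include <stddef.h>

/* Hypothetical example: iterations are independent except for the
 * accumulation into `sum`. The reduction clause gives each thread a
 * private copy of `sum` and combines the copies when the loop ends. */
double dot(const double *a, const double *b, size_t n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Hypothetical example: `tmp` is declared outside the loop, so it is
 * shared by default; private(tmp) gives each thread its own copy and
 * avoids a data race on it. */
void scale_shift(double *out, const double *in, size_t n, double s, double t) {
    double tmp;
    #pragma omp parallel for private(tmp)
    for (size_t i = 0; i < n; i++) {
        tmp = in[i] * s;
        out[i] = tmp + t;
    }
}

/* Hypothetical counter-example: each iteration reads the result of the
 * previous one (a loop-carried dependence), so adding
 * "#pragma omp parallel for" here would produce wrong results. */
void prefix_sum(double *x, size_t n) {
    for (size_t i = 1; i < n; i++) {
        x[i] += x[i - 1];
    }
}
```

Compiled with OpenMP enabled (for example, `gcc -fopenmp`), the iterations of the first two loops are divided among threads; without it, the pragmas are ignored and the loops run serially.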
Pages: 26