PragFormer: Data-Driven Parallel Source Code Classification with Transformers

Authors
Harel, Re'em [1, 2]
Kadosh, Tal [1, 3]
Hasabnis, Niranjan [4]
Mattson, Timothy [4]
Pinter, Yuval [1]
Oren, Gal [2, 5]
Affiliations
[1] Ben-Gurion Univ, Beer Sheva, Israel
[2] NRCN, Beer Sheva, Israel
[3] IAEC, Tel Aviv, Israel
[4] Intel Labs, Hillsboro, OR, USA
[5] Technion, Haifa, Israel
Keywords
Parallel programming; Artificial intelligence; Software development; Programming assistance
DOI
10.1007/s10766-024-00778-9
CLC number (Chinese Library Classification)
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Multi-core shared-memory architectures have become ubiquitous in modern computing hardware. As a result, there is a growing need to fully utilize these architectures by introducing appropriate parallelization schemes, such as OpenMP worksharing-loop constructs, into applications. However, most developers find it hard to introduce OpenMP directives into their code because of the pervasive pitfalls of managing parallel shared memory. To assist developers in this process, many compilers, as well as source-to-source (S2S) translation tools, have been developed over the years to insert OpenMP directives into code automatically. In addition to having limited robustness to their input format, these compilers still do not achieve satisfactory coverage and precision in locating parallelizable code and generating appropriate directives. Recently, many data-driven, AI-based code completion (CC) tools, such as GitHub Copilot, have been developed to ease programming and improve productivity. Leveraging the insights from existing AI-based programming-assistance tools, this work presents a novel AI model that can serve as a parallel-programming assistant. Specifically, our model, named PragFormer, is tasked with identifying for loops that can benefit from conversion to a parallel worksharing-loop construct (an OpenMP directive), and it can even predict, on the fly, the need for specific data-sharing attribute clauses. We created a unique dataset, named Open-OMP, specifically for this goal. Open-OMP contains over 32,000 unique code snippets from different domains, half of which contain OpenMP directives and half of which do not. We experimented with different model design parameters for these tasks and showed that our best-performing model outperforms a statistically trained baseline as well as a state-of-the-art S2S compiler. It even outperforms the popular generative AI model ChatGPT.
In the spirit of advancing research on this topic, we have released the source code of PragFormer and the Open-OMP dataset to the public. In addition, an interactive demo of our tool and a Hugging Face webpage for experimenting with it are available.
Pages: 26