PragFormer: Data-Driven Parallel Source Code Classification with Transformers

Authors
Harel, Re'em [1, 2]
Kadosh, Tal [1, 3]
Hasabnis, Niranjan [4]
Mattson, Timothy [4]
Pinter, Yuval [1]
Oren, Gal [2, 5]
Affiliations
[1] Ben-Gurion Univ, Beer Sheva, Israel
[2] NRCN, Beer Sheva, Israel
[3] IAEC, Tel Aviv, Israel
[4] Intel Labs, Hillsboro, OR USA
[5] Technion, Haifa, Israel
Keywords
Parallel programming; Artificial intelligence; Software development; Programming assistance
DOI
10.1007/s10766-024-00778-9
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Multi-core shared memory architectures have become ubiquitous in computing hardware. As a result, there is a growing need to fully utilize these architectures by introducing appropriate parallelization schemes, such as OpenMP worksharing-loop constructs, to applications. However, most developers find introducing OpenMP directives to their code hard due to pervasive pitfalls in managing parallel shared memory. To assist developers in this process, many compilers, as well as source-to-source (S2S) translation tools, have been developed over the years, tasked with inserting OpenMP directives into code automatically. In addition to having limited robustness to their input format, these compilers still do not achieve satisfactory coverage and precision in locating parallelizable code and generating appropriate directives. Recently, many data-driven AI-based code completion (CC) tools, such as GitHub Copilot, have been developed to ease and improve programming productivity. Leveraging the insights from existing AI-based programming-assistance tools, this work presents a novel AI model that can serve as a parallel-programming assistant. Specifically, our model, named PragFormer, is tasked with identifying for loops that can benefit from conversion to a parallel worksharing-loop construct (OpenMP directive) and even predicting the need for specific data-sharing attribute clauses on the fly. We created a unique database, named Open-OMP, specifically for this goal. Open-OMP contains over 32,000 unique code snippets from different domains, half of which contain OpenMP directives, while the other half do not. We experimented with different model design parameters for these tasks and showed that our best-performing model outperforms a statistically-trained baseline as well as a state-of-the-art S2S compiler. In fact, it even outperforms the popular generative AI model ChatGPT.
In the spirit of advancing research on this topic, we have already released the source code for PragFormer as well as the Open-OMP dataset to the public. Moreover, an interactive demo of our tool, as well as a Hugging Face webpage to experiment with our tool, are already available.
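To make the classification task in the abstract concrete, the following C sketch shows the two kinds of loops such a model would have to tell apart. It is illustrative only: the function names `dot_product` and `prefix_sum` are hypothetical and are not taken from Open-OMP, and the choice of `private` and `reduction` as the data-sharing attribute clauses is an assumption, since the abstract does not name specific clauses.

```c
/* Hypothetical example, not drawn from the Open-OMP dataset. */

/* A loop that can benefit from a worksharing-loop construct.
   Iterations are independent apart from the accumulation into `sum`,
   which the reduction clause handles; `tmp` is a per-iteration scratch
   value, so each thread gets its own copy via the private clause. */
double dot_product(const double *a, const double *b, int n)
{
    double sum = 0.0;
    double tmp;
    #pragma omp parallel for private(tmp) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        tmp = a[i] * b[i];
        sum += tmp;
    }
    return sum;
}

/* A loop that should NOT receive the same directive as written.
   Each iteration reads the value produced by the previous one
   (a loop-carried dependency), so it is left sequential here. */
void prefix_sum(double *a, int n)
{
    for (int i = 1; i < n; i++) {
        a[i] += a[i - 1];
    }
}
```

Compiled without OpenMP support (for example, without `-fopenmp` in GCC), the pragma is ignored and both functions run sequentially with the same results, which is one reason a per-loop "directive or no directive" decision is a natural classification target.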
Pages: 26