A Generalize Hardware Debugging Approach for Large Language Models Semi-Synthetic, Datasets

Authors
Fu, Weimin [1]
Li, Shijie [2]
Zhao, Yifang [2]
Yang, Kaichen [3]
Zhang, Xuan [4]
Jin, Yier [2]
Guo, Xiaolong [1]
Affiliations
[1] Kansas State Univ, Mike Wiegers Dept Elect & Comp Engn, Manhattan, KS 66506 USA
[2] Univ Sci & Technol China, Sch Cyber Sci & Technol, Hefei 230026, Anhui, Peoples R China
[3] Michigan Technol Univ, Dept Elect & Comp Engn, Houghton, MI 49931 USA
[4] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Funding
U.S. National Science Foundation;
Keywords
Hardware; Codes; Training; Software; Large language models; Chatbots; Debugging; Synthetic data; Open source hardware; Computer bugs; Large language model; artificial intelligence; hardware debug; version control; electronic design automation; ENERGY;
DOI
10.1109/TCSI.2024.3487486
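Chinese Library Classification
TM [Electrical technology]; TN [Electronic technology, communication technology];
Discipline classification codes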
0808 ; 0809 ;
Abstract
Large Language Models (LLMs) have precipitated emerging trends towards intelligent automation. However, integrating LLMs into the hardware debug domain encounters challenges: the datasets for LLMs for hardware are often plagued by a dual dilemma of scarcity and subpar quality. Traditional hardware debug approaches that rely on experienced labor to generate detailed prompts are not cheaply scalable. Similarly, strategies that depend on existing LLMs and randomly generated prompts fail to achieve sufficient reliability. We propose a directed, semi-synthetic data synthesis method that leverages version control information and journalistic event descriptions. To produce high-quality data, this approach combines version control data from hardware projects with the 5W1H (Who, What, When, Where, Why, How) journalistic principles. It facilitates the linear scaling of dataset volumes without depending on skilled labor. We have implemented this method on a collected dataset of open-source hardware designs and fine-tuned fifteen general-purpose LLMs to enable hardware debugging capabilities, thereby validating the efficacy of our approach.
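The abstract does not give the concrete pipeline, so the following is only a minimal sketch of how one version control commit could be mapped onto the 5W1H fields and turned into a prompt/completion pair. The names `CommitRecord`, `to_5w1h`, and `to_training_pair`, the choice of which commit field fills each slot, and the sample commit are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class CommitRecord:
    """Hypothetical container for one commit touching a hardware source file."""
    author: str
    date: str
    message: str
    path: str
    diff: str


def to_5w1h(commit: CommitRecord) -> dict:
    """Map commit metadata onto Who/What/When/Where/Why/How (assumed mapping)."""
    lines = commit.message.strip().splitlines()
    subject = lines[0] if lines else ""
    # Treat the commit body, if present, as the rationale; fall back to the subject.
    body = " ".join(line.strip() for line in lines[1:] if line.strip())
    return {
        "who": commit.author,
        "what": subject,
        "when": commit.date,
        "where": commit.path,
        "why": body or subject,
        "how": commit.diff,
    }


def to_training_pair(commit: CommitRecord) -> dict:
    """Build a prompt from the descriptive fields; the diff is the target fix."""
    fields = to_5w1h(commit)
    prompt = (
        f"File: {fields['where']}\n"
        f"Issue: {fields['what']}\n"
        f"Reason: {fields['why']}\n"
        "Propose a patch."
    )
    return {"prompt": prompt, "completion": fields["how"]}


# Made-up example commit, for illustration only.
example = CommitRecord(
    author="dev@example.org",
    date="2024-01-15",
    message="Fix reset polarity in uart_rx\n\nThe counter was cleared on the wrong edge.",
    path="rtl/uart_rx.v",
    diff="-  if (rst)\n+  if (!rst_n)",
)
pair = to_training_pair(example)
```

Because each commit yields one record, a scheme like this grows with the number of commits collected, which is one way to read the abstract's claim of linear scaling without skilled labor.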
Pages: 623-636
Page count: 14