A Generalized Hardware Debugging Approach for Large Language Models via Semi-Synthetic Datasets

Cited by: 0
Authors
Fu, Weimin [1 ]
Li, Shijie [2 ]
Zhao, Yifang [2 ]
Yang, Kaichen [3 ]
Zhang, Xuan [4 ]
Jin, Yier [2 ]
Guo, Xiaolong [1 ]
Affiliations
[1] Kansas State Univ, Mike Wiegers Dept Elect & Comp Engn, Manhattan, KS 66506 USA
[2] Univ Sci & Technol China, Sch Cyber Sci & Technol, Hefei 230026, Anhui, Peoples R China
[3] Michigan Technol Univ, Dept Elect & Comp Engn, Houghton, MI 49931 USA
[4] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Funding
U.S. National Science Foundation;
Keywords
Hardware; Codes; Training; Software; Large language models; Chatbots; Debugging; Synthetic data; Open source hardware; Computer bugs; Large language model; artificial intelligence; hardware debug; version control; electronic design automation; ENERGY;
DOI
10.1109/TCSI.2024.3487486
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
Large Language Models (LLMs) have precipitated emerging trends toward intelligent automation. However, integrating LLMs into the hardware debug domain faces a dual dilemma: datasets for hardware-oriented LLMs are both scarce and of subpar quality. Traditional hardware debug approaches that rely on experienced engineers to write detailed prompts do not scale cheaply. Similarly, strategies that depend on existing LLMs and randomly generated prompts fail to achieve sufficient reliability. We propose a directed, semi-synthetic data synthesis method that leverages version control information and journalistic event descriptions. To produce high-quality data, the approach combines version control data from hardware projects with the 5W1H (Who, What, When, Where, Why, How) journalistic principle, allowing dataset volume to scale linearly without depending on skilled labor. We implemented this method on a collected set of open-source hardware designs and fine-tuned fifteen general-purpose LLMs for hardware debugging tasks, validating the efficacy of our approach.
Pages: 623 - 636
Page count: 14
Related Papers
50 items in total
  • [31] Can large language models help augment English psycholinguistic datasets?
    Trott, Sean
    BEHAVIOR RESEARCH METHODS, 2024, 56 (06) : 6082 - 6100
  • [32] Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models
    Reif, Emily
    Kahng, Minsuk
    Petridis, Savvas
2023 IEEE VISUALIZATION AND VISUAL ANALYTICS, VIS, 2023 : 236 - 240
  • [33] Unleashing the Unseen: Harnessing Benign Datasets for Jailbreaking Large Language Models
    Zhao, Wei
    Li, Zhe
    Li, Yige
    Sun, Jun
arXiv
  • [34] MAGNIFICO: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations
    Patel, Arkil
    Bhattamishra, Satwik
    Reddy, Siva
    Bahdanau, Dzmitry
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023 : 2167 - 2189
  • [35] A facile semi-synthetic approach towards halogen-substituted aminobenzoic acid analogues of platensimycin
    Qiu, Lin
    Tian, Kai
    Pan, Jian
    Jiang, Lin
    Yang, Hu
    Zhu, Xiangcheng
    Shen, Ben
    Duan, Yanwen
    Huang, Yong
    TETRAHEDRON, 2017, 73 (06) : 771 - 775
  • [36] A Semi-Synthetic Approach to Olefinic Analogs of Amino Acid One (MeBmt) in Cyclosporine-A
    Park, S. B.
    Meier, G. P.
    TETRAHEDRON LETTERS, 1989, 30 (32) : 4215 - 4218
  • [37] Resilience Assessment of Large Language Models under Transient Hardware Faults
    Agarwal, Udit Kumar
    Chan, Abraham
    Pattabiraman, Karthik
2023 IEEE 34TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING, ISSRE, 2023 : 659 - 670
  • [38] A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
    Guo, Cong
    Cheng, Feng
    Du, Zhixu
    Kiessling, James
    Ku, Jonathan
    Li, Shiyu
    Li, Ziru
    Ma, Mingyuan
    Molom-Ochir, Tergel
    Morris, Benjamin
    Shan, Haoxuan
    Sun, Jingwei
    Wang, Yitu
    Wei, Chiyue
    Wu, Xueying
    Wu, Yuhao
    Yang, Hao Frank
    Zhang, Jingyang
    Zhang, Junyao
    Zheng, Qilin
    Zhou, Guanglei
    Li, Hai
    Chen, Yiran
    IEEE CIRCUITS AND SYSTEMS MAGAZINE, 2025, 25 (01) : 35 - 57
  • [39] WiP: Towards Light Adaptation of Large Language Models For Personal Hardware
    Wang, Liangyu
    Wang, Junxiao
    Wang, Di
PROCEEDINGS OF THE 2024 WORKSHOP ON EDGE AND MOBILE FOUNDATION MODELS, EDGEFM 2024, 2024 : 30 - 32
  • [40] On Hardware Security Bug Code Fixes by Prompting Large Language Models
    Ahmad, Baleegh
    Thakur, Shailja
    Tan, Benjamin
    Karri, Ramesh
    Pearce, Hammond
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 4043 - 4057