A Generalized Hardware Debugging Approach for Large Language Models via Semi-Synthetic Datasets

Cited by: 0
Authors
Fu, Weimin [1 ]
Li, Shijie [2 ]
Zhao, Yifang [2 ]
Yang, Kaichen [3 ]
Zhang, Xuan [4 ]
Jin, Yier [2 ]
Guo, Xiaolong [1 ]
Affiliations
[1] Kansas State Univ, Mike Wiegers Dept Elect & Comp Engn, Manhattan, KS 66506 USA
[2] Univ Sci & Technol China, Sch Cyber Sci & Technol, Hefei 230026, Anhui, Peoples R China
[3] Michigan Technol Univ, Dept Elect & Comp Engn, Houghton, MI 49931 USA
[4] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Funding
U.S. National Science Foundation;
Keywords
Hardware; Codes; Training; Software; Large language models; Chatbots; Debugging; Synthetic data; Open source hardware; Computer bugs; Large language model; artificial intelligence; hardware debug; version control; electronic design automation; ENERGY;
DOI
10.1109/TCSI.2024.3487486
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
Large Language Models (LLMs) have precipitated emerging trends toward intelligent automation. However, integrating LLMs into the hardware debug domain faces a dual dilemma: datasets for hardware-oriented LLMs are both scarce and of subpar quality. Traditional hardware debug approaches that rely on experienced engineers to write detailed prompts do not scale cheaply. Similarly, strategies that depend on existing LLMs and randomly generated prompts fail to achieve sufficient reliability. We propose a directed, semi-synthetic data synthesis method that leverages version control information and journalistic event descriptions. To produce high-quality data, the approach combines version control data from hardware projects with the 5W1H (Who, What, When, Where, Why, How) journalistic principle, allowing dataset volume to scale linearly without depending on skilled labor. We implemented this method on a collected set of open-source hardware designs and fine-tuned fifteen general-purpose LLMs for hardware debugging tasks, validating the efficacy of our approach.
Pages: 623 - 636
Page count: 14
Related Papers
50 items in total
  • [31] Can large language models help augment English psycholinguistic datasets?
    Trott, Sean
    BEHAVIOR RESEARCH METHODS, 2024, 56 (06) : 6082 - 6100
  • [32] Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models
    Reif, Emily
    Kahng, Minsuk
    Petridis, Savvas
2023 IEEE VISUALIZATION AND VISUAL ANALYTICS, VIS, 2023 : 236 - 240
  • [33] Unleashing the Unseen: Harnessing Benign Datasets for Jailbreaking Large Language Models
    Zhao, Wei
    Li, Zhe
    Li, Yige
    Sun, Jun
arXiv
  • [34] MAGNIFICO: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations
    Patel, Arkil
    Bhattamishra, Satwik
    Reddy, Siva
    Bahdanau, Dzmitry
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023 : 2167 - 2189
  • [35] A facile semi-synthetic approach towards halogen-substituted aminobenzoic acid analogues of platensimycin
    Qiu, Lin
    Tian, Kai
    Pan, Jian
    Jiang, Lin
    Yang, Hu
    Zhu, Xiangcheng
    Shen, Ben
    Duan, Yanwen
    Huang, Yong
    TETRAHEDRON, 2017, 73 (06) : 771 - 775
  • [36] A Semi-Synthetic Approach to Olefinic Analogs of Amino Acid One (MeBmt) in Cyclosporine-A
    Park, S. B.
    Meier, G. P.
    TETRAHEDRON LETTERS, 1989, 30 (32) : 4215 - 4218
  • [37] Resilience Assessment of Large Language Models under Transient Hardware Faults
    Agarwal, Udit Kumar
    Chan, Abraham
    Pattabiraman, Karthik
2023 IEEE 34TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING, ISSRE, 2023 : 659 - 670
  • [38] A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
    Guo, Cong
    Cheng, Feng
    Du, Zhixu
    Kiessling, James
    Ku, Jonathan
    Li, Shiyu
    Li, Ziru
    Ma, Mingyuan
    Molom-Ochir, Tergel
    Morris, Benjamin
    Shan, Haoxuan
    Sun, Jingwei
    Wang, Yitu
    Wei, Chiyue
    Wu, Xueying
    Wu, Yuhao
    Yang, Hao Frank
    Zhang, Jingyang
    Zhang, Junyao
    Zheng, Qilin
    Zhou, Guanglei
    Li, Hai
    Chen, Yiran
    IEEE CIRCUITS AND SYSTEMS MAGAZINE, 2025, 25 (01) : 35 - 57
  • [39] WiP: Towards Light Adaptation of Large Language Models For Personal Hardware
    Wang, Liangyu
    Wang, Junxiao
    Wang, Di
PROCEEDINGS OF THE 2024 WORKSHOP ON EDGE AND MOBILE FOUNDATION MODELS, EDGEFM 2024, 2024 : 30 - 32
  • [40] On Hardware Security Bug Code Fixes by Prompting Large Language Models
    Ahmad, Baleegh
    Thakur, Shailja
    Tan, Benjamin
    Karri, Ramesh
    Pearce, Hammond
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 4043 - 4057