A Generalize Hardware Debugging Approach for Large Language Models Semi-Synthetic, Datasets

Cited by: 0

Authors:

Fu, Weimin [1]

Li, Shijie [2]

Zhao, Yifang [2]

Yang, Kaichen [3]

Zhang, Xuan [4]

Jin, Yier [2]

Guo, Xiaolong [1]

Affiliations:

[1] Kansas State Univ, Mike Wiegers Dept Elect & Comp Engn, Manhattan, KS 66506 USA

[2] Univ Sci & Technol China, Sch Cyber Sci & Technol, Hefei 230026, Anhui, Peoples R China

[3] Michigan Technol Univ, Dept Elect & Comp Engn, Houghton, MI 49931 USA

[4] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS | 2025 / Vol. 72 / Issue 02

Funding:

U.S. National Science Foundation;

Keywords:

Hardware; Codes; Training; Software; Large language models; Chatbots; Debugging; Synthetic data; Open source hardware; Computer bugs; Large language model; artificial intelligence; hardware debug; version control; electronic design automation; ENERGY;

DOI:

10.1109/TCSI.2024.3487486

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Large Language Models (LLMs) have precipitated emerging trends towards intelligent automation. However, integrating LLMs into the hardware debug domain encounters challenges: the datasets for LLMs for hardware are often plagued by a dual dilemma - scarcity and subpar quality. Traditional hardware debug approaches that rely on experienced labor to generate detailed prompts are not cheaply scalable. Similarly, strategies that depend on existing LLMs and randomly generated prompts fail to achieve sufficient reliability. We propose a directed, semi-synthetic data synthesis method that leverages version control information and journalistic event descriptions. To produce high-quality data, this approach utilizes version control data from hardware projects combined with the 5W1H (Who, What, When, Where, Why, How) journalistic principles. It facilitates the linear scaling of dataset volumes without depending on skilled labor. We have implemented this method on a collected dataset of open-source hardware designs and fine-tuned fifteen general-purpose LLMs to enable their capability in hardware debugging tasks, thereby validating the efficacy of our approach.

Pages: 623-636

Page count: 14
