A Generalize Hardware Debugging Approach for Large Language Models Semi-Synthetic, Datasets

Cited by: 0

Authors:

Fu, Weimin ^{[1]}

Li, Shijie ^{[2]}

Zhao, Yifang ^{[2]}

Yang, Kaichen ^{[3]}

Zhang, Xuan ^{[4]}

Jin, Yier ^{[2]}

Guo, Xiaolong ^{[1]}

Affiliations:

[1] Kansas State Univ, Mike Wiegers Dept Elect & Comp Engn, Manhattan, KS 66506 USA

[2] Univ Sci & Technol China, Sch Cyber Sci & Technol, Hefei 230026, Anhui, Peoples R China

[3] Michigan Technol Univ, Dept Elect & Comp Engn, Houghton, MI 49931 USA

[4] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS | 2025 / Vol. 72 / No. 02

Funding:

U.S. National Science Foundation;

Keywords:

Hardware; Codes; Training; Software; Large language models; Chatbots; Debugging; Synthetic data; Open source hardware; Computer bugs; Large language model; artificial intelligence; hardware debug; version control; electronic design automation; ENERGY;

DOI:

10.1109/TCSI.2024.3487486

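Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: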

0808 ; 0809 ;

Abstract:

Large Language Models (LLMs) have precipitated emerging trends towards intelligent automation. However, integrating LLMs into the hardware debug domain is challenging because the datasets available for hardware LLMs face a dual dilemma: scarcity and subpar quality. Traditional hardware debug approaches that rely on experienced labor to generate detailed prompts do not scale cheaply. Similarly, strategies that depend on existing LLMs and randomly generated prompts fail to achieve sufficient reliability. We propose a directed, semi-synthetic data synthesis method that leverages version control information and journalistic event descriptions. To produce high-quality data, this approach combines version control data from hardware projects with the 5W1H (Who, What, When, Where, Why, How) journalistic principles. It allows dataset volume to scale linearly without depending on skilled labor. We have implemented this method on a collected dataset of open-source hardware designs and fine-tuned fifteen general-purpose LLMs to perform hardware debugging tasks, thereby validating the efficacy of our approach.
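
The abstract does not spell out how version control data is mapped onto the 5W1H slots, so the following is only an illustrative sketch of one possible mapping, not the authors' pipeline. The `Commit` class, the slot assignments, the prompt wording, and the sample commit are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """Hypothetical, minimal view of one version-control commit."""
    author: str   # who made the change
    date: str     # when it was committed
    path: str     # file touched by the commit
    message: str  # full commit message
    diff: str     # textual patch


def five_w_one_h(commit: Commit) -> dict:
    """Map commit metadata onto the six 5W1H slots (an assumed mapping)."""
    first_line = commit.message.splitlines()[0] if commit.message else ""
    return {
        "who": commit.author,
        "what": first_line,       # subject line as the event summary
        "when": commit.date,
        "where": commit.path,
        "why": commit.message,    # full message as the stated rationale
        "how": commit.diff,       # the patch shows how the bug was fixed
    }


def to_training_example(commit: Commit) -> dict:
    """Build one prompt/completion pair from a commit (hypothetical format)."""
    slots = five_w_one_h(commit)
    prompt = (
        f"File: {slots['where']}\n"
        f"Changed by {slots['who']} on {slots['when']}.\n"
        f"Summary: {slots['what']}\n"
        "Describe why the change was needed and how to fix the code."
    )
    completion = f"Why: {slots['why']}\nHow:\n{slots['how']}"
    return {"prompt": prompt, "completion": completion}


if __name__ == "__main__":
    # Made-up commit, used only to exercise the functions above.
    sample = Commit(
        author="alice",
        date="2024-01-01",
        path="rtl/fifo.v",
        message="Fix off-by-one in FIFO full flag",
        diff="- assign full = (count == DEPTH-1);\n"
             "+ assign full = (count == DEPTH);",
    )
    print(to_training_example(sample)["prompt"])
```

Because each commit yields one example without manual prompt writing, a scheme of this kind would grow with the number of commits, which is consistent with the linear scaling the abstract describes.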

Pages: 623-636

Page count: 14
