A Generalize Hardware Debugging Approach for Large Language Models Semi-Synthetic, Datasets

Times cited: 0
Authors
Fu, Weimin [1]
Li, Shijie [2]
Zhao, Yifang [2]
Yang, Kaichen [3]
Zhang, Xuan [4]
Jin, Yier [2]
Guo, Xiaolong [1]
Affiliations
[1] Kansas State Univ, Mike Wiegers Dept Elect & Comp Engn, Manhattan, KS 66506 USA
[2] Univ Sci & Technol China, Sch Cyber Sci & Technol, Hefei 230026, Anhui, Peoples R China
[3] Michigan Technol Univ, Dept Elect & Comp Engn, Houghton, MI 49931 USA
[4] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Funding
U.S. National Science Foundation;
Keywords
Hardware; Codes; Training; Software; Large language models; Chatbots; Debugging; Synthetic data; Open source hardware; Computer bugs; Large language model; artificial intelligence; hardware debug; version control; electronic design automation; ENERGY;
DOI
10.1109/TCSI.2024.3487486
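Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Subject classification code
0808 ; 0809 ;
Abstract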
Large Language Models (LLMs) have spurred emerging trends toward intelligent automation. However, integrating LLMs into the hardware debug domain faces challenges: hardware datasets for LLMs are often plagued by a dual dilemma of scarcity and subpar quality. Traditional hardware debug approaches, which rely on experienced engineers to write detailed prompts, do not scale cheaply. Likewise, strategies that depend on existing LLMs and randomly generated prompts fail to achieve sufficient reliability. We propose a directed, semi-synthetic data synthesis method that leverages version control information and journalistic event descriptions. To produce high-quality data, the approach combines version control data from hardware projects with the 5W1H (Who, What, When, Where, Why, How) journalistic principles. It allows dataset volume to scale linearly without depending on skilled labor. We implemented this method on a collected dataset of open-source hardware designs and fine-tuned fifteen general-purpose LLMs to equip them for hardware debugging tasks, thereby validating the efficacy of our approach.
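The abstract describes the data-synthesis method only at a high level. As a rough illustration of the general idea, the Python sketch below turns one version-control bug-fix record into a 5W1H-structured prompt/response pair. The `CommitRecord` fields, the mapping of commit fields onto the 5W1H questions, and the prompt template are all hypothetical assumptions made for this example; this is not the authors' pipeline.

```python
import difflib
from dataclasses import dataclass


@dataclass
class CommitRecord:
    """Hypothetical bug-fix commit taken from a hardware project's history."""
    author: str      # Who made the change
    date: str        # When it was committed
    file_path: str   # Where the bug lived
    message: str     # What / Why: taken from the commit message (assumed mapping)
    buggy_code: str  # design source before the fix
    fixed_code: str  # design source after the fix


def changed_lines(before: str, after: str) -> list[str]:
    """How: the removed ('- ') and added ('+ ') lines between two versions."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # Drop unchanged ('  ') lines and intraline hint ('? ') lines.
    return [line for line in diff if line.startswith(("- ", "+ "))]


def build_example(c: CommitRecord) -> dict[str, str]:
    """Turn one commit into a prompt/response pair using a 5W1H-style template."""
    lines = c.message.strip().splitlines()
    what = lines[0] if lines else ""  # commit subject line
    # Commit body (if any) serves as the "Why"; fall back to the subject line.
    why = " ".join(l.strip() for l in lines[1:] if l.strip()) or what
    prompt = (
        f"Who: {c.author}\n"
        f"When: {c.date}\n"
        f"Where: {c.file_path}\n"
        f"What: {what}\n"
        f"Why: {why}\n\n"
        f"Buggy code:\n{c.buggy_code}"
    )
    # The "How" (the actual fix) goes in the response so the prompt does not leak it.
    how = "\n".join(changed_lines(c.buggy_code, c.fixed_code))
    response = f"How:\n{how}\n\nFixed code:\n{c.fixed_code}"
    return {"prompt": prompt, "response": response}


# Usage with a made-up one-line Verilog fix.
commit = CommitRecord(
    author="alice",
    date="2023-05-01",
    file_path="rtl/alu.v",
    message="Fix OR gate logic\n\nOutput used AND instead of OR.",
    buggy_code="assign y = a & b;",
    fixed_code="assign y = a | b;",
)
example = build_example(commit)
print(example["prompt"])
print(example["response"])
```

Because each record in the repository history yields one pair without manual prompt writing, a scheme like this grows the dataset in proportion to the number of fix commits mined, which is the linear-scaling property the abstract claims.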
Pages: 623-636
Number of pages: 14