GPT-NAS: Neural Architecture Search Meets Generative Pre-Trained Transformer Model

Cited by: 0
Authors
Yu, Caiyang [1 ]
Liu, Xianggen [1 ]
Wang, Yifan [1 ]
Liu, Yun [1 ]
Feng, Wentao [1 ]
Deng, Xiong [2 ]
Tang, Chenwei [1 ]
Lv, Jiancheng [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci & Engn, Res Ctr Machine Learning & Ind Intelligence, Minist Educ, Chengdu 610065, Peoples R China
[2] Stevens Inst Technol, Dept Mech Engn, Hoboken, NJ 07030 USA
Source
BIG DATA MINING AND ANALYTICS | 2025, Vol. 8, No. 1
Keywords
Search problems; Computer architecture; Encoding; Training; Optimization; Data models; Neural networks; Neural Architecture Search (NAS); Generative Pre-trained Transformer (GPT) model; evolutionary algorithm; image classification
DOI
10.26599/BDMA.2024.9020036
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The pursuit of optimal neural network architectures is foundational to the progress of Neural Architecture Search (NAS). However, existing NAS methods built on traditional search strategies share a common weakness: when the search space is large and complex, they struggle to discover effective architectures within a reasonable time, which leads to inferior search results. This research introduces Generative Pre-trained Transformer NAS (GPT-NAS), an approach designed to overcome these inherent limitations of traditional NAS strategies by integrating a GPT model into the search process, thereby improving search efficiency and producing better architectures. Specifically, we design a reconstruction strategy in which the trained GPT reorganizes the architectures obtained from the search. In addition, to equip the GPT model with neural-architecture design capability, we train it on a dataset of neural architectures: for each architecture, the structural information of the preceding layers is used to predict the structure of the next layer, iterating in this way over the entire architecture. The GPT model thus efficiently learns the key features required of neural architectures. Extensive experimental validation shows that GPT-NAS outperforms both manually constructed neural architectures and architectures generated automatically by existing NAS methods. We further validate the benefit of introducing the GPT model in several ways, finding that it improves the accuracy of the searched architecture on image classification datasets by up to about 9%.
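The abstract describes the core mechanism only in prose, so a minimal sketch may help make the layer-by-layer prediction concrete. The following PyTorch code is an illustrative assumption, not the authors' implementation: it assumes each architecture is encoded as a sequence of layer tokens, trains a small decoder-only transformer (here called LayerGPT, a made-up name) to predict each layer from the layers before it, and applies a hypothetical reconstruction rule that re-samples layers the trained model finds unlikely. The operation vocabulary, model sizes, and the probability threshold are all invented for illustration, and the evolutionary search loop of GPT-NAS is omitted.

```python
# Minimal sketch (not the authors' code) of the training scheme the
# abstract describes: each layer of an encoded architecture is predicted
# from the layers before it. Vocabulary, sizes, and the reconstruction
# rule are assumptions made for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

OPS = ["<bos>", "conv3x3", "conv5x5", "maxpool", "avgpool", "skip"]  # assumed
TOK = {op: i for i, op in enumerate(OPS)}

class LayerGPT(nn.Module):
    """Tiny decoder-only transformer over layer tokens (hypothetical)."""
    def __init__(self, d_model=64, n_head=4, n_layers=2, max_len=32):
        super().__init__()
        self.embed = nn.Embedding(len(OPS), d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, len(OPS))

    def forward(self, tokens):  # tokens: (batch, seq_len) of layer ids
        t = tokens.size(1)
        x = self.embed(tokens) + self.pos(torch.arange(t, device=tokens.device))
        # Causal mask so position i attends only to layers <= i.
        causal = torch.triu(
            torch.full((t, t), float("-inf"), device=tokens.device), diagonal=1)
        return self.head(self.blocks(x, mask=causal))  # (batch, seq_len, |OPS|)

def train_step(model, opt, archs):
    """archs: (batch, seq_len) token sequences starting with <bos>;
    position i is trained to predict layer i+1 from layers <= i."""
    logits = model(archs[:, :-1])
    loss = F.cross_entropy(logits.reshape(-1, len(OPS)), archs[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

@torch.no_grad()
def reconstruct(model, arch, threshold=0.1):
    """Assumed reconstruction rule: re-sample any layer whose probability
    under the trained model is low given its prefix. The abstract does not
    specify the exact rule GPT-NAS uses."""
    toks = arch.clone()
    for i in range(1, toks.size(0)):
        probs = torch.softmax(model(toks[:i].unsqueeze(0))[0, -1], dim=-1)
        if probs[toks[i]] < threshold:
            toks[i] = torch.multinomial(probs, 1).item()
    return toks

# Usage sketch: encode a searched architecture and let the model rework it.
model = LayerGPT()
arch = torch.tensor([TOK[op] for op in ["<bos>", "conv3x3", "skip", "maxpool"]])
print(reconstruct(model, arch))
```

Under these assumptions, the reconstruction step is what ties the pre-trained model back into the search: candidates produced by the evolutionary search are passed through `reconstruct` so that layers the model considers implausible are replaced before evaluation.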
Pages: 45-64
Number of pages: 20
Related Papers
50 records in total
  • [31] BioGPT: generative pre-trained transformer for biomedical text generation and mining
    Luo, Renqian
    Sun, Liai
    Xia, Yingce
    Qin, Tao
    Zhang, Sheng
    Poon, Hoifung
    Liu, Tie-Yan
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (06)
  • [32] Generative Pre-trained Transformer for Pediatric Stroke Research: A Pilot Study
    Fiedler, Anna K.
    Zhang, Kai
    Lal, Tia S.
    Jiang, Xiaoqian
    Fraser, Stuart M.
    PEDIATRIC NEUROLOGY, 2024, 160
  • [33] Industrial-generative pre-trained transformer for intelligent manufacturing systems
    Wang, Han
    Liu, Min
    Shen, Weiming
    IET COLLABORATIVE INTELLIGENT MANUFACTURING, 2023, 5 (02)
  • [34] Efficient Unsupervised Community Search with Pre-trained Graph Transformer
    Wang, Jianwei
    Wang, Kai
    Lin, Xuemin
    Zhang, Wenjie
    Zhang, Ying
PROCEEDINGS OF THE VLDB ENDOWMENT, 2024, 17 (09): 2227-2240
  • [35] SCC-GPT: Source Code Classification Based on Generative Pre-Trained Transformers
    Alahmadi, Mohammad D.
    Alshangiti, Moayad
    Alsubhi, Jumana
    MATHEMATICS, 2024, 12 (13)
  • [36] CSI-GPT: Integrating Generative Pre-Trained Transformer With Federated-Tuning to Acquire Downlink Massive MIMO Channels
    Zeng, Ye
    Qiao, Li
    Gao, Zhen
    Qin, Tong
    Wu, Zhonghuai
    Khalaf, Emad
    Chen, Sheng
    Guizani, Mohsen
IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2025, 74 (03): 5187-5192
  • [37] Performance of a commercially available Generative Pre-trained Transformer (GPT) in describing radiolucent lesions in panoramic radiographs and establishing differential diagnoses
    Silva, Thaisa Pinheiro
    Andrade-Bortoletto, Maria Fernanda Silva
    Ocampo, Thais Santos Cerqueira
    Alencar-Palha, Caio
    Bornstein, Michael M.
    Oliveira-Santos, Christiano
    Oliveira, Matheus L.
    CLINICAL ORAL INVESTIGATIONS, 2024, 28 (03)
  • [39] Extracting structured information from unstructured histopathology reports using generative pre-trained transformer 4 (GPT-4)
    Truhn, Daniel
    Loeffler, Chiara M. L.
    Mueller-Franzes, Gustav
    Nebelung, Sven
    Hewitt, Katherine J.
    Brandner, Sebastian
    Bressem, Keno K.
    Foersch, Sebastian
    Kather, Jakob Nikolas
JOURNAL OF PATHOLOGY, 2024, 262 (03): 310-319
  • [40] Enhancing clinical reasoning with Chat Generative Pre-trained Transformer: a practical guide
    Hirosawa, Takanobu
    Shimizu, Taro
DIAGNOSIS, 2024, 11 (01): 102-105