GPT-NAS: Neural Architecture Search Meets Generative Pre-Trained Transformer Model

Cited by: 0
Authors
Yu, Caiyang [1 ]
Liu, Xianggen [1 ]
Wang, Yifan [1 ]
Liu, Yun [1 ]
Feng, Wentao [1 ]
Deng, Xiong [2 ]
Tang, Chenwei [1 ]
Lv, Jiancheng [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci & Engn, Res Ctr Machine Learning & Ind Intelligence, Minist Educ, Chengdu 610065, Peoples R China
[2] Stevens Inst Technol, Dept Mech Engn, Hoboken, NJ 07030 USA
Source
BIG DATA MINING AND ANALYTICS | 2025, Vol. 8, No. 1
Keywords
Search problems; Computer architecture; Encoding; Training; Optimization; Data models; Neural networks; Neural Architecture Search (NAS); Generative Pre-trained Transformer (GPT) model; evolutionary algorithm; image classification;
DOI
10.26599/BDMA.2024.9020036
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The pursuit of optimal neural network architectures is foundational to the progression of Neural Architecture Search (NAS). However, existing NAS methods that rely on traditional search strategies struggle in large and complex search spaces: they cannot discover more effective architectures within a reasonable time, which leads to inferior search results. This research introduces Generative Pre-trained Transformer NAS (GPT-NAS), an approach designed to overcome the limitations inherent in traditional NAS strategies. GPT-NAS improves search efficiency and yields better architectures by integrating a GPT model into the search process. Specifically, we design a reconstruction strategy in which the trained GPT model reorganizes the architectures produced by the search. In addition, to equip the GPT model with the ability to design neural architectures, we pre-train it on a dataset of neural architectures. For each architecture, the structural information of the preceding layers is used to predict the next layer, iterating over the entire architecture layer by layer. In this way, the GPT model efficiently learns the key features required of neural architectures. Extensive experiments show that GPT-NAS outperforms both manually designed neural architectures and architectures generated automatically by existing NAS methods. Furthermore, we validate the benefit of introducing the GPT model in several ways and find that, after introducing it, the classification accuracy of the searched architecture on image datasets improves by up to about 9%.
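The abstract describes two mechanisms: pre-training a GPT-style model to predict the next layer of an architecture from the layers that precede it, and a reconstruction step in which the trained model reorganizes architectures produced by the search. The sketch below is a minimal illustration of one plausible reading of those ideas, not the authors' implementation; the PyTorch model, the hypothetical operation vocabulary (conv3x3, maxpool, skip, ...), the fixed-length layer encoding, and the reconstruct() helper that resamples the tail of a searched candidate are all assumptions.

```python
# Illustrative sketch only (assumed encoding and hyperparameters), not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical operation vocabulary: an architecture is encoded as a sequence of layer tokens.
OPS = ["<bos>", "conv3x3", "conv5x5", "maxpool", "avgpool", "skip", "<eos>"]
STOI = {op: i for i, op in enumerate(OPS)}

class ArchGPT(nn.Module):
    """Tiny decoder-only transformer trained for next-layer prediction."""
    def __init__(self, vocab, d_model=64, n_head=4, n_layer=2, max_len=32):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, idx):
        b, t = idx.shape
        x = self.tok(idx) + self.pos(torch.arange(t, device=idx.device))
        # Causal mask so each position only attends to earlier layers.
        mask = torch.triu(torch.full((t, t), float("-inf"), device=idx.device), diagonal=1)
        return self.head(self.blocks(x, mask=mask))  # logits over the next layer token

def pretrain(model, archs, epochs=50, lr=1e-3):
    """Teach the model to predict layer t+1 from layers 1..t of each training architecture."""
    data = torch.tensor([[STOI[op] for op in a] for a in archs])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        logits = model(data[:, :-1])
        loss = F.cross_entropy(logits.reshape(-1, len(OPS)), data[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

@torch.no_grad()
def reconstruct(model, arch, keep=2):
    """Keep the first `keep` layers of a searched candidate and let the trained model
    resample the remaining layers -- one plausible reading of the reconstruction strategy."""
    model.eval()
    idx = torch.tensor([[STOI["<bos>"]] + [STOI[op] for op in arch[:keep]]])
    while idx.shape[1] < len(arch) + 1:
        nxt = model(idx)[:, -1].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, nxt], dim=1)
    return [OPS[i] for i in idx[0, 1:].tolist()]

if __name__ == "__main__":
    # Toy "architecture dataset" of fixed-length layer sequences (illustrative only).
    dataset = [
        ["<bos>", "conv3x3", "conv3x3", "maxpool", "skip", "<eos>"],
        ["<bos>", "conv5x5", "conv3x3", "avgpool", "skip", "<eos>"],
    ]
    model = ArchGPT(len(OPS))
    pretrain(model, dataset)
    # Candidate produced by an (assumed) evolutionary search, then reconstructed by the model.
    print(reconstruct(model, ["conv3x3", "avgpool", "maxpool", "conv5x5", "<eos>"]))
```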
Pages: 45-64
Number of pages: 20
Related Papers
50 in total
  • [1] Generative pre-trained transformer (GPT)-4 support for differential diagnosis in neuroradiology
    Sorin, Vera
    Klang, Eyal
    Sobeh, Tamer
    Konen, Eli
    Shrot, Shai
    Livne, Adva
    Weissbuch, Yulian
    Hoffmann, Chen
    Barash, Yiftach
    QUANTITATIVE IMAGING IN MEDICINE AND SURGERY, 2024, 14 (10)
  • [2] Generative Pre-trained Transformer 4 (GPT-4) in clinical settings
    Bellini, Valentina
    Bignami, Elena Giovanna
LANCET DIGITAL HEALTH, 2025, 7 (01): e6 - e7
  • [3] Generative Pre-Trained Transformer (GPT) in Research: A Systematic Review on Data Augmentation
    Sufi, Fahim
    INFORMATION, 2024, 15 (02)
  • [4] OMPGPT: A Generative Pre-trained Transformer Model for OpenMP
    Chen, Le
    Bhattacharjee, Arijit
    Ahmed, Nesreen
    Hasabnis, Niranjan
    Oren, Gal
    Vo, Vy
    Jannesari, Ali
    EURO-PAR 2024: PARALLEL PROCESSING, PT I, EURO-PAR 2024, 2024, 14801 : 121 - 134
  • [5] Generative Pre-trained Transformer (GPT) based model with relative attention for de novo drug design
    Haroon, Suhail
    Hafsath, C. A.
    Jereesh, A. S.
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2023, 106
  • [6] Towards JavaScript program repair with Generative Pre-trained Transformer (GPT-2)
    Lajko, Mark
    Csuvik, Viktor
    Vidacs, Laszlo
    Proceedings - International Workshop on Automated Program Repair, APR 2022, 2022, : 61 - 68
  • [7] Generative pre-trained transformers (GPT) for surface engineering
    Kamnis, Spyros
    SURFACE & COATINGS TECHNOLOGY, 2023, 466
  • [8] MetaQA: Enhancing human-centered data search using Generative Pre-trained Transformer (GPT) language model and artificial intelligence
    Li, Diya
    Zhang, Zhe
    PLOS ONE, 2023, 18 (11):
  • [9] GPT-LS: Generative Pre-Trained Transformer with Offline Reinforcement Learning for Logic Synthesis
    Lv, Chenyang
    Wei, Ziling
    Qian, Weikang
    Ye, Junjie
    Feng, Chang
    He, Zhezhi
    2023 IEEE 41ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD, 2023, : 320 - 326