GPT-NAS: Neural Architecture Search Meets Generative Pre-Trained Transformer Model

Cited: 0
Authors
Yu, Caiyang [1 ]
Liu, Xianggen [1 ]
Wang, Yifan [1 ]
Liu, Yun [1 ]
Feng, Wentao [1 ]
Deng, Xiong [2 ]
Tang, Chenwei [1 ]
Lv, Jiancheng [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci & Engn, Res Ctr Machine Learning & Ind Intelligence, Minist Educ, Chengdu 610065, Peoples R China
[2] Stevens Inst Technol, Dept Mech Engn, Hoboken, NJ 07030 USA
Source
BIG DATA MINING AND ANALYTICS | 2025, Vol. 8, No. 1
Keywords
Search problems; Computer architecture; Encoding; Training; Optimization; Data models; Neural networks; Neural Architecture Search (NAS); Generative Pre-trained Transformer (GPT) model; evolutionary algorithm; image classification;
DOI
10.26599/BDMA.2024.9020036
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The pursuit of optimal neural network architectures is foundational to the progression of Neural Architecture Search (NAS). However, existing NAS methods that rely on traditional search strategies struggle in large and complex search spaces: they cannot mine more effective architectures within a reasonable time, which leads to inferior search results. This research introduces Generative Pre-trained Transformer NAS (GPT-NAS), an approach designed to overcome the limitations inherent in traditional NAS strategies. GPT-NAS improves search efficiency and yields better architectures by integrating a GPT model into the search process. Specifically, we design a reconstruction strategy that uses the trained GPT to reorganize the architectures obtained from the search. In addition, to equip the GPT model with neural-architecture design capabilities, we train it on a dataset of neural architectures: for each architecture, the structural information of the preceding layers is used to predict the next layer, iterating over the entire architecture. In this way, the GPT model efficiently learns the key features of neural architectures. Extensive experiments show that GPT-NAS outperforms both manually designed neural architectures and architectures generated automatically by existing NAS methods. We further validate the benefit of introducing the GPT model in several ways, finding that it improves the image-classification accuracy of the searched architectures by up to about 9%.
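The layer-by-layer prediction described in the abstract can be illustrated with a minimal sketch. The snippet below is an illustration, not the authors' released code: it trains a tiny decoder-only Transformer to predict each layer token of an architecture from the preceding ones. The vocabulary size, depth limit, model dimensions, and the random placeholder dataset are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of the idea in the abstract:
# encode each architecture as a sequence of layer tokens and train a
# GPT-style model to predict layer t+1 from layers 1..t.
import torch
import torch.nn as nn

VOCAB = 32          # assumed: number of distinct layer/operation tokens
MAX_LAYERS = 24     # assumed: maximum architecture depth

class LayerGPT(nn.Module):
    """Tiny causal Transformer over layer-token sequences."""
    def __init__(self, d_model=64, n_head=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        self.pos = nn.Embedding(MAX_LAYERS, d_model)
        block = nn.TransformerEncoderLayer(d_model, n_head, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):                       # tokens: (B, T)
        T = tokens.size(1)
        x = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
        # Causal mask so each position only attends to earlier layers.
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        return self.head(self.blocks(x, mask=mask))  # logits: (B, T, VOCAB)

model = LayerGPT()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
archs = torch.randint(0, VOCAB, (8, 12))  # placeholder architecture dataset
opt.zero_grad()
logits = model(archs[:, :-1])             # predict layer t+1 from layers <= t
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB),
                                   archs[:, 1:].reshape(-1))
loss.backward()
opt.step()
```

At search time, a model trained this way could in principle resample the later layers of a partially kept candidate, which is the role the paper's reconstruction strategy assigns to the trained GPT.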
Pages: 45-64
Page count: 20
Related Papers
50 items in total
  • [21] GPT4MIA: Utilizing Generative Pre-trained Transformer (GPT-3) as a Plug-and-Play Transductive Model for Medical Image Analysis
    Zhang, Yizhe
    Chen, Danny Z.
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023 WORKSHOPS, 2023, 14393 : 151 - 160
  • [22] The application of Chat Generative Pre-trained Transformer in nursing education
    Liu, Jialin
    Liu, Fan
    Fang, Jinbo
    Liu, Siru
    NURSING OUTLOOK, 2023, 71 (06)
  • [25] GPT (Generative Pre-Trained Transformer)-A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions
    Yenduri, Gokul
    Ramalingam, M.
    Selvi, G. Chemmalar
    Supriya, Y.
    Srivastava, Gautam
    Maddikunta, Praveen Kumar Reddy
    Raj, G. Deepti
    Jhaveri, Rutvij H.
    Prabadevi, B.
    Wang, Weizheng
    Vasilakos, Athanasios V.
    Gadekallu, Thippa Reddy
    IEEE ACCESS, 2024, 12 : 54608 - 54649
  • [26] How large language models including generative pre-trained transformer (GPT) 3 and 4 will impact medicine and surgery
    Atallah, S. B.
    Banda, N. R.
    Banda, A.
    Roeck, N. A.
    TECHNIQUES IN COLOPROCTOLOGY, 2023, 27 (08) : 609 - 614
  • [27] GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization
    Huang, Jia-Hong
    Murn, Luka
    Mrak, Marta
    Worring, Marcel
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 580 - 589
  • [28] The impact of Chat Generative Pre-trained Transformer (ChatGPT) on medical education
    Heng, Jonathan J. Y.
    Teo, Desmond B.
    Tan, L. F.
    POSTGRADUATE MEDICAL JOURNAL, 2023, 99 (1176) : 1125 - 1127
  • [29] Enhancing rumor detection with data augmentation and generative pre-trained transformer
    Askarizade, Mojgan
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 262
  • [30] Leveraging Generative Pre-Trained Transformer Models for Standardizing Nursing Data
    Baranwal, Aseem
    Semenov, Alexander
    Salgado, Patricia de Oliveira
    Priola, Karen B.
    Yao, Yingwei
    Keenan, Gail M.
    Macieira, Tamara G. R.
    2024 IEEE 12TH INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS, ICHI 2024, 2024, : 386 - 391