Binary Classification Optimisation with AI-Generated Data

Cited by: 0
Authors
Mazon, Manuel Jesus Cerezo [1]
Garcia, Ricardo Moya [1]
Garcia, Ekaitz Arriola [1]
del Castillo, Miguel Herencia Garcia [1]
Iglesias, Guillermo [2]
Affiliations
[1] Ainovis, Colquide 6, Madrid 28231, Spain
[2] Univ Politecn Madrid, Madrid, Spain
Source
Keywords
Machine Learning; Synthetic Data; GAN; Skin Lesion Classification; Data Augmentation; FID; ISIC; Medical Imaging; Augmentation
DOI
10.1007/978-3-031-80889-0_15
Chinese Library Classification (CLC)
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Obtaining sufficient high-quality data is a persistent challenge in machine learning. This report explores using synthetic data generated from existing datasets to overcome this limitation. Synthetic data increases the quantity of available information while maintaining the integrity and essential characteristics of natural data, and conventional data augmentation techniques can still be applied on top of it, supporting a more robust and efficient learning process. The study is based on a dataset provided by the International Skin Imaging Collaboration (ISIC), consisting of 3,323 cases divided equally between melanoma and basal cell carcinoma (BCC). Using Generative Adversarial Networks (GANs), specifically StyleGAN2 with transfer learning from the Flickr-Faces-HQ (FFHQ) model, synthetic images were generated, expanding the dataset fourfold to a total of 26,584 synthetic records. The quality of the synthetic images was assessed with the Fréchet Inception Distance (FID) metric [5]: 22.2534 for BCC and 20.4577 for melanoma. Models trained with a hybrid approach using both real and synthetic data showed improved performance (F1 rising from 0.71 to 0.79), highlighting the effectiveness of this method for binary classification tasks in medical imaging. The source code for all the research, along with the generated dataset, is publicly available.
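The FID metric mentioned in the abstract compares two sets of image features by fitting a Gaussian to each and measuring the Fréchet distance between them: ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2)). The following is a minimal sketch of that formula only, not the authors' code. In practice the feature vectors would come from an Inception network applied to real and synthetic images; here random arrays stand in for them, and the function name `frechet_distance` is a hypothetical choice for illustration.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input has shape (n_samples, n_features).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((C_a C_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of C_a @ C_b, which are real and non-negative for
    # positive semi-definite covariances (clipped here against round-off).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Stand-in features (assumption: real use would extract Inception activations).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 4))
shifted_feats = real_feats + 3.0  # same covariance, mean moved by 3 per dimension

print(frechet_distance(real_feats, real_feats))
print(frechet_distance(real_feats, shifted_feats))
```

Identical feature sets give a distance of approximately zero, and shifting every feature by 3 in four dimensions gives approximately 4 × 3² = 36, since only the mean term changes. Lower values therefore indicate that synthetic features are distributed more like the real ones.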
Pages: 210-216
Page count: 7