Binary Classification Optimisation with AI-Generated Data

Authors
Mazon, Manuel Jesus Cerezo [1]
Garcia, Ricardo Moya [1]
Garcia, Ekaitz Arriola [1]
del Castillo, Miguel Herencia Garcia [1]
Iglesias, Guillermo [2]
Affiliations
[1] Ainovis, Colquide 6, Madrid 28231, Spain
[2] Universidad Politécnica de Madrid, Madrid, Spain
Keywords
Machine Learning; Synthetic Data; GAN; Skin Lesion Classification; Data Augmentation; FID; ISIC; Medical Imaging; Augmentation
DOI
10.1007/978-3-031-80889-0_15
Chinese Library Classification (CLC)
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
In machine learning, obtaining sufficient high-quality data is a persistent challenge. This report explores the use of synthetic data generated from existing datasets to overcome that limitation. Synthetic data increases the quantity of available information while preserving the integrity and essential characteristics of natural data. Conventional data augmentation techniques can still be applied on top of it, supporting a more robust and efficient learning process. The study is based on a dataset provided by the International Skin Imaging Collaboration (ISIC), consisting of 3,323 cases divided equally between melanomas and Basal Cell Carcinoma (BCC). Synthetic images were generated with Generative Adversarial Networks (GANs), specifically StyleGAN2 with transfer learning from the Flickr-Faces-HQ (FFHQ) model, expanding the dataset fourfold to a total of 26,584 synthetic records. The quality of the synthetic images was assessed with the Fréchet Inception Distance (FID) metric [5], which gave 22.2534 for BCC and 20.4577 for melanomas. Models trained with a hybrid approach using both real and synthetic data showed improved performance (F1 score rising from 0.71 to 0.79), highlighting the effectiveness of this method for binary classification tasks in medical imaging. The source code for the research, along with the generated dataset, is publicly available.
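The figures quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code. It assumes that 3,323 is the number of real images per class, since 4 × 2 × 3,323 = 26,584 matches the reported synthetic total, although the abstract does not state this explicitly. The helper `f1_score` and the confusion-matrix counts passed to it are hypothetical and are not results from the paper.

```python
# Illustrative sketch only: names and counts are assumptions,
# not taken from the paper's published source code.


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 of the positive class from confusion-matrix counts.

    F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    Returns 0.0 when there are no positive predictions or labels at all.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Dataset-size check. ASSUMPTION: 3,323 real images per class.
REAL_PER_CLASS = 3_323
NUM_CLASSES = 2          # melanoma and BCC
EXPANSION_FACTOR = 4     # "fourfold" in the abstract

real_total = REAL_PER_CLASS * NUM_CLASSES
synthetic_total = real_total * EXPANSION_FACTOR

if __name__ == "__main__":
    print("real images:", real_total)
    print("synthetic images:", synthetic_total)
    # Hypothetical counts, chosen only to show how the metric is computed.
    print("example F1:", f1_score(tp=80, fp=20, fn=20))
```

Under that per-class reading, the synthetic total reproduces the 26,584 records reported in the abstract; if 3,323 were instead the size of the whole real dataset, the same total would correspond to an eightfold expansion.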
Pages: 210-216
Page count: 7