Automatic Mixed Precision and Distributed Data-Parallel Training for Satellite Image Classification using CNN

Cited: 0
Authors
Nuwara, Yohanes [1 ]
Kitt, Wong W. [2 ]
Juwono, Filbert H. [3 ]
Ollivierre, Gregory [4 ]
Affiliations
[1] Asia Pulp & Paper, Sinarmas Land Plaza MH Thamrin, Jakarta, Indonesia
[2] Curtin Univ Malaysia, CDT 250, 98009 Miri, Sarawak, Malaysia
[3] Univ Southampton Malaysia, Iskandar Puteri 79100, Johor, Malaysia
[4] OmegaCrop, 71-75 Shelton St, London, England
Keywords
Automatic Mixed Precision; Convolutional Neural Network; Distributed Data-Parallel; Graphics Processing Unit; Remote Sensing
DOI
10.1117/12.2679828
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification
081104; 0812; 0835; 1405
Abstract
Deep learning models for computer vision in remote sensing, such as the Convolutional Neural Network (CNN), benefit from acceleration through the use of multiple CPUs and GPUs. The training stage can be made more efficient by utilizing multiple cores at the same time: Distributed Data Parallelization (DDP) processes different image mini-batches on replicas of the model, while Automatic Mixed Precision (AMP) computes parameters with lower-precision floating-point numbers. We investigate the impact of the DDP and AMP training modes on the overall utilization and memory consumption of the CPU and GPU, as well as on the accuracy of a CNN model. The study is performed on the EuroSAT dataset, a Sentinel-2-based benchmark satellite image dataset for land-cover image classification. We compare training with a single CPU, with DDP, and with both DDP and AMP over 100 epochs using the ResNet-18 architecture. The hardware consists of an Intel Xeon Silver 4116 CPU with 24 cores and an NVIDIA V100 GPU. We find that although CPU parallelization with DDP takes less time to train on the images, it can consume 50 MB more memory than a single CPU. Combining DDP and AMP releases up to 160 MB of memory and reduces computation time by 20 seconds. The test accuracy is slightly higher with DDP and with DDP-AMP, at 90.61% and 90.77% respectively, than without DDP and AMP, at 89.84%. Hence, training with Distributed Data Parallelization (DDP) and Automatic Mixed Precision (AMP) offers lower GPU memory consumption, faster training execution, faster convergence, and higher accuracy.
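
The DDP and AMP terms in the abstract match PyTorch's torch.nn.parallel.DistributedDataParallel and torch.cuda.amp utilities. As a minimal illustrative sketch (not the authors' code), a DDP-plus-AMP ResNet-18 training run on EuroSAT-style class-folder data could be set up roughly as follows; the dataset path, batch size, learning rate, and backend choice are assumptions, not values from the paper.

    # Sketch: DDP + AMP training of ResNet-18 in PyTorch.
    # Paths, hyperparameters, and the nccl/gloo backend are illustrative assumptions.
    import os
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler
    from torchvision import datasets, models, transforms

    def train(rank: int, world_size: int) -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        backend = "nccl" if torch.cuda.is_available() else "gloo"
        dist.init_process_group(backend, rank=rank, world_size=world_size)
        device = torch.device(f"cuda:{rank}" if torch.cuda.is_available() else "cpu")

        # EuroSAT RGB patches arranged in one folder per land-cover class (assumed path).
        tf = transforms.Compose([transforms.Resize(64), transforms.ToTensor()])
        dataset = datasets.ImageFolder("data/EuroSAT", transform=tf)
        sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
        loader = DataLoader(dataset, batch_size=64, sampler=sampler, num_workers=2)

        # DDP replicates the model per process and synchronizes gradients.
        model = models.resnet18(num_classes=10).to(device)
        model = DDP(model, device_ids=[rank] if device.type == "cuda" else None)
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        criterion = torch.nn.CrossEntropyLoss()
        scaler = torch.cuda.amp.GradScaler(enabled=device.type == "cuda")

        for epoch in range(100):
            sampler.set_epoch(epoch)           # reshuffle the data shards each epoch
            for images, labels in loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad(set_to_none=True)
                # AMP: run the forward pass and loss in mixed (float16) precision.
                with torch.cuda.amp.autocast(enabled=device.type == "cuda"):
                    loss = criterion(model(images), labels)
                scaler.scale(loss).backward()  # loss scaling avoids float16 underflow
                scaler.step(optimizer)
                scaler.update()

        dist.destroy_process_group()

    if __name__ == "__main__":
        world_size = max(torch.cuda.device_count(), 1)
        mp.spawn(train, args=(world_size,), nprocs=world_size)

Disabling the GradScaler and autocast context reproduces a plain-DDP baseline, and running a single process without the process group corresponds to the single-device baseline, mirroring the three configurations compared in the abstract.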
Pages: 9