From big data to smart data: a sample gradient descent approach for machine learning

Cited: 2
Authors:
Ganie, Aadil Gani [1]
Dadvandipour, Samad [1]
Affiliation:
[1] Univ Miskolc, H-3515 Miskolc, Hungary
Keywords:
Big data; Gradient descent; Machine learning; PCA; Loss function
DOI: 10.1186/s40537-023-00839-9
CLC number: TP301 [Theory, Methods]
Discipline code: 081202
Abstract
This research paper presents an innovative approach to gradient descent known as "Sample Gradient Descent". This method is a modification of the conventional batch gradient descent algorithm, which is often associated with space and time complexity issues. The proposed approach selects a representative sample of the data, which is then subjected to batch gradient descent. Selecting this sample is a crucial task, as it must accurately represent the entire dataset. To achieve this, the study applies Principal Component Analysis (PCA) to the training data, retaining only those rows and columns that explain 90% of the overall variance. This approach results in a convex loss function, whose global minimum can be readily attained. Our results indicate that the proposed method offers faster convergence and reduced computation times compared to the conventional batch gradient descent algorithm. In our experiments, both approaches were run for 30 epochs, with each epoch taking approximately 3.41 s. Notably, the "Sample Gradient Descent" approach converged in just 8 epochs, while the conventional batch gradient descent algorithm required 20 epochs to achieve convergence. This substantial difference in convergence rates, together with the reduced computation times, highlights the efficiency of the proposed method. These findings underscore the potential utility of the "Sample Gradient Descent" technique across diverse domains, ranging from machine learning to optimization problems, and make the algorithm appealing to practitioners and researchers seeking greater efficiency in gradient descent optimization.
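The abstract describes the method only at a high level: reduce the training data with PCA so that just the portion explaining 90% of the variance is retained, then run ordinary batch gradient descent on the reduced sample. The sketch below is a minimal Python illustration of that pipeline under stated assumptions: it uses scikit-learn's PCA with a 0.90 variance threshold and a mean-squared-error loss for a linear model. The learning rate, epoch count, and choice of loss are illustrative, and since the record does not specify the row-selection rule the abstract alludes to, only the component (column) reduction is shown; this is not the authors' exact implementation.

```python
# Minimal sketch of the "Sample Gradient Descent" pipeline from the abstract:
# PCA keeps the components explaining 90% of the variance, then plain batch
# gradient descent runs on the reduced data. The linear model, MSE loss,
# learning rate, and epoch count are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

def sample_gradient_descent(X, y, lr=0.01, epochs=30, variance=0.90):
    # Column reduction: project onto the principal components that together
    # explain `variance` of the total variance (a float n_components does
    # exactly this in scikit-learn).
    pca = PCA(n_components=variance)
    X_red = pca.fit_transform(X)

    n, d = X_red.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        # Full-batch gradient of the mean-squared-error loss on the sample.
        residual = X_red @ w + b - y
        w -= lr * (2.0 / n) * (X_red.T @ residual)
        b -= lr * (2.0 / n) * residual.sum()
    return w, b, pca  # new data must be projected with the same fitted PCA

# Toy usage: 1,000 rows of 20 correlated features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)) @ rng.normal(size=(20, 20))
y = X[:, 0] + 0.1 * rng.normal(size=1000)
w, b, pca = sample_gradient_descent(X, y)
```

With a quadratic loss such as MSE, the objective over the reduced data is convex, which is consistent with the abstract's claim that a global minimum is readily attained.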
Pages: 13
Related papers (50 in total)
  • [31] Hey, Tony; Butler, Keith; Jackson, Sam; Thiyagalingam, Jeyarajan. Machine learning and big scientific data. PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2020, 378 (2166).
  • [32] Shi, Chunhe; Wu, Chengdong; Han, Xiaowei; Xie, Yinghong; Li, Zhen. Machine Learning under Big Data. PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON ELECTRONIC, MECHANICAL, INFORMATION AND MANAGEMENT SOCIETY (EMIM), 2016, 40: 301-305.
  • [33] Pillow, Jonathan; Sahani, Maneesh. Machine learning, big data, and neuroscience. CURRENT OPINION IN NEUROBIOLOGY, 2019, 55: III-IV.
  • [34] Torrecilla, Jose L.; Romo, Juan. Data learning from big data. STATISTICS & PROBABILITY LETTERS, 2018, 136: 15-19.
  • [35] Baldassarre, Maria Teresa; Caballero, Ismael; Caivano, Danilo; Garcia, Bibiano Rivas; Piattini, Mario. From Big Data to Smart Data: A Data Quality Perspective. PROCEEDINGS OF THE 1ST ACM SIGSOFT INTERNATIONAL WORKSHOP ON ENSEMBLE-BASED SOFTWARE ENGINEERING (ENSEMBLE '18), 2018: 19-24.
  • [36] Joglekar, Prajakta; Kulkarni, Vrushali. Data oriented view of a Smart City: A Big Data Approach. 2017 INTERNATIONAL CONFERENCE ON EMERGING TRENDS & INNOVATION IN ICT (ICEI), 2017: 51-55.
  • [37] Jin, Richeng; He, Xiaofan; Dai, Huaiyu. Distributed Byzantine Tolerant Stochastic Gradient Descent in the Era of Big Data. ICC 2019 - 2019 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2019.
  • [38] Cravero, Ania; Pardo, Sebastian; Galeas, Patricio; Fenner, Julio Lopez; Caniupan, Monica. Data Type and Data Sources for Agricultural Big Data and Machine Learning. SUSTAINABILITY, 2022, 14 (23).
  • [39] Xu, Lijie; Qiu, Shuang; Yuan, Binhang; Jiang, Jiawei; Renggli, Cedric; Gan, Shaoduo; Kara, Kaan; Li, Guoliang; Liu, Ji; Wu, Wentao; Ye, Jieping; Zhang, Ce. In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data Shuffle. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022: 1286-1300.
  • [40] Ramakrishnan, Raghunathan; Dral, Pavlo O.; Rupp, Matthias; von Lilienfeld, O. Anatole. Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach. JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 2015, 11 (05): 2087-2096.