From big data to smart data: a sample gradient descent approach for machine learning

Cited by: 2
Authors
Ganie, Aadil Gani [1]
Dadvandipour, Samad [1]
Affiliations
[1] Univ Miskolc, H-3515 Miskolc, Hungary
Keywords
Big data; Gradient descent; Machine learning; PCA; Loss function
DOI
10.1186/s40537-023-00839-9
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
This paper presents a variant of gradient descent called "Sample Gradient Descent". The method modifies conventional batch gradient descent, which often suffers from high space and time complexity. It selects a representative sample of the data and then applies batch gradient descent to that sample. Selecting the sample is the crucial step, because it must accurately represent the entire dataset. To achieve this, the study applies Principal Component Analysis (PCA) to the training data and retains only those rows and columns that explain 90% of the overall variance. This approach results in a convex loss function, for which a global minimum can be readily attained. In our experiments, both approaches were run for 30 epochs, with each epoch taking approximately 3.41 s. "Sample Gradient Descent" converged in just 8 epochs, while conventional batch gradient descent required 20 epochs. This faster convergence, together with reduced computation time, indicates that the proposed method is more efficient than conventional batch gradient descent. These findings underscore the potential utility of "Sample Gradient Descent" across diverse domains, ranging from machine learning to optimization problems, and make the algorithm appealing to practitioners and researchers seeking more efficient gradient descent optimization.
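The pipeline described in the abstract (PCA-based reduction, then batch gradient descent on the reduced data) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how rows are chosen, so the row-selection rule here (rank rows by their share of the retained variance) is an assumption, as are the function names `pca_sample` and `batch_gradient_descent`, the linear-regression model, the mean squared error loss, and the learning rate.

```python
import numpy as np


def pca_sample(X, y, var_threshold=0.90):
    """Reduce columns and rows of X with PCA (hypothetical selection rule)."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    # Smallest number of components whose cumulative variance reaches the threshold.
    k = min(int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1, len(S))
    Z = Xc @ Vt[:k].T  # column reduction: scores on the first k components

    # ASSUMPTION (not stated in the abstract): rank rows by their contribution
    # to the retained variance and keep the fewest rows that together account
    # for var_threshold of it.
    contrib = np.sum(Z**2, axis=1)
    order = np.argsort(contrib)[::-1]
    cum = np.cumsum(contrib[order]) / contrib.sum()
    n_keep = min(int(np.searchsorted(cum, var_threshold)) + 1, len(order))
    rows = order[:n_keep]
    return Z[rows], y[rows], rows


def batch_gradient_descent(X, y, lr=0.05, epochs=30):
    """Plain batch gradient descent for linear regression with MSE loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    losses = []
    for _ in range(epochs):
        residual = X @ w + b - y
        losses.append(float(np.mean(residual**2)))
        # Gradient of the mean squared error over the whole (sampled) batch.
        w -= lr * (2.0 / n) * (X.T @ residual)
        b -= lr * (2.0 / n) * residual.sum()
    return w, b, losses


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200)

Xs, ys, rows = pca_sample(X, y)
w, b, losses = batch_gradient_descent(Xs, ys, lr=0.05, epochs=30)
print(f"kept {len(rows)} of {len(X)} rows; loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The mean squared error of a linear model is convex, which matches the abstract's remark about a convex loss. The synthetic data is not meant to reproduce the reported epoch counts or timings.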
Pages: 13