A Bayesian perspective of statistical machine learning for big data

Cited by: 18
Authors
Sambasivan, Rajiv [1 ,2 ]
Das, Sourish [1 ,2 ]
Sahu, Sujit K. [1 ,2 ]
Affiliations
[1] Chennai Math Inst, Chennai, Tamil Nadu, India
[2] Univ Southampton, Southampton, Hants, England
Keywords
Bayesian methods; Big data; Machine learning; Statistical learning; Regression; Optimization; Selection; Inference; Model
DOI
10.1007/s00180-020-00970-8
Chinese Library Classification
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers are allowed to discover important features of input data sets, which are often very large in size. This task of feature discovery from data is essentially the meaning of the keyword 'learning' in SML. Theoretical justifications for the effectiveness of SML algorithms are underpinned by sound principles from different disciplines, such as Computer Science and Statistics. The theoretical underpinnings justified in particular by statistical inference methods are together termed statistical learning theory. This paper provides a review of SML from a Bayesian decision theoretic point of view, where we argue that many SML techniques are closely connected to making inference by using the so-called Bayesian paradigm. We discuss many important SML techniques such as supervised and unsupervised learning, deep learning, online learning and Gaussian processes, especially in the context of very large data sets where these are often employed. We present a dictionary which maps the key concepts of SML from Computer Science and Statistics. We illustrate the SML techniques with three moderately large data sets, where we also discuss many practical implementation issues. Thus the review is especially targeted at statisticians and computer scientists who are aspiring to understand and apply SML for moderately large to big data sets.
Pages: 893 - 930
Page count: 38
Related papers
50 in total
  • [1] A Bayesian perspective of statistical machine learning for big data
    Rajiv Sambasivan
    Sourish Das
    Sujit K. Sahu
    Computational Statistics, 2020, 35 : 893 - 930
  • [2] Bayesian statistical learning for big data biology
    Yau C.
    Campbell K.
    Biophysical Reviews, 2019, 11 (1) : 95 - 102
  • [3] How Big Data changes Statistical Machine Learning
    Bottou, Leon
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015 : 1 - 1
  • [4] Financial Big data Visualization: A Machine Learning Perspective
    Dong, Alice Xiaodan
    Huang, Weidong
    Wang, Jitong
    17TH INTERNATIONAL SYMPOSIUM ON VISUAL INFORMATION COMMUNICATION AND INTERACTION, VINCI 2024, 2024
  • [5] Approximate Bayesian Computation for Machine Learning, Inverse problems and Big Data
    Mohammad-Djafari, Ali
    BAYESIAN INFERENCE AND MAXIMUM ENTROPY METHODS IN SCIENCE AND ENGINEERING (MAXENT 2016), 2017, 1853
  • [6] Machine Learning and Big Data Processing: A Technological Perspective and Review
    Bhatnagar, Roheet
    INTERNATIONAL CONFERENCE ON ADVANCED MACHINE LEARNING TECHNOLOGIES AND APPLICATIONS (AMLTA2018), 2018, 723 : 468 - 478
  • [7] Gene expression data analysis: a statistical and machine learning perspective
    Chattopadhyay, Amrita
    BIOMETRICS, 2023, 79 (01) : 526 - 528
  • [8] Editorial to the special issue: Statistical Approaches for Big Data and Machine Learning
    Zhao, Yichuan
    Chen, Chi-Hua
    Feng, Feng
    Pamucar, Dragan
    JOURNAL OF APPLIED STATISTICS, 2023, 50 (03) : 451 - 455
  • [9] A Survey of Bayesian Statistical Approaches for Big Data
    Jahan, Farzana
    Ullah, Insha
    Mengersen, Kerrie L.
    CASE STUDIES IN APPLIED BAYESIAN DATA SCIENCE: CIRM JEAN-MORLET CHAIR, FALL 2018, 2020, 2259 : 17 - 44
  • [10] Data Mining, Machine Learning, and Statistical Modeling for Predictive Analytics with Behavioral Big Data
    Arunkumar, M.
    Rajkumar, K.
    Jeyaseelan, W. R. Salem
    Natraj, N. A.
    TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2025, 32 (01) : 72 - 77