Applying statistical thinking to 'Big Data' problems

Cited by: 27
Authors
Hoerl, Roger W. [1]
Snee, Ronald D. [2]
De Veaux, Richard D. [3]
Affiliations
[1] Union Coll, Dept Math, Schenectady, NY 12308 USA
[2] Snee Associates, Newark, DE USA
[3] Williams Coll, Dept Math & Stat, Williamstown, MA 01267 USA
Keywords
data mining; statistical engineering; analytics; machine learning
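DOI
10.1002/wics.1306
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract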
Much has been written recently about 'Big Data' and the new possibilities that mining this vast amount of data brings. It promises to help us understand or predict everything from the Higgs boson to what a customer might purchase next from Amazon. As with most new phenomena, it is hard to sift through the hype and promotion to understand what is actually true and what is actually useful. One implicit or even explicitly stated assumption in much of the Big Data literature is that statistical thinking fundamentals are no longer relevant in the petabyte age. However, we believe just the opposite. Fundamentals of good modeling and statistical thinking are crucial for the success of Big Data projects. Sound statistical practices, such as ensuring high-quality data, incorporating sound domain (subject matter) knowledge, and developing an overall strategy or plan of attack for large modeling problems, are even more important for Big Data problems than small data problems. (C) 2014 Wiley Periodicals, Inc.
Pages: 222 - 232
Number of pages: 11
Related papers
  • [31] Statistical education in times of Big Data
    Zwick M.
    AStA Wirtschafts- und Sozialstatistisches Archiv, 2016, 10 (2-3) : 127 - 139
  • [32] Statistical learning and big data applications
    Witte, Harald
    Blatter, Tobias U. U.
    Nagabhushana, Priyanka
    Schaer, David
    Ackermann, James
    Cadamuro, Janne
    Leichtle, Alexander B. B.
    JOURNAL OF LABORATORY MEDICINE, 2023, 47 (04) : 181 - 186
  • [33] Statistical analysis of big data on pharmacogenomics
    Fan, Jianqing
    Liu, Han
    ADVANCED DRUG DELIVERY REVIEWS, 2013, 65 (07) : 987 - 1000
  • [34] TO QUESTION OF THE STATISTICAL ANALYSIS OF BIG DATA
    Lemeshko, B. Yu
    Lemeshko, S. B.
    Semenova, M. A.
    VESTNIK TOMSKOGO GOSUDARSTVENNOGO UNIVERSITETA-UPRAVLENIE VYCHISLITELNAJA TEHNIKA I INFORMATIKA-TOMSK STATE UNIVERSITY JOURNAL OF CONTROL AND COMPUTER SCIENCE, 2018, (44) : 40 - 49
  • [35] Statistical learning for big dependent data
    Dolores Ugarte, Maria
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, 2022, 185 (03) : 1460 - 1460
  • [36] Statistical model selection with "Big Data"
    Doornik, Jurgen A.
    Hendry, David F.
    COGENT ECONOMICS & FINANCE, 2015, 3 (01):
  • [37] Big data: Some statistical issues
    Cox, D. R.
    Kartsonaki, Christiana
    Keogh, Ruth H.
    STATISTICS & PROBABILITY LETTERS, 2018, 136 : 111 - 115
  • [38] Statistical science in the world of big data
    Reid, Nancy
    STATISTICS & PROBABILITY LETTERS, 2018, 136 : 42 - 45
  • [39] Statistical Modelling for Big and Little Data
    Henderson, Robin
    DEVELOPMENTS IN STATISTICAL MODELLING, IWSM 2024, 2024, : 246 - 254
  • [40] Big data. Big potential. Big problems?
    West, Stephen W.
    Clubb, Jo
    Blake, Tracy A.
    Fern, James
    Bowles, Harry
    Dalen-Lorentsen, Torstein
    BMJ OPEN SPORT & EXERCISE MEDICINE, 2024, 10 (02):