Applying statistical thinking to 'Big Data' problems

Cited by: 27
Authors
Hoerl, Roger W. [1]
Snee, Ronald D. [2]
De Veaux, Richard D. [3]
Affiliations
[1] Union Coll, Dept Math, Schenectady, NY 12308 USA
[2] Snee Associates, Newark, DE USA
[3] Williams Coll, Dept Math & Stat, Williamstown, MA 01267 USA
Keywords
data mining; statistical engineering; analytics; machine learning
DOI
10.1002/wics.1306
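Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract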
Much has been written recently about 'Big Data' and the new possibilities that mining this vast amount of data brings. It promises to help us understand or predict everything from the Higgs boson to what a customer might purchase next from Amazon. As with most new phenomena, it is hard to sift through the hype and promotion to understand what is actually true and what is actually useful. One implicit or even explicitly stated assumption in much of the Big Data literature is that statistical thinking fundamentals are no longer relevant in the petabyte age. However, we believe just the opposite. Fundamentals of good modeling and statistical thinking are crucial for the success of Big Data projects. Sound statistical practices, such as ensuring high-quality data, incorporating sound domain (subject matter) knowledge, and developing an overall strategy or plan of attack for large modeling problems, are even more important for Big Data problems than small data problems. (C) 2014 Wiley Periodicals, Inc.
Pages: 222-232
Page count: 11