Applying statistical thinking to 'Big Data' problems

Cited by: 27
Authors
Hoerl, Roger W. [1]
Snee, Ronald D. [2]
De Veaux, Richard D. [3]
Affiliations
[1] Union Coll, Dept Math, Schenectady, NY 12308 USA
[2] Snee Associates, Newark, DE USA
[3] Williams Coll, Dept Math & Stat, Williamstown, MA 01267 USA
Keywords
data mining; statistical engineering; analytics; machine learning
DOI
10.1002/wics.1306
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
Much has been written recently about 'Big Data' and the new possibilities that mining this vast amount of data brings. It promises to help us understand or predict everything from the Higgs boson to what a customer might purchase next from Amazon. As with most new phenomena, it is hard to sift through the hype and promotion to understand what is actually true and what is actually useful. One implicit, or even explicitly stated, assumption in much of the Big Data literature is that statistical thinking fundamentals are no longer relevant in the petabyte age. However, we believe just the opposite. Fundamentals of good modeling and statistical thinking are crucial for the success of Big Data projects. Sound statistical practices, such as ensuring high-quality data, incorporating sound domain (subject matter) knowledge, and developing an overall strategy or plan of attack for large modeling problems, are even more important for Big Data problems than for small data problems. © 2014 Wiley Periodicals, Inc.
Pages: 222-232
Number of pages: 11