Applying statistical thinking to 'Big Data' problems

Cited by: 27
Authors
Hoerl, Roger W. [1]
Snee, Ronald D. [2]
De Veaux, Richard D. [3]
Affiliations
[1] Union Coll, Dept Math, Schenectady, NY 12308 USA
[2] Snee Associates, Newark, DE USA
[3] Williams Coll, Dept Math & Stat, Williamstown, MA 01267 USA
Keywords
data mining; statistical engineering; analytics; machine learning
DOI
10.1002/wics.1306
Chinese Library Classification (CLC) numbers
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Much has been written recently about 'Big Data' and the new possibilities that mining this vast amount of data brings. It promises to help us understand or predict everything from the Higgs boson to what a customer might purchase next from Amazon. As with most new phenomena, it is hard to sift through the hype and promotion to understand what is actually true and what is actually useful. One implicit or even explicitly stated assumption in much of the Big Data literature is that statistical thinking fundamentals are no longer relevant in the petabyte age. However, we believe just the opposite. Fundamentals of good modeling and statistical thinking are crucial for the success of Big Data projects. Sound statistical practices, such as ensuring high-quality data, incorporating sound domain (subject matter) knowledge, and developing an overall strategy or plan of attack for large modeling problems, are even more important for Big Data problems than small data problems. (C) 2014 Wiley Periodicals, Inc.
Pages: 222-232
Number of pages: 11