Data-intensive Science: A New Paradigm for Biodiversity Studies

Cited by: 234
Authors
Kelling, Steve [1]
Hochachka, Wesley M. [1]
Fink, Daniel [1]
Riedewald, Mirek [3]
Caruana, Rich [4]
Ballard, Grant [5]
Hooker, Giles [2]
Affiliations
[1] Cornell Univ, Cornell Lab Ornithol, Ithaca, NY 14853 USA
[2] Cornell Univ, Dept Biol Stat & Computat Biol, Ithaca, NY USA
[3] Northeastern Univ, Coll Comp & Informat Sci, Boston, MA 02115 USA
[4] Microsoft Corp, Redmond, WA 98052 USA
[5] PRBO Conservat Sci, Petaluma, CA USA
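Funding
National Science Foundation (USA)
Keywords
data-intensive science; informatics; biodiversity; machine learning; statistics; ecological data
DOI
10.1525/bio.2009.59.7.12
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract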
The increasing availability of massive volumes of scientific data requires new synthetic analysis techniques to explore and identify interesting patterns that are otherwise not apparent. For biodiversity studies, a "data-driven" approach is necessary because of the complexity of ecological systems, particularly when viewed at large spatial and temporal scales. Data-intensive science organizes large volumes of data from multiple sources and fields and then analyzes them using techniques tailored to the discovery of complex patterns in high-dimensional data through visualizations, simulations, and various types of model building. Through interpreting and analyzing these models, truly novel and surprising patterns that are "born from the data" can be discovered. These patterns provide valuable insight for concrete hypotheses about the underlying ecological processes that created the observed data. Data-intensive science allows scientists to analyze bigger and more complex systems efficiently, and complements more traditional scientific processes of hypothesis generation and experimental testing to refine our understanding of the natural world.
Pages: 613-620
Number of pages: 8