Ensemble Random Forests as a tool for modeling rare occurrences

Cited by: 16
Authors
Siders, Zachary A. [1]
Ducharme-Barth, Nicholas D. [2]
Carvalho, Felipe [3]
Kobayashi, Donald [3]
Martin, Summer [3]
Raynor, Jennifer [4]
Jones, T. Todd [3]
Ahrens, Robert N. M. [3]
Affiliations
[1] Univ Florida, UF IFAS SFRC Fisheries & Aquat Sci Program, Gainesville, FL 32611 USA
[2] Pacific Community, Ocean Fisheries Programme, Noumea 98800, New Caledonia
[3] NOAA Fisheries, Pacific Isl Fisheries Sci Ctr, Honolulu, HI 96818 USA
[4] Wesleyan Univ, Dept Econ, Middletown, CT 06457 USA
Keywords
Rare event bias; Species distribution modeling; Protected species; Bycatch; Machine learning; Random Forest; SPECIES DISTRIBUTION MODELS; CLASSIFIER; SPACE
DOI
10.3354/esr01060
Chinese Library Classification number
X176 [Biodiversity conservation]
Discipline classification code
090705
Abstract
Relative to target species, priority conservation species occur rarely in fishery interactions, resulting in imbalanced, overdispersed data. We present Ensemble Random Forests (ERFs) as an intuitive extension of the Random Forest algorithm to handle rare event bias. Each Random Forest receives individual stratified randomly sampled training/test sets, then down-samples the majority class for each decision tree. Results are averaged across Random Forests to generate an ensemble prediction. Through simulation, we show that ERFs outperform Random Forest with and without down-sampling, as well as with the synthetic minority over-sampling technique, for highly class imbalanced to balanced datasets. Spatial covariance greatly impacts ERFs' perceived performance, as shown through simulation and case studies. In case studies from the Hawaii deep-set longline fishery, giant manta ray Mobula birostris syn. Manta birostris and scalloped hammerhead Sphyrna lewini presence had high spatial covariance and high model test performance, while false killer whale Pseudorca crassidens had low spatial covariance and low model test performance. Overall, we find ERFs have 4 advantages: (1) reduced successive partitioning effects; (2) prediction uncertainty propagation; (3) better accounting for interacting covariates through balancing; and (4) minimization of false positives, as the majority of Random Forests within the ensemble vote correctly. As ERFs can readily mitigate rare event bias without requiring large presence sample sizes or imparting considerable balancing bias, they are likely to be a valuable tool in bycatch and species distribution modeling, as well as spatial conservation planning, especially for protected species where presence can be rare.
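The sampling scheme described in the abstract (a fresh stratified training/test split per forest, majority-class down-sampling per tree, and averaging across forests) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names, the parameter defaults, and the synthetic data are hypothetical, and a one-split decision stump stands in for a full decision tree so that the example needs only the Python standard library.

```python
import random
from statistics import mean

def stratified_split(y, test_frac, rng):
    """Split indices into train/test while keeping class proportions in both."""
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test

def balanced_bootstrap(train_idx, y, rng):
    """Draw equally many presences and absences (with replacement),
    which down-samples the majority class to the minority class size."""
    pos = [i for i in train_idx if y[i] == 1]
    neg = [i for i in train_idx if y[i] == 0]
    n = min(len(pos), len(neg))
    return rng.choices(pos, k=n) + rng.choices(neg, k=n)

def fit_stump(X, y, sample, rng):
    """Stand-in base learner (assumption): split one random feature at the
    midpoint between the two class means. A real forest would grow a tree."""
    j = rng.randrange(len(X[0]))
    pos_mean = mean(X[i][j] for i in sample if y[i] == 1)
    neg_mean = mean(X[i][j] for i in sample if y[i] == 0)
    sign = 1 if pos_mean >= neg_mean else -1
    return j, (pos_mean + neg_mean) / 2, sign

def stump_predict(stump, row):
    j, threshold, sign = stump
    return 1.0 if sign * (row[j] - threshold) > 0 else 0.0

def fit_forest(X, y, train_idx, n_trees, rng):
    """One forest: every learner gets its own balanced bootstrap sample."""
    return [fit_stump(X, y, balanced_bootstrap(train_idx, y, rng), rng)
            for _ in range(n_trees)]

def forest_predict(forest, row):
    """Forest output is the fraction of its learners voting 'presence'."""
    return mean(stump_predict(s, row) for s in forest)

def fit_erf(X, y, n_forests=10, n_trees=25, test_frac=0.2, seed=0):
    """Ensemble: each forest is trained on its own stratified split."""
    rng = random.Random(seed)
    forests = []
    for _ in range(n_forests):
        train_idx, _test_idx = stratified_split(y, test_frac, rng)
        forests.append(fit_forest(X, y, train_idx, n_trees, rng))
    return forests

def erf_predict(forests, row):
    """Ensemble prediction is the average of the per-forest predictions."""
    return mean(forest_predict(f, row) for f in forests)

# Synthetic data (assumption): 5% presences; feature 0 is informative,
# feature 1 is pure noise.
data_rng = random.Random(1)
X, y = [], []
for i in range(400):
    label = 1 if i % 20 == 0 else 0
    X.append([data_rng.gauss(2.0 * label, 1.0), data_rng.gauss(0.0, 1.0)])
    y.append(label)

forests = fit_erf(X, y)
high = erf_predict(forests, [5.0, 0.0])   # far into the presence region
low = erf_predict(forests, [-3.0, 0.0])   # far into the absence region
```

In this sketch, the held-out test indices from each split are discarded; in practice they would be used to score each forest. The spread of the per-forest values returned by `forest_predict` gives one rough way to express how uncertain the averaged prediction is.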
Pages: 183-197
Page count: 15