Ensemble Random Forests as a tool for modeling rare occurrences

Cited by: 16
Authors
Siders, Zachary A. [1 ]
Ducharme-Barth, Nicholas D. [2 ]
Carvalho, Felipe [3 ]
Kobayashi, Donald [3 ]
Martin, Summer [3 ]
Raynor, Jennifer [4 ]
Jones, T. Todd [3 ]
Ahrens, Robert N. M. [3 ]
Affiliations
[1] Univ Florida, UF IFAS SFRC Fisheries & Aquat Sci Program, Gainesville, FL 32611 USA
[2] Pacific Community, Ocean Fisheries Programme, Noumea 98800, New Caledonia
[3] NOAA Fisheries, Pacific Isl Fisheries Sci Ctr, Honolulu, HI 96818 USA
[4] Wesleyan Univ, Dept Econ, Middletown, CT 06457 USA
Keywords
Rare event bias; Species distribution modeling; Protected species; Bycatch; Machine learning; Random Forest; SPECIES DISTRIBUTION MODELS; CLASSIFIER; SPACE
DOI
10.3354/esr01060
Chinese Library Classification number
X176 [Biodiversity conservation]
Discipline classification code
090705
Abstract
Relative to target species, priority conservation species occur rarely in fishery interactions, resulting in imbalanced, overdispersed data. We present Ensemble Random Forests (ERFs) as an intuitive extension of the Random Forest algorithm to handle rare event bias. Each Random Forest receives individual stratified randomly sampled training/test sets, then down-samples the majority class for each decision tree. Results are averaged across Random Forests to generate an ensemble prediction. Through simulation, we show that ERFs outperform Random Forest with and without down-sampling, as well as with the synthetic minority over-sampling technique, for highly class imbalanced to balanced datasets. Spatial covariance greatly impacts ERFs' perceived performance, as shown through simulation and case studies. In case studies from the Hawaii deep-set longline fishery, giant manta ray Mobula birostris syn. Manta birostris and scalloped hammerhead Sphyrna lewini presence had high spatial covariance and high model test performance, while false killer whale Pseudorca crassidens had low spatial covariance and low model test performance. Overall, we find ERFs have 4 advantages: (1) reduced successive partitioning effects; (2) prediction uncertainty propagation; (3) better accounting for interacting covariates through balancing; and (4) minimization of false positives, as the majority of Random Forests within the ensemble vote correctly. As ERFs can readily mitigate rare event bias without requiring large presence sample sizes or imparting considerable balancing bias, they are likely to be a valuable tool in bycatch and species distribution modeling, as well as spatial conservation planning, especially for protected species where presence can be rare.
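The sampling scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it only builds the index sets (one stratified train/test split per forest, one class-balanced down-sample per tree) and averages per-forest predictions. The tree fitting itself is omitted. All function names, the default sizes, the toy labels, and the choice to draw each tree's sample with replacement are assumptions made for illustration.

```python
import random
from statistics import mean


def stratified_split(labels, test_frac, rng):
    """Split indices into train/test sets, preserving the class ratio."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(test_frac * len(idx)))
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test


def balanced_bootstrap(train_idx, labels, rng):
    """Down-sample the majority class for one tree.

    Assumption: each class contributes as many draws (with replacement)
    as there are minority-class rows in the training set.
    """
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(rows) for rows in by_class.values())
    sample = []
    for cls in sorted(by_class):
        sample += rng.choices(by_class[cls], k=n_min)
    return sample


def ensemble_plan(labels, n_forests=5, n_trees=10, test_frac=0.2, seed=0):
    """Build per-forest splits and per-tree balanced samples (hypothetical sizes)."""
    rng = random.Random(seed)
    plan = []
    for _ in range(n_forests):
        train, test = stratified_split(labels, test_frac, rng)
        trees = [balanced_bootstrap(train, labels, rng) for _ in range(n_trees)]
        plan.append({"train": train, "test": test, "trees": trees})
    return plan


def ensemble_mean(forest_probs):
    """Average per-forest presence probabilities into one ensemble prediction."""
    return [mean(column) for column in zip(*forest_probs)]


# Toy rare-event labels: 10 presences among 200 records (5% prevalence).
labels = [1] * 10 + [0] * 190
plan = ensemble_plan(labels)
```

In a full model, each entry of `plan[k]["trees"]` would be used to fit one decision tree, each forest would be scored on its own `plan[k]["test"]` set, and `ensemble_mean` would combine the forests' predicted probabilities.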
Pages: 183-197
Page count: 15
Related papers
50 in total
  • [21] Adversarial Random Forests for Density Estimation and Generative Modeling
    Watson, David S.
    Blesch, Kristin
    Kapar, Jan
    Wright, Marvin N.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [22] Modeling of photovoltaic array using random forests technique
    Ibrahim, Ibrahim A.
    Mohamed, Azah
    Khatib, Tamer
    2015 IEEE CONFERENCE ON ENERGY CONVERSION (CENCON), 2015, : 390 - 393
  • [23] The use of Random Forests for modeling in vitro ADMET endpoints
    D Hughes, J
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2005, 230 : U1013 - U1014
  • [24] Noncontact Sleep Study Based on an Ensemble of Deep Neural Network and Random Forests
    Chung, Ku-Young
    Song, Kwangsub
    Cho, Seok Hyun
    Chang, Joon-Hyuk
    IEEE SENSORS JOURNAL, 2018, 18 (17) : 7315 - 7324
  • [25] Ensemble of Bidirectional Recurrent Networks and Random Forests for Protein Secondary Structure Prediction
    de Oliveira, Gabriel Bianchin
    Pedrini, Helio
    Dias, Zanoni
    PROCEEDINGS OF THE 2020 INTERNATIONAL CONFERENCE ON SYSTEMS, SIGNALS AND IMAGE PROCESSING (IWSSIP), 27TH EDITION, 2020, : 311 - 316
  • [26] Ensemble of Random and Isolation Forests for Graph-Based Intrusion Detection in Containers
    Iacovazzi, Alfonso
    Raza, Shahid
    2022 IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND RESILIENCE (IEEE CSR), 2022, : 30 - 37
  • [27] ENSEMBLE DIVERSITY ANALYSIS ON REMOTE SENSING DATA CLASSIFICATION USING RANDOM FORESTS
    Boukir, Samia
    Mellor, Andrew
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 1302 - 1306
  • [28] An operational coastal forecasting tool for performing ensemble modeling
    Taeb, Peyman
    Weaver, Robert J.
    ESTUARINE COASTAL AND SHELF SCIENCE, 2019, 217 : 237 - 249
  • [29] Stochastic modeling of earthquake occurrences and estimation of seismic hazard: a random field approach
    Akkaya, AD
    Yücemen, MS
    PROBABILISTIC ENGINEERING MECHANICS, 2002, 17 (01) : 1 - 13
  • [30] On pattern occurrences in a random text
    Fudos, I
    Pitoura, E
    Szpankowski, W
    INFORMATION PROCESSING LETTERS, 1996, 57 (06) : 307 - 312