Random Forests lithology prediction method for imbalanced data sets

Authors
Wang G. [1]
Song J. [1]
Xu F. [1]
Zhang W. [2]
Liu J. [3]
Chen F. [4]
Affiliations
[1] School of Geosciences, China University of Petroleum (East China), Qingdao
[2] School of Earth and Space Sciences, University of Science and Technology of China, Hefei
[3] SINOPEC Petroleum Exploration and Production Research Institute, Beijing
[4] Research Institute of Petroleum Exploration and Development, PetroChina Tarim Oilfield Company, Korla
Keywords
Class balancing techniques; Imbalanced data sets; Lithology prediction; Machine learning; Random Forests classification
DOI
10.13810/j.cnki.issn.1000-7210.2021.04.001
Abstract
When lithology is predicted with a supervised machine learning classifier, a data set with too few samples of the target lithology and too many samples of non-target lithology biases the trained classifier toward the non-target lithology, so the target lithology is predicted poorly. To address this problem, a Random Forests lithology prediction method for imbalanced data sets is proposed. First, a lithology data set is constructed with lithological logging data as sample labels and with seismic attributes and elastic parameters of rock at the uphole trace as sample features. Second, the NM-SMOTE algorithm, which integrates near miss (NM) under-sampling and the synthetic minority over-sampling technique (SMOTE), is employed to balance the lithology data set. Third, a Random Forests classifier is trained on the balanced data set to build a nonlinear relationship between lithology and the various seismic attributes and elastic parameters. Finally, the seismic attributes and elastic parameters of the target exploratory area are input into the trained Random Forests classifier, which predicts lithology according to the nonlinear relationship obtained during training. Tests on actual data demonstrate that too many samples of non-target lithology degrade the Random Forests classifier: the prediction accuracy of lithology is only 38%. After the training data set is balanced with the NM-SMOTE algorithm, the prediction accuracy of lithology rises to 83%, and the resulting lithology data volume is more consistent with the seismic data. © 2021, Editorial Department OIL GEOPHYSICAL PROSPECTING. All rights reserved.
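The balancing step described in the abstract can be illustrated with a minimal, dependency-free sketch. The abstract does not say how NM and SMOTE are combined, so the scheme below is an assumption: the non-target (majority) class is reduced with a NearMiss-1 style rule, and the target (minority) class is grown by SMOTE-style interpolation until both classes have the same size. All function names, the parameter `n_per_class`, and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random


def near_miss_undersample(majority, minority, n_keep, k=3):
    """NearMiss-1 style selection: keep the n_keep majority samples whose
    mean distance to their k nearest minority samples is smallest."""
    def mean_dist_to_minority(x):
        nearest = sorted(math.dist(x, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)

    return sorted(majority, key=mean_dist_to_minority)[:n_keep]


def smote_oversample(minority, n_total, k=3, rng=None):
    """SMOTE-style interpolation: add synthetic points on the segment between
    a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    samples = list(minority)  # original minority samples are kept as-is
    while len(samples) < n_total:
        x = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: math.dist(x, m),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        samples.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return samples


def nm_smote_balance(majority, minority, n_per_class, k=3, seed=0):
    """Assumed combination: under-sample the majority class, over-sample the
    minority class. Returns (X, y) where label 1 marks the target class."""
    rng = random.Random(seed)
    kept = near_miss_undersample(majority, minority, n_per_class, k)
    grown = smote_oversample(minority, n_per_class, k, rng)
    X = kept + grown
    y = [0] * len(kept) + [1] * len(grown)
    return X, y


# Toy, made-up data: 20 non-target and 4 target samples, 2 features each.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(4)]

X_bal, y_bal = nm_smote_balance(majority, minority, n_per_class=8)
print(y_bal.count(0), y_bal.count(1))
```

In a workflow like the one in the abstract, the balanced pair `X_bal`, `y_bal` would then be used to train a Random Forests classifier. In practice, a maintained library implementation of near miss, SMOTE, and Random Forests would likely be preferable to this hand-written sketch.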
Pages: 679-687
Number of pages: 8