Data augmentation using SMOTE technique: Application for prediction of burst pressure of hydrocarbons pipeline using supervised machine learning models

Cited by: 0
Authors
Soomro, Afzal Ahmed [1 ,2 ]
Mokhtar, Ainul Akmar [2 ]
Muhammad, Masdi B. [2 ]
Saad, Mohamad Hanif Md [1 ]
Lashari, Najeebullah [3 ,4 ]
Hussain, Muhammad [5 ]
Palli, Abdul Sattar [6 ]
Affiliations
[1] Univ Kebangsaan Malaysia, Dept Mech & Mfg Engn, Bangi 43600, Selangor, Malaysia
[2] Univ Teknol PETRONAS, Mech Engn Dept, Seri Iskandar 32610, Perak Darul, Malaysia
[3] Dawood Univ Engn & Technol, Petr & Gas Engn Dept, MA Jinnah Rd, Karachi 74800, Pakistan
[4] Univ Teknol PETRONAS, Petr Engn Dept, Seri Iskandar 32610, Perak Darul, Malaysia
[5] Univ Wollongong, Northfields Ave Wollongong, Wollongong, NSW 2522, Australia
[6] Univ Teknol PETRONAS, Comp & Informat Sci Dept, Seri Iskandar 32610, Perak Darul, Malaysia
Keywords
Burst pressure prediction; Machine learning; SMOTE; Data augmentation; Oil and gas pipelines; Safety; FAILURE PRESSURE; CORROSION DEFECTS; CHARGE; STEEL; STATE;
DOI
10.1016/j.rineng.2024.103233
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
Accurate burst pressure prediction is critical for ensuring oil and gas pipeline safety, guiding maintenance decisions, and lowering costs and risks. Traditional methods have limitations, including high experimental costs, conservative empirical models, and computationally expensive numerical algorithms. In recent years, machine learning (ML) models have increasingly supplanted these traditional methods. However, small and imbalanced datasets pose a major challenge to building an ML model that generates accurate results. Moreover, ML models trained on a dataset of pipelines with specific material grades generalize poorly, which prevents them from performing well on other pipeline types. In this study, a dataset was first generated using finite element analysis (FEA). Then, a new approach to improving ML model generalization for burst pressure prediction is proposed: combining publicly available datasets covering different pipeline specifications. In this combined dataset, some pipelines have many data samples and others have few, which causes a class imbalance issue. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address this imbalance. The performance of several ML models, namely Extra Trees (ET), Extreme Gradient Boosting (XGBR), Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Decision Tree (DT), was evaluated to validate their prediction and generalization on pipelines of various material grades. Results show that all the selected ML models achieved higher R-squared values (>0.95) on the balanced data than on the imbalanced dataset. These results indicate that SMOTE-based augmentation is an effective way to correct dataset imbalance and improve ML predictions of burst pressure in oil and gas pipelines.
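The abstract does not say how SMOTE was configured, so the following is only a minimal sketch of the general idea: each synthetic sample is placed on the line segment between a minority-group sample and one of its nearest neighbours within the same group. The function name `smote_oversample`, the feature columns, and all numeric values are hypothetical illustrations, not the authors' data or implementation, and the sketch uses plain Python rather than a library implementation.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical rows: [wall thickness (mm), defect depth (mm), defect length (mm)]
majority_group = [[9.5 + 0.1 * i, 2.0 + 0.05 * i, 100.0 + 5 * i] for i in range(12)]
minority_group = [
    [12.7, 3.1, 200.0],
    [12.9, 3.4, 220.0],
    [13.1, 2.8, 180.0],
    [12.5, 3.0, 210.0],
]

# Top up the under-represented pipeline group to the size of the larger one.
new_rows = smote_oversample(
    minority_group, len(majority_group) - len(minority_group), k=3
)
balanced_minority = minority_group + new_rows
print(len(majority_group), len(balanced_minority))
```

Because every synthetic row is a convex combination of two real rows, it stays within the range already spanned by the minority group. This is why SMOTE fills in sparse regions of that group rather than extrapolating beyond them.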
Pages: 17