Data augmentation using SMOTE technique: Application for prediction of burst pressure of hydrocarbons pipeline using supervised machine learning models

Times cited: 0
Authors
Soomro, Afzal Ahmed [1 ,2 ]
Mokhtar, Ainul Akmar [2 ]
Muhammad, Masdi B. [2 ]
Saad, Mohamad Hanif Md [1 ]
Lashari, Najeebullah [3 ,4 ]
Hussain, Muhammad [5 ]
Palli, Abdul Sattar [6 ]
Affiliations
[1] Univ Kebangsaan Malaysia, Dept Mech & Mfg Engn, Bangi 43600, Selangor, Malaysia
[2] Univ Teknol PETRONAS, Mech Engn Dept, Seri Iskandar 32610, Perak Darul, Malaysia
[3] Dawood Univ Engn & Technol, Petr & Gas Engn Dept, MA Jinnah Rd, Karachi 74800, Pakistan
[4] Univ Teknol PETRONAS, Petr Engn Dept, Seri Iskandar 32610, Perak Darul, Malaysia
[5] Univ Wollongong, Northfields Ave Wollongong, Wollongong, NSW 2522, Australia
[6] Univ Teknol PETRONAS, Comp & Informat Sci Dept, Seri Iskandar 32610, Perak Darul, Malaysia
Keywords
Burst pressure prediction; Machine learning; SMOTE; Data augmentation; Oil and gas pipelines; Safety; FAILURE PRESSURE; CORROSION DEFECTS; CHARGE; STEEL; STATE
DOI
10.1016/j.rineng.2024.103233
Chinese Library Classification (CLC) number
T [Industrial technology]
Subject classification code
08
Abstract
Accurate burst pressure prediction is critical for ensuring oil and gas pipeline safety, guiding maintenance decisions, and lowering costs and risks. Traditional methods have limitations, including high experimental costs, conservative empirical models, and computationally expensive numerical algorithms. In recent years, machine learning (ML) models have supplanted traditional methods. However, small and imbalanced datasets are a major challenge to building ML models that produce accurate results. Moreover, ML models trained on data from pipelines of specific material grades generalize poorly, which prevents them from producing accurate results on other pipeline types. First, finite element analysis (FEA) was used to generate a dataset. Then, a new way to improve ML model generalization for burst pressure prediction is proposed: combining publicly available datasets of different pipeline specifications. In this combined dataset, some pipeline types have many data samples and others have few, which causes a class imbalance. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address this imbalance. The performance of several ML models, namely Extra Trees (ET), Extreme Gradient Boosting (XGBR), Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Decision Tree (DT), was evaluated to validate their prediction and generalization on pipelines of various material grades. The results show that all the selected ML models achieved high R-squared values (>0.95) on the balanced dataset, outperforming their results on the imbalanced dataset. These results indicate that SMOTE-based augmentation is a beneficial way to correct dataset imbalance and improve the ability of ML models to predict burst pressure in oil and gas pipelines.
Pages: 17