Protecting Machine Learning Models from Training Data Set Extraction

Authors
Kalinin, M. O. [1]
Muryleva, A. A. [1]
Platonov, V. V. [1]
Affiliations
[1] Peter the Great St. Petersburg Polytechnic University, St. Petersburg 195251, Russia
Keywords
noising; machine learning; training set; membership inference; Gaussian noise; privacy
DOI
10.3103/S0146411624700871
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper considers the problem of protecting machine learning models against data privacy violations carried out through membership inference on training data sets. A method of protective noising of the training set is proposed. Experiments show that Gaussian noising of the training data with a scale of 0.2 is the simplest and most effective way to protect machine learning models from membership inference against the training set. Compared with alternatives, this method is easy to implement, is universal with respect to model types, and reduces the effectiveness of membership inference to 26 percentage points.
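The noising step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how "scale" is defined, whether features are normalized first, or whether labels are perturbed. The sketch assumes the scale is the standard deviation of zero-mean Gaussian noise added to numeric features that already lie roughly in [0, 1]; the function name `add_gaussian_noise` and the synthetic data are hypothetical.

```python
import numpy as np


def add_gaussian_noise(features, scale=0.2, seed=None):
    """Return a copy of `features` with zero-mean Gaussian noise added.

    Assumption: `scale` is the noise standard deviation (0.2 is the value
    quoted in the abstract); the original array is left unmodified.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    noise = rng.normal(loc=0.0, scale=scale, size=features.shape)
    return features + noise


# Hypothetical training features: 1000 samples, 8 features in [0, 1].
data_rng = np.random.default_rng(0)
X_train = data_rng.uniform(0.0, 1.0, size=(1000, 8))

# The model would then be trained on X_noised instead of X_train.
X_noised = add_gaussian_noise(X_train, scale=0.2, seed=42)
```

Because the perturbation is applied to the data before training, a sketch like this does not depend on the model type, which is consistent with the universality the abstract claims.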
Pages: 1234-1241
Number of pages: 8