A Novel Class Noise Detection Method for High-Dimensional Data in Industrial Informatics

Cited by: 16
Authors
Guan, Donghai [1, 2]
Chen, Kai [1, 2]
Han, Guangjie [3]
Huang, Shuqiang [4]
Yuan, Weiwei [1, 2]
Guizani, Mohsen [5]
Shu, Lei [6]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China
[2] Collaborat Innovat Ctr Novel Software Technol & I, Nanjing 210093, Peoples R China
[3] Dalian Univ Technol, Sch Software, Key Lab Ubiquitous Network & Serv Software Liaoni, Dalian 116024, Peoples R China
[4] Jinan Univ, Coll Sci & Engn, Dept Optoelect Engn, Guangzhou 510632, Peoples R China
[5] Qatar Univ, Coll Engn, Doha 2713, Qatar
[6] Nanjing Agr Univ, Coll Engn, Nanjing 210095, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Informatics; Training; Machine learning; Task analysis; Noise measurement; Reliability; High dimension; industrial informatics; noise filtering; OUTLIER DETECTION; CLASSIFICATION; SELECTION; QUALITY;
DOI
10.1109/TII.2020.3012658
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data in industrial informatics may be both high-dimensional and mislabeled. Irrelevant or noisy features make mislabeling detection in high-dimensional data considerably harder. Traditional methods usually adopt a two-step solution: first find the relevant feature subspace, then use it for mislabeling detection. Because this approach separates feature selection from label error detection, it struggles to achieve optimal detection performance. To solve this problem, in this article, we integrate the two steps and propose a sequential ensemble noise filter (SENF). In SENF, relevant features are selected and used to generate a noise score for each instance. These noise scores in turn guide feature selection through regression learning. SENF thus falls within the scope of sequential ensemble learning. We evaluate our approach on several benchmark datasets with high dimensionality and substantial label noise. The results show that SENF performs significantly better than other existing label noise detection methods.
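The alternation described in the abstract (score instances on the current features, then let the scores guide the next feature selection) can be illustrated with a rough sketch. This is not the published SENF algorithm: the k-nearest-neighbour disagreement score, the ridge-regression feature weighting, the `keep` ratio, and all function names below are assumptions chosen only to make the loop concrete.

```python
import numpy as np

def knn_noise_scores(X, y, k=5):
    """Assumed scorer: fraction of each instance's k nearest neighbours with a different label."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # an instance is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    return (y[nn] != y[:, None]).mean(axis=1)

def ridge_feature_weights(X, s, lam=1.0):
    """Assumed selector: absolute ridge coefficients when regressing noise scores on features."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardise each feature
    sc = s - s.mean()
    d = X.shape[1]
    w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ sc)
    return np.abs(w)

def sequential_noise_filter(X, y, n_rounds=3, keep=0.5, k=5):
    """Alternate scoring and feature selection; average the per-round scores (hypothetical design)."""
    selected = np.arange(X.shape[1])
    round_scores = []
    for _ in range(n_rounds):
        s = knn_noise_scores(X[:, selected], y, k)
        round_scores.append(s)
        w = ridge_feature_weights(X[:, selected], s)
        m = max(1, int(np.ceil(keep * len(selected))))
        # keep the m features whose regression weights are largest
        selected = np.sort(selected[np.argsort(w)[::-1][:m]])
    return np.mean(round_scores, axis=0), selected
```

Instances with the highest averaged score would be the candidates for mislabeling under these assumptions; a threshold or top-fraction cut-off would still have to be chosen by the user.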
Pages: 2181-2190
Page count: 10