A Novel Class Noise Detection Method for High-Dimensional Data in Industrial Informatics

Cited by: 16
Authors
Guan, Donghai [1, 2]
Chen, Kai [1, 2]
Han, Guangjie [3]
Huang, Shuqiang [4]
Yuan, Weiwei [1, 2]
Guizani, Mohsen [5]
Shu, Lei [6]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China
[2] Collaborat Innovat Ctr Novel Software Technol & I, Nanjing 210093, Peoples R China
[3] Dalian Univ Technol, Sch Software, Key Lab Ubiquitous Network & Serv Software Liaoni, Dalian 116024, Peoples R China
[4] Jinan Univ, Coll Sci & Engn, Dept Optoelect Engn, Guangzhou 510632, Peoples R China
[5] Qatar Univ, Coll Engn, Doha 2713, Qatar
[6] Nanjing Agr Univ, Coll Engn, Nanjing 210095, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Informatics; Training; Machine learning; Task analysis; Noise measurement; Reliability; High dimension; industrial informatics; noise filtering; OUTLIER DETECTION; CLASSIFICATION; SELECTION; QUALITY;
DOI
10.1109/TII.2020.3012658
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The data in industrial informatics may be high-dimensional and mislabeled. Irrelevant or noisy features pose a significant challenge to the detection of high-dimensional mislabeling. The traditional method usually adopts a two-step solution, first finding the relevant subspace and then using it for mislabeling detection. This two-step method struggles to provide the optimal mislabeling detection performance, since it separates the procedures of feature selection and label error detection. To solve this problem, in this article, we integrate the two steps and propose a sequential ensemble noise filter (SENF). In the SENF, relevant features are selected and used to generate a noise score for each instance. Continuously, these noise scores guide feature selection in the regression learning. Thus, the SENF falls in the scope of sequential ensemble learning. We evaluate our approach on several benchmark datasets with high dimensionality and much label noise. It is shown that the SENF is significantly better than other existing label noise detection methods.
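The loop described in the abstract (score instances on the selected features, then let those scores guide the next round of feature selection) can be illustrated with a minimal sketch. The abstract does not say which scorer or regression model the SENF uses, so everything here is an assumption made for illustration: the k-nearest-neighbor label-disagreement score, the weighted ridge regression, the binary-label restriction, averaging the scores across rounds, and all function and parameter names. It is not the authors' implementation.

```python
import numpy as np

def knn_noise_scores(X, y, k=5):
    """Fraction of each instance's k nearest neighbours that carry a different label."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                      # an instance is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    return np.mean(y[nn] != y[:, None], axis=1)

def select_features(X, y, scores, n_keep, ridge=1.0):
    """Weighted ridge regression of +/-1 labels on the features (binary labels assumed).

    Instances with a low noise score get a larger weight; the n_keep features
    with the largest absolute coefficients are kept.
    """
    w = 1.0 - scores
    t = np.where(y == y.max(), 1.0, -1.0)
    Xw = X * w[:, None]
    A = X.T @ Xw + ridge * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xw.T @ t)
    return np.argsort(-np.abs(coef))[:n_keep]

def sequential_noise_filter(X, y, n_keep=5, n_rounds=3, k=5):
    """Alternate noise scoring and score-guided feature selection.

    Returns the per-instance noise score averaged over all rounds and the
    indices of the features selected in the final round.
    """
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardise each feature
    features = np.arange(X.shape[1])                    # round 1 uses every feature
    history = []
    for _ in range(n_rounds):
        scores = knn_noise_scores(X[:, features], y, k)
        history.append(scores)
        features = select_features(X, y, scores, n_keep)
    return np.mean(history, axis=0), np.sort(features)
```

Instances with the highest averaged score are the candidates for mislabeling. Where to place the cut-off (a fixed threshold or a top fraction) is left open here, since the abstract does not state it.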
Pages: 2181 - 2190
Page count: 10
Related papers
50 in total
  • [31] A structure noise-aware tensor dictionary learning method for high-dimensional data clustering
    Yang, Jing-Hua
    Chen, Chuan
    Dai, Hong-Ning
    Fu, Le-Le
    Zheng, Zibin
    INFORMATION SCIENCES, 2022, 612 : 87 - 106
  • [32] Interaction Detection with Random Forests in High-Dimensional Data
    Winham, Stacey
    Wang, Xin
    de Andrade, Mariza
    Freimuth, Robert
    Colby, Colin
    Huebner, Marianne
    Biernacka, Joanna
    GENETIC EPIDEMIOLOGY, 2012, 36 (02) : 142 - 142
  • [33] A geometric framework for outlier detection in high-dimensional data
    Herrmann, Moritz
    Pfisterer, Florian
    Scheipl, Fabian
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 13 (03)
  • [34] DACC: A Data Exploration Method for High-Dimensional Data Sets
    Zhao, Qingnan
    Li, Hui
    Chen, Mei
    Dai, Zhenyu
    Zhu, Ming
    ARTIFICIAL INTELLIGENCE AND ALGORITHMS IN INTELLIGENT SYSTEMS, 2019, 764 : 219 - 229
  • [35] A NOVEL TENSOR ALGEBRAIC APPROACH FOR HIGH-DIMENSIONAL OUTLIER DETECTION UNDER DATA MISALIGNMENT
    Fan, Bo
    Zhang, Zemin
    Aeron, Shuchin
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 3628 - 3632
  • [36] Multiple change point detection for high-dimensional data
    Zhao, Wenbiao
    Zhu, Lixing
    Tan, Falong
    TEST, 2024, 33 (03) : 809 - 846
  • [37] On the orthogonal distance to class subspaces for high-dimensional data classification
    Zhu, Rui
    Xue, Jing-Hao
    INFORMATION SCIENCES, 2017, 417 : 262 - 273
  • [38] A Comparison of Outlier Detection Techniques for High-Dimensional Data
    Xu, Xiaodan
    Liu, Huawen
    Li, Li
    Yao, Minghai
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2018, 11 (01) : 652 - 662
  • [39] Cluster PCA for outliers detection in high-dimensional data
    Stefatos, George
    Ben Hamza, A.
    2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 3961 - 3966
  • [40] Industrial Data Modeling With Low-Dimensional Inputs and High-Dimensional Outputs
    Tang, Jiawei
    Lin, Xiaowen
    Zhao, Fei
    Chen, Xi
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (01) : 835 - 844