A Novel Class Noise Detection Method for High-Dimensional Data in Industrial Informatics

Cited by: 16
Authors
Guan, Donghai [1, 2]
Chen, Kai [1, 2]
Han, Guangjie [3]
Huang, Shuqiang [4]
Yuan, Weiwei [1, 2]
Guizani, Mohsen [5]
Shu, Lei [6]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China
[2] Collaborat Innovat Ctr Novel Software Technol & I, Nanjing 210093, Peoples R China
[3] Dalian Univ Technol, Sch Software, Key Lab Ubiquitous Network & Serv Software Liaoni, Dalian 116024, Peoples R China
[4] Jinan Univ, Coll Sci & Engn, Dept Optoelect Engn, Guangzhou 510632, Peoples R China
[5] Qatar Univ, Coll Engn, Doha 2713, Qatar
[6] Nanjing Agr Univ, Coll Engn, Nanjing 210095, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Informatics; Training; Machine learning; Task analysis; Noise measurement; Reliability; High dimension; industrial informatics; noise filtering; OUTLIER DETECTION; CLASSIFICATION; SELECTION; QUALITY;
DOI
10.1109/TII.2020.3012658
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The data in industrial informatics may be high-dimensional and mislabeled. Irrelevant or noisy features pose a significant challenge to the detection of high-dimensional mislabeling. The traditional method usually adopts a two-step solution, first finding the relevant subspace and then using it for mislabeling detection. This two-step method struggles to provide the optimal mislabeling detection performance, since it separates the procedures of feature selection and label error detection. To solve this problem, in this article, we integrate the two steps and propose a sequential ensemble noise filter (SENF). In the SENF, relevant features are selected and used to generate a noise score for each instance. Continuously, these noise scores guide feature selection in the regression learning. Thus, the SENF falls in the scope of sequential ensemble learning. We evaluate our approach on several benchmark datasets with high dimensionality and much label noise. It is shown that the SENF is significantly better than other existing label noise detection methods.
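The loop the abstract describes (score instances on the current features, then let the scores steer the next feature selection) can be illustrated with a minimal sketch. This is not the authors' SENF implementation. The k-nearest-neighbour disagreement score, the within-class correlation ranking used in place of the regression step, the averaging of scores across rounds, and every function name and parameter below are assumptions made for illustration only.

```python
import math
import random


def knn_noise_scores(X, y, features, k=5):
    """Share of each instance's k nearest neighbours (distance measured on
    `features` only) that carry a different label. Assumed scoring rule."""
    n = len(X)
    scores = []
    for i in range(n):
        dists = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), j)
            for j in range(n) if j != i
        )
        disagree = sum(1 for _, j in dists[:k] if y[j] != y[i])
        scores.append(disagree / k)
    return scores


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    if n < 2:
        return 0.0
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def select_features(X, y, scores, n_keep):
    """Keep the n_keep features that track the noise scores most strongly
    within each labelled class (a simple stand-in for regression learning)."""
    relevance = []
    for f in range(len(X[0])):
        total = 0.0
        for label in set(y):
            idx = [i for i in range(len(y)) if y[i] == label]
            total += abs(pearson([X[i][f] for i in idx],
                                 [scores[i] for i in idx]))
        relevance.append(total)
    ranked = sorted(range(len(relevance)), key=lambda f: -relevance[f])
    return sorted(ranked[:n_keep])


def sequential_noise_filter(X, y, n_keep, n_rounds=3, k=5):
    """Alternate scoring and feature selection; return the per-instance
    scores averaged over rounds and the last selected feature subset."""
    features = list(range(len(X[0])))  # the first round uses every feature
    history = []
    for _ in range(n_rounds):
        scores = knn_noise_scores(X, y, features, k)
        history.append(scores)
        features = select_features(X, y, scores, n_keep)
    averaged = [sum(col) / n_rounds for col in zip(*history)]
    return averaged, features


def make_toy_data(seed=0):
    """Hypothetical toy set: 4 class-dependent features, 3 irrelevant ones,
    40 instances, with the labels of instances 0-3 flipped."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(40):
        true_label = i % 2
        relevant = [10.0 * true_label + rng.uniform(-0.5, 0.5) for _ in range(4)]
        irrelevant = [rng.random() for _ in range(3)]
        X.append(relevant + irrelevant)
        y.append(true_label)
    flipped = [0, 1, 2, 3]  # two instances from each class
    for i in flipped:
        y[i] = 1 - y[i]
    return X, y, flipped
```

Calling `sequential_noise_filter(*make_toy_data()[:2], n_keep=4)` returns one averaged score per instance. On this well-separated toy set the four flipped instances score higher than every clean one, so ranking by score surfaces the likely mislabels. Real high-dimensional data would need a proper sparse regression and a tuned threshold.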
Pages: 2181-2190
Number of pages: 10
Related papers
50 in total
  • [1] A Novel Feature Selection-Based Sequential Ensemble Learning Method for Class Noise Detection in High-Dimensional Data
    Chen, Kai
    Guan, Donghai
    Yuan, Weiwei
    Li, Bohan
    Khattak, Asad Masood
    Alfandi, Omar
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2018, 2018, 11323 : 55 - 65
  • [2] A Novel Indexing Method for Improving Timeliness of High-Dimensional Data
    Lu, Jian
    Pham, Huong
    Zhu, Hongwei
    Chen, Cindy
    AMCIS 2014 PROCEEDINGS, 2014,
  • [3] A novel ensemble method for high-dimensional genomic data classification
    Espichan, Alexandra
    Villanueva, Edwin
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018, : 2229 - 2236
  • [4] Anomaly Detection in High-Dimensional Data
    Talagala, Priyanga Dilini
    Hyndman, Rob J.
    Smith-Miles, Kate
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2021, 30 (02) : 360 - 374
  • [5] Outlier detection for high-dimensional data
    Ro, Kwangil
    Zou, Changliang
    Wang, Zhaojun
    Yin, Guosheng
    BIOMETRIKA, 2015, 102 (03) : 589 - 599
  • [6] Class visualization of high-dimensional data with applications
    Dhillon, IS
    Modha, DS
    Spangler, WS
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2002, 41 (01) : 59 - 90
  • [7] Class prediction for high-dimensional class-imbalanced data
    Rok Blagus
    Lara Lusa
    BMC Bioinformatics, 11
  • [8] Class prediction for high-dimensional class-imbalanced data
    Blagus, Rok
    Lusa, Lara
    BMC BIOINFORMATICS, 2010, 11 : 523
  • [9] A Compressed PCA Subspace Method for Anomaly Detection in High-Dimensional Data
    Ding, Qi
    Kolaczyk, Eric D.
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2013, 59 (11) : 7419 - 7433
  • [10] A hybrid dimensionality reduction method for outlier detection in high-dimensional data
    Guanglei Meng
    Biao Wang
    Yanming Wu
    Mingzhe Zhou
    Tiankuo Meng
    International Journal of Machine Learning and Cybernetics, 2023, 14 : 3705 - 3718