A Novel Class Noise Detection Method for High-Dimensional Data in Industrial Informatics

Cited by: 16
Authors
Guan, Donghai [1 ,2 ]
Chen, Kai [1 ,2 ]
Han, Guangjie [3 ]
Huang, Shuqiang [4 ]
Yuan, Weiwei [1 ,2 ]
Guizani, Mohsen [5 ]
Shu, Lei [6 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China
[2] Collaborat Innovat Ctr Novel Software Technol & I, Nanjing 210093, Peoples R China
[3] Dalian Univ Technol, Sch Software, Key Lab Ubiquitous Network & Serv Software Liaoni, Dalian 116024, Peoples R China
[4] Jinan Univ, Coll Sci & Engn, Dept Optoelect Engn, Guangzhou 510632, Peoples R China
[5] Qatar Univ, Coll Engn, Doha 2713, Qatar
[6] Nanjing Agr Univ, Coll Engn, Nanjing 210095, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Informatics; Training; Machine learning; Task analysis; Noise measurement; Reliability; High dimension; industrial informatics; noise filtering; OUTLIER DETECTION; CLASSIFICATION; SELECTION; QUALITY
DOI
10.1109/TII.2020.3012658
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The data in industrial informatics may be high-dimensional and mislabeled. Irrelevant or noisy features pose a significant challenge to mislabeling detection in high-dimensional data. Traditional methods usually adopt a two-step solution, first finding the relevant subspace and then using it for mislabeling detection. This two-step approach struggles to deliver optimal mislabeling detection performance because it separates feature selection from label error detection. To solve this problem, in this article, we integrate the two steps and propose a sequential ensemble noise filter (SENF). In the SENF, relevant features are selected and used to generate a noise score for each instance. In turn, these noise scores guide feature selection through regression learning. Thus, the SENF falls within the scope of sequential ensemble learning. We evaluate our approach on several benchmark datasets with high dimensionality and substantial label noise. The results show that the SENF is significantly better than other existing label noise detection methods.
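The alternation described in the abstract (score instances on the selected features, then let the scores guide the next feature selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the scoring function or the regression learner, so the k-nearest-neighbour disagreement score, the univariate correlation used as a stand-in for regression learning, and every name here (`knn_noise_scores`, `feature_relevance`, `sequential_noise_filter`, `make_toy_data`, `n_keep`, `rounds`) are assumptions made for illustration, restricted to binary labels.

```python
import random

def knn_noise_scores(X, y, features, k=5):
    """Assumed noise score: share of an instance's k nearest neighbours
    (measured on the selected features only) that carry a different label."""
    n = len(X)
    scores = []
    for i in range(n):
        dists = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), j)
            for j in range(n) if j != i
        )
        scores.append(sum(y[j] != y[i] for _, j in dists[:k]) / k)
    return scores

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def feature_relevance(X, y, scores):
    """Stand-in for the regression step (an assumption): for each feature,
    the absolute correlation between the noise scores and the label-signed,
    centred feature value, i.e. a standardised univariate regression slope."""
    n, d = len(X), len(X[0])
    relevance = []
    for f in range(d):
        mean_f = sum(row[f] for row in X) / n
        signed = [(1 if y[i] == 1 else -1) * (X[i][f] - mean_f) for i in range(n)]
        relevance.append(abs(pearson(signed, scores)))
    return relevance

def sequential_noise_filter(X, y, n_keep=3, rounds=3, k=5):
    """Alternate between scoring instances and re-selecting features."""
    features = list(range(len(X[0])))           # start from all features
    scores = knn_noise_scores(X, y, features, k)
    for _ in range(rounds):
        relevance = feature_relevance(X, y, scores)
        ranked = sorted(range(len(relevance)), key=lambda f: -relevance[f])
        features = ranked[:n_keep]              # scores guide feature selection
        scores = knn_noise_scores(X, y, features, k)
    return scores, sorted(features)

def make_toy_data(n=40, d=10, n_flipped=4, seed=0):
    """Hypothetical toy data: features 0 and 1 carry the class signal,
    the rest are Gaussian noise; n_flipped labels are then corrupted."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(d)]
        row[0] += 3 * (2 * label - 1)
        row[1] += 3 * (2 * label - 1)
        X.append(row)
        y.append(label)
    flipped = rng.sample(range(n), n_flipped)
    for i in flipped:
        y[i] = 1 - y[i]
    return X, y, flipped

if __name__ == "__main__":
    X, y, flipped = make_toy_data()
    scores, features = sequential_noise_filter(X, y)
    print("selected features:", features)
    print("flipped instances:", sorted(flipped))
    print("highest noise scores:", sorted(range(len(X)), key=lambda i: -scores[i])[:4])
```

Instances with the highest final scores would be the candidates for removal or relabeling. A multivariate sparse regression may be closer to what the article means by regression learning; the univariate correlation is used here only to keep the sketch dependency-free.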
Pages: 2181-2190
Page count: 10