Outlier detection for partially labeled categorical data based on conditional information entropy

Cited by: 3
Authors
Zhao, Zhengwei [1]
Wang, Rongrong [2]
Huang, Dan [3]
Li, Zhaowen [4]
Affiliations
[1] Guangxi Minzu Univ, Sch Math & Phys, Nanning 530006, Guangxi, Peoples R China
[2] Guangxi Minzu Univ, Elect & Informat Engn, Nanning 530000, Guangxi, Peoples R China
[3] Yulin Normal Univ, Sch Comp Sci & Engn, Yulin 537000, Guangxi, Peoples R China
[4] Putian Univ, Key Lab Appl Math Fujian Prov Univ, Fujian Key Lab Financial Informat Proc, Putian 351100, Fujian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Partially labeled categorical data; Partially labeled categorical decision information system; Outlier detection; Conditional information entropy; ALGORITHMS; CLUSTERS
DOI
10.1016/j.ijar.2023.109086
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Labeling a large amount of data is costly and often infeasible in practice, so available data may have missing labels. In this article, we investigate outlier detection for partially labeled categorical data based on conditional information entropy. First, the equivalence class in a partially labeled categorical decision information system (p-CDIS) is introduced, so that missing labels can be predicted using conditional probability. Then, the conditional information entropy in a p-CDIS is calculated, which provides a more comprehensive measure of uncertainty. In addition, the relative information entropy and relative cardinality in a p-CDIS are proposed. Next, the degree of outlierness and a weight function are presented and used to compute outlier factors. Finally, an outlier detection method in a p-CDIS based on conditional information entropy is proposed, and a corresponding conditional information entropy algorithm (CEOF) is designed. To evaluate the stability of the CEOF algorithm, experiments are performed on ten datasets from the UCI Machine Learning Repository. Compared with five other algorithms, the proposed method shows good effectiveness and adaptability for categorical data.
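The first two steps of the abstract (equivalence classes, then label prediction by conditional probability, then a conditional entropy) can be illustrated with a minimal Python sketch. This is not the authors' CEOF algorithm: the function names, the toy data, and the use of the standard Shannon conditional entropy H(D | A) are assumptions made for illustration, and the paper's own definitions of entropy, relative cardinality, and outlier factor may differ.

```python
from collections import Counter, defaultdict
from math import log2

# Each row is (condition-attribute tuple, label); None marks a missing label.
# The data below is a hypothetical toy example, not taken from the paper.

def equivalence_classes(rows):
    """Group row indices that share identical condition-attribute values."""
    classes = defaultdict(list)
    for i, (attrs, _label) in enumerate(rows):
        classes[attrs].append(i)
    return classes

def predict_missing_labels(rows):
    """Fill each missing label with the most probable label in its class."""
    filled = list(rows)
    for attrs, members in equivalence_classes(rows).items():
        counts = Counter(rows[i][1] for i in members if rows[i][1] is not None)
        if not counts:
            continue  # no labeled object in this class: leave labels missing
        best = counts.most_common(1)[0][0]
        for i in members:
            if rows[i][1] is None:
                filled[i] = (attrs, best)
    return filled

def conditional_entropy(rows):
    """Shannon conditional entropy H(D | A) in bits, over labeled rows only."""
    labeled = [r for r in rows if r[1] is not None]
    n = len(labeled)
    h = 0.0
    for members in equivalence_classes(labeled).values():
        size = len(members)
        counts = Counter(labeled[i][1] for i in members)
        for c in counts.values():
            p = c / size  # P(label | equivalence class)
            h -= (size / n) * p * log2(p)
    return h

rows = [
    (("red", "small"), "yes"),
    (("red", "small"), "yes"),
    (("red", "small"), None),
    (("blue", "large"), "no"),
    (("blue", "large"), "yes"),
    (("green", "small"), None),
]
filled = predict_missing_labels(rows)
entropy = conditional_entropy(filled)
```

In this toy table the third object inherits the label of its equivalence class, while the last object stays unlabeled because its class contains no labeled member; only the mixed class contributes to the entropy.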
Pages: 25
Related papers
50 in total
  • [1] Outlier detection using conditional information entropy and rough set theory
    Li, Zhaowen
    Wei, Shengxue
    Liu, Suping
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2024, 46 (01) : 1899 - 1918
  • [2] Granular-conditional-entropy-based attribute reduction for partially labeled data with proxy labels
    Gao, Can
    Zhou, Jie
    Miao, Duoqian
    Yue, Xiaodong
    Wan, Jun
    INFORMATION SCIENCES, 2021, 580 : 111 - 128
  • [3] Outlier detection for multivariate categorical data
    Puig, Xavier
    Ginebra, Josep
    QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2018, 34 (07) : 1400 - 1412
  • [4] Local outlier detection based on information entropy weighting
    Wang, Lina
    Feng, Chao
    Ren, Yongjun
    Xia, Jinyue
    INTERNATIONAL JOURNAL OF SENSOR NETWORKS, 2019, 30 (04) : 207 - 217
  • [5] Unsupervised Feature Selection for Outlier Detection in Categorical Data using Mutual Information
    Suri, N. N. R. Ranga
    Murty, M. Narasimha
    Athithan, G.
    2012 12TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS), 2012, : 253 - 258
  • [6] Information-Theoretic Outlier Detection for Large-Scale Categorical Data
    Wu, Shu
    Wang, Shengrui
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (03) : 589 - 602
  • [7] Outlier Detection Method for Flash Flood Disaster Monitoring Data based on Information Entropy
    Chen, Yongzhi
    Xu, Ziao
    Niu, Chaoqun
    Journal of Physics: Conference Series, 2021, 2138 (01):
  • [8] Efficient outlier detection in numerical and categorical data
    Cabral, Eugenio F.
    Vinces, Braulio V. Sanchez
    Silva, Guilherme D. F.
    Sander, Jorg
    Cordeiro, Robson L. F.
    DATA MINING AND KNOWLEDGE DISCOVERY, 2025, 39 (03)
  • [9] WMEVF: AN OUTLIER DETECTION METHODS FOR CATEGORICAL DATA
    Rokhman, Nur
    Subanar
    Winarko, Edi
    2016 INTERNATIONAL CONFERENCE ON INFORMATICS AND COMPUTING (ICIC), 2016, : 37 - 42
  • [10] An optimization model for Outlier detection in categorical data
    He, ZY
    Deng, SC
    Xu, XF
    ADVANCES IN INTELLIGENT COMPUTING, PT 1, PROCEEDINGS, 2005, 3644 : 400 - 409