Outlier detection for partially labeled categorical data based on conditional information entropy

Times cited: 3

Authors:

Zhao, Zhengwei [1]

Wang, Rongrong [2]

Huang, Dan [3]

Li, Zhaowen [4]

Affiliations:

[1] Guangxi Minzu Univ, Sch Math & Phys, Nanning 530006, Guangxi, Peoples R China

[2] Guangxi Minzu Univ, Elect & Informat Engn, Nanning 530000, Guangxi, Peoples R China

[3] Yulin Normal Univ, Sch Comp Sci & Engn, Yulin 537000, Guangxi, Peoples R China

[4] Putian Univ, Key Lab Appl Math Fujian Prov Univ, Fujian Key Lab Financial Informat Proc, Putian 351100, Fujian, Peoples R China

Source:

INTERNATIONAL JOURNAL OF APPROXIMATE REASONING | 2024 / Vol. 164

Funding:

National Natural Science Foundation of China;

Keywords:

Partially labeled categorical data; Partially labeled categorical decision information system; Outlier detection; Conditional information entropy; ALGORITHMS; CLUSTERS;

DOI:

10.1016/j.ijar.2023.109086

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405;

Abstract:

Labeling large amounts of data is exceptionally costly and practically infeasible, so available data may have missing labels. In this article, we investigate outlier detection for partially labeled categorical data based on conditional information entropy. First, equivalence classes in a partially labeled categorical decision information system (p-CDIS) are introduced so that missing labels can be predicted using conditional probability. Then, conditional information entropy in a p-CDIS is calculated, which provides a more comprehensive measure of uncertainty. In addition, relative information entropy and relative cardinality in a p-CDIS are proposed. Next, the degree of outlierness and a weight function are presented to determine outlier factors. Finally, an outlier detection method for a p-CDIS based on conditional information entropy is proposed, and a corresponding algorithm, CEOF, is designed. To evaluate the stability of CEOF, experiments are performed on ten datasets from the UCI Machine Learning Repository. Compared with five other algorithms, the proposed method shows good effectiveness and adaptability for categorical data.
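The first steps named in the abstract (equivalence classes, predicting missing labels by conditional probability, conditional information entropy) can be illustrated with a minimal Python sketch. It uses the textbook rough-set indiscernibility relation and Shannon conditional entropy H(D|B). The label-filling rule (give an unlabeled object the most probable label among labeled objects in its equivalence class) is an assumption, and all function names and the toy data are hypothetical. This is not the authors' CEOF algorithm; it omits relative information entropy, relative cardinality, the degree of outlierness and the weight function, whose exact definitions are not given here.

```python
import math
from collections import Counter, defaultdict


def equivalence_classes(rows, attrs):
    """Partition object indices by their values on the attribute subset `attrs`
    (the indiscernibility relation from rough set theory)."""
    classes = defaultdict(list)
    for i, (values, _label) in enumerate(rows):
        classes[tuple(values[a] for a in attrs)].append(i)
    return list(classes.values())


def predict_missing_labels(rows, attrs):
    """ASSUMED rule: an unlabeled object receives the label with the highest
    conditional probability P(d | [x]) among the labeled objects of its
    equivalence class [x]; ties go to the label seen first. Objects whose
    class contains no labeled member keep None."""
    labels = [label for _values, label in rows]
    for cls in equivalence_classes(rows, attrs):
        counts = Counter(labels[i] for i in cls if labels[i] is not None)
        if not counts:
            continue
        best = counts.most_common(1)[0][0]
        for i in cls:
            if labels[i] is None:
                labels[i] = best
    return labels


def conditional_entropy(rows, labels, attrs):
    """Shannon conditional entropy H(D | B) in bits over labeled objects:
    H(D|B) = -sum_X P(X) * sum_d P(d|X) * log2 P(d|X)."""
    n = sum(label is not None for label in labels)
    h = 0.0
    for cls in equivalence_classes(rows, attrs):
        known = [labels[i] for i in cls if labels[i] is not None]
        if not known:
            continue
        p_x = len(known) / n
        for count in Counter(known).values():
            p = count / len(known)
            h -= p_x * p * math.log2(p)
    return h


# Hypothetical toy p-CDIS: two categorical condition attributes; None = missing label.
rows = [
    (("a", "x"), "yes"), (("a", "x"), "yes"), (("a", "x"), None),
    (("b", "y"), "no"), (("b", "y"), "yes"), (("b", "y"), None),
    (("c", "z"), None),
]
attrs = [0, 1]
filled = predict_missing_labels(rows, attrs)
h_before = conditional_entropy(rows, [label for _v, label in rows], attrs)
h_after = conditional_entropy(rows, filled, attrs)
print(filled)                                  # ['yes', 'yes', 'yes', 'no', 'yes', 'no', None]
print(round(h_before, 4), round(h_after, 4))   # 0.5 0.4591
```

In this toy table the object with values ("c", "z") keeps a missing label because no labeled object shares its equivalence class; a real method would need an explicit policy for such cases. A lower H(D|B) means the condition attributes determine the (known or predicted) labels more tightly.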

Pages: 25
