Mutual Information-based Feature Selection Approach to Reduce High Dimension of Big Data

Cited by: 2
Authors
Win, Thee Zin [1]
Kham, Nang Saing Moon [2]
Affiliations
[1] Univ Comp Studies, Informat Sci Dept, Yangon, Myanmar
[2] Univ Comp Studies, Fac Informat, Sci Dept, Yangon, Myanmar
Keywords
Feature Selection; High Dimensional Data; Redundant Features; Mutual Information;
DOI
10.1145/3278312.3278316
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As massive and growing volumes of data demand effective and efficient mining strategies, practitioners and researchers are developing scalable data mining and machine learning algorithms to turn mountains of data into nuggets. High-dimensional data significantly increases the memory and computational costs of data analytics. Reducing dimensionality can therefore improve three aspects of data mining performance: learning speed, predictive accuracy, and the simplicity and comprehensibility of the mined results. Feature selection, a data preprocessing technique, is effective and efficient for data mining, data analytics, and machine learning problems, particularly for reducing high dimensionality. Most feature selection algorithms eliminate only irrelevant features and leave redundant features in place. Both irrelevant and redundant features can degrade learning performance. This work proposes a mutual information-based feature selection approach that removes both irrelevant and redundant features.
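The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes discrete features, estimates mutual information from empirical counts, drops features whose mutual information with the class label falls below a threshold (irrelevant), and drops features that share nearly all of their information with an already selected feature (redundant). The function names and the thresholds `relevance_min` and `redundancy_max` are hypothetical choices made for illustration.

```python
import numpy as np


def mutual_information(x, y):
    """Estimate I(X;Y) in nats from two discrete 1-D arrays of equal length."""
    _, x_idx = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, y_idx = np.unique(np.asarray(y).ravel(), return_inverse=True)
    # Empirical joint distribution from co-occurrence counts.
    joint = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(joint, (x_idx, y_idx), 1)
    joint /= joint.sum()
    # Product of the marginals, same shape as the joint table.
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # skip empty cells, where p * log(p) is taken as 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))


def select_features(X, y, relevance_min=0.05, redundancy_max=0.9):
    """Greedy filter: keep relevant features that are not redundant.

    relevance_min and redundancy_max are illustrative thresholds,
    not values taken from the paper.
    """
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    # Visit features from most to least relevant.
    order = sorted(range(n_features), key=lambda j: -relevance[j])
    selected = []
    for j in order:
        if relevance[j] < relevance_min:
            continue  # irrelevant: carries almost no information about y
        entropy_j = mutual_information(X[:, j], X[:, j])  # I(X;X) = H(X)
        redundant = any(
            mutual_information(X[:, j], X[:, k]) >= redundancy_max * entropy_j
            for k in selected
        )
        if not redundant:
            selected.append(j)
    return selected
```

On a toy dataset where one column duplicates another and one column is unrelated to the label, this sketch would be expected to keep one copy of the duplicated column plus the other informative column. Continuous features would need discretization or a different estimator before this sketch applies.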
Pages: 3-7
Number of pages: 5