Ranking Mutual Information Dependencies in a Summary-based Approximate Analytics Framework

Cited by: 2
Authors
Slezak, Dominik [1]
Borkowski, Janusz [2]
Chadzynska-Krasowska, Agnieszka [3]
Affiliations
[1] Univ Warsaw, Inst Informat, Ul Banacha 2, PL-02097 Warsaw, Poland
[2] Secur On Demand, 12121 Scripps Summit Dr 320, San Diego, CA 92131 USA
[3] Polish Japanese Acad Informat Technol, Ul Koszykowa 86, PL-02008 Warsaw, Poland
Keywords
Approximate Data Processing; Granulated Data Summaries; Approximate Mutual Information; ENGINE
DOI
10.1109/HPCS.2018.00137
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We continue our research on utilizing histogram-based data summaries in approximate derivation of mutual information scores in large relational data sets. Our methodology of creating, storing and using summaries has been designed for the purpose of developing an approximate database engine that is currently deployed commercially in the area of cybersecurity data analytics. However, a similar idea of approximate data processing operations can also be considered in other fields, including machine learning, where heuristic calculations are a component of many methods. In this paper, we focus on investigating one possible source of inaccuracy in our previously proposed approach to approximating mutual information: neglecting a kind of column domain drift during distributed summary-based computations. We illustrate it using an artificially created benchmark data set and discuss how to cope with this particular challenge in the future.
Pages: 852-859
Number of pages: 8