Ranking Mutual Information Dependencies in a Summary-based Approximate Analytics Framework

Cited by: 2
Authors
Slezak, Dominik [1]
Borkowski, Janusz [2]
Chadzynska-Krasowska, Agnieszka [3]
Affiliations
[1] Univ Warsaw, Inst Informat, Ul Banacha 2, PL-02097 Warsaw, Poland
[2] Secur On Demand, 12121 Scripps Summit Dr 320, San Diego, CA 92131 USA
[3] Polish Japanese Acad Informat Technol, Ul Koszykowa 86, PL-02008 Warsaw, Poland
Keywords
Approximate Data Processing; Granulated Data Summaries; Approximate Mutual Information; ENGINE;
DOI
10.1109/HPCS.2018.00137
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
We continue our research on using histogram-based data summaries for the approximate derivation of mutual information scores in large relational data sets. Our methodology for creating, storing, and using summaries was designed for an approximate database engine that is currently deployed commercially in cybersecurity data analytics. However, a similar approach to approximate data processing can also be considered in other fields, including machine learning, where heuristic calculations are a component of many methods. In this paper, we investigate one possible source of inaccuracy in our previously proposed approach to approximating mutual information: neglecting a kind of column domain drift during distributed summary-based computations. We illustrate it using an artificially created benchmark data set and discuss how to cope with this particular challenge in the future.
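For readers unfamiliar with the quantity mentioned in the abstract, the following is a minimal sketch of how a mutual information score can be computed from a two-column histogram of co-occurrence counts. It shows the textbook plug-in estimate only and is not the authors' summary-based approximation. The function name `mutual_information` and the input layout (a list of rows of bucket counts) are assumptions made for illustration.

```python
import math


def mutual_information(joint_counts):
    """Plug-in mutual information (in bits) from a 2-D table of counts.

    joint_counts[i][j] is the number of rows whose first column falls in
    bucket i and whose second column falls in bucket j. This layout is a
    hypothetical one chosen for the example.
    """
    total = sum(sum(row) for row in joint_counts)
    if total == 0:
        return 0.0

    # Marginal counts for each column's buckets.
    row_sums = [sum(row) for row in joint_counts]
    col_sums = [sum(col) for col in zip(*joint_counts)]

    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, n in enumerate(row):
            if n == 0:
                continue  # empty cells contribute nothing
            p_xy = n / total
            # p_xy / (p_x * p_y) rewritten in terms of raw counts
            mi += p_xy * math.log2(n * total / (row_sums[i] * col_sums[j]))
    return mi


independent = [[25, 25], [25, 25]]  # columns carry no information about each other
dependent = [[50, 0], [0, 50]]      # one column fully determines the other
print(mutual_information(independent))
print(mutual_information(dependent))
```

Ranking column pairs by such a score is one way to order dependencies. When the counts come from per-chunk summaries rather than raw rows, the bucket boundaries may differ between chunks, which is the kind of setting the abstract refers to as column domain drift.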
Pages: 852-859
Number of pages: 8