Measuring Mutual Information Between All Pairs of Variables in Subquadratic Complexity

Cited by: 0
Authors
Ferdosi, Mohsen [1 ]
Davoodi, Arash Gholami [1 ]
Mohimani, Hosein [1 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
US National Institutes of Health;
Keywords
BAYESIAN NETWORKS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Finding associations between pairs of variables in large datasets is crucial for various disciplines. The brute-force method for solving this problem requires computing the mutual information between all (N choose 2) pairs. In this paper, we consider the problem of finding pairs of variables with high mutual information in sub-quadratic complexity. This problem is analogous to nearest neighbor search, where the goal is to find pairs among N variables that are similar to each other. To solve this problem, we develop a new algorithm for finding associations based on constructing a decision tree that assigns a hash to each variable, such that pairs with higher mutual information have a higher chance of receiving the same hash. For any 1 ≤ λ ≤ 2, we prove that in the case of binary data, we can reduce the number of mutual information computations required to find all pairs satisfying I(X, Y) > 2 − λ from O(N^2) to O(N^λ), where I(X, Y) is the empirical mutual information between variables X and Y. Finally, we confirm our theory with experiments on simulated and real data.
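The core idea in the abstract — hash variables so that high-mutual-information pairs are likely to collide, then compute exact MI only within collision buckets — can be illustrated with a simplified locality-sensitive-hashing sketch. This is an assumption-laden analogue for intuition, not the paper's decision-tree construction: here each binary variable is hashed by its values at a few random sample positions, repeated over several rounds.

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    # Empirical mutual information (in bits) between two binary vectors.
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = np.mean((x == a) & (y == b))
            px, py = np.mean(x == a), np.mean(y == b)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi

def candidate_pairs(data, k=8, rounds=10, seed=None):
    # data: (N, n) matrix, one binary variable per row (n samples each).
    # Hash each variable by its bits at k random sample positions; variables
    # landing in the same bucket become candidate pairs. Strongly dependent
    # (positively correlated) pairs agree on most positions, so they collide
    # often; independent pairs rarely do. Exact MI is then computed only for
    # candidates, avoiding all (N choose 2) comparisons.
    rng = np.random.default_rng(seed)
    N, n = data.shape
    cands = set()
    for _ in range(rounds):
        idx = rng.choice(n, size=k, replace=False)
        buckets = {}
        for v in range(N):
            key = tuple(data[v, idx])
            buckets.setdefault(key, []).append(v)
        for group in buckets.values():
            cands.update(combinations(group, 2))
    return cands
```

In this toy form the scheme only catches positively correlated pairs; the paper's decision-tree hashing is designed around mutual information itself, which also covers anti-correlated and more general dependencies.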
Citation
Pages: 4399-4408
Page count: 10
Related Papers
50 records in total
  • [1] Measuring distances between variables by mutual information
    Steuer, R
    Daub, CO
    Selbig, J
    Kurths, J
    INNOVATIONS IN CLASSIFICATION, DATA SCIENCE, AND INFORMATION SYSTEMS, 2005, : 81 - 90
  • [2] MEASURING COMPLEXITY IN TERMS OF MUTUAL INFORMATION
    FRASER, AM
    MEASURES OF COMPLEXITY AND CHAOS, 1989, 208 : 117 - 119
  • [3] On the Mutual Information between Random Variables in Networks
    Xu, Xiaoli
    Thakor, Satyajit
    Guan, Yong Liang
    2013 IEEE INFORMATION THEORY WORKSHOP (ITW), 2013,
  • [4] The mutual information: Detecting and evaluating dependencies between variables
    Steuer, R
    Kurths, J
    Daub, CO
    Weise, J
    Selbig, J
    BIOINFORMATICS, 2002, 18 : S231 - S240
  • [5] On the Difference Between Closest, Furthest, and Orthogonal Pairs: Nearly-Linear vs Barely-Subquadratic Complexity
    Williams, Ryan
    SODA'18: PROCEEDINGS OF THE TWENTY-NINTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2018, : 1207 - 1215
  • [6] Measuring Independence between Statistical Randomness Tests by Mutual Information
    Augusto Karell-Albo, Jorge
    Miguel Legon-Perez, Carlos
    Jose Madarro-Capo, Evaristo
    Rojas, Omar
    Sosa-Gomez, Guillermo
    ENTROPY, 2020, 22 (07)
  • [7] Fuzzy modeling of fuel cell based on mutual information between variables
    Kishor, Nand
    Mohanty, Soumya R.
    INTERNATIONAL JOURNAL OF HYDROGEN ENERGY, 2010, 35 (08) : 3620 - 3631
  • [8] Heterogeneous Link Prediction via Mutual Information Maximization Between Node Pairs
    Lu, Yifan
    Liu, Zehao
    Gao, Mengzhou
    Jiao, Pengfei
    ARTIFICIAL INTELLIGENCE, CICAI 2023, PT I, 2024, 14473 : 460 - 470
  • [9] Shape complexity based on mutual information
    Rigau, J
    Feixas, M
    Sbert, M
    INTERNATIONAL CONFERENCE ON SHAPE MODELING AND APPLICATIONS, PROCEEDINGS, 2005, : 355 - 360
  • [10] An almost 2-approximation for all-pairs of shortest paths in subquadratic time
    Akav, Maor
    Roditty, Liam
    PROCEEDINGS OF THE THIRTY-FIRST ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS (SODA'20), 2020, : 1 - 11