Measuring Mutual Information Between All Pairs of Variables in Subquadratic Complexity

Cited by: 0

Authors:

Ferdosi, Mohsen [1]

Davoodi, Arash Gholami [1]

Mohimani, Hosein [1]

Affiliation:

[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108 | 2020 / Vol. 108

Funding:

National Institutes of Health (USA);

Keywords:

BAYESIAN NETWORKS;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Finding associations between pairs of variables in large datasets is crucial for various disciplines. The brute force method for solving this problem requires computing the mutual information between $\binom{N}{2}$ pairs. In this paper, we consider the problem of finding pairs of variables with high mutual information in sub-quadratic complexity. This problem is analogous to the nearest neighbor search, where the goal is to find pairs among $N$ variables that are similar to each other. To solve this problem, we develop a new algorithm for finding associations based on constructing a decision tree that assigns a hash to each variable, in a way that for pairs with higher mutual information, the chance of having the same hash is higher. For any $1 \le \lambda \le 2$, we prove that in the case of binary data, we can reduce the number of necessary mutual information computations for finding all pairs satisfying $I(X, Y) > 2 - \lambda$ from $O(N^2)$ to $O(N^\lambda)$, where $I(X, Y)$ is the empirical mutual information between variables $X$ and $Y$. Finally, we confirmed our theory by experiments on simulated and real data.
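For orientation, the brute-force baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration only, not the authors' decision-tree hashing algorithm: the function names `empirical_mi` and `brute_force_pairs` are hypothetical, mutual information is assumed to be measured in bits, and the data are assumed to be binary sequences of equal length.

```python
from itertools import combinations
from math import log2


def empirical_mi(x, y):
    """Empirical mutual information (in bits) of two equal-length sequences."""
    n = len(x)
    # Count joint occurrences of each (x, y) value pair.
    joint = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    # Derive marginal counts from the joint counts.
    px, py = {}, {}
    for (a, b), c in joint.items():
        px[a] = px.get(a, 0) + c
        py[b] = py.get(b, 0) + c
    # Sum p(a, b) * log2(p(a, b) / (p(a) * p(b))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def brute_force_pairs(variables, lam):
    """Return index pairs (i, j) with I(X_i, X_j) > 2 - lam.

    Evaluates all N-choose-2 pairs, i.e. the quadratic baseline
    (assumed illustration, not the paper's sub-quadratic method).
    """
    threshold = 2 - lam
    return [
        (i, j)
        for i, j in combinations(range(len(variables)), 2)
        if empirical_mi(variables[i], variables[j]) > threshold
    ]
```

With three binary variables where the first two are identical and the third is independent of both, calling `brute_force_pairs(variables, 1.5)` keeps only the identical pair, since its empirical mutual information exceeds the threshold of 0.5 bits.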


Pages: 4399-4408

Number of pages: 10
