FASM and FAST-YB: Significant Pattern Mining with False Discovery Rate Control

Authors
Pellizzoni, Paolo [1]
Borgwardt, Karsten [1]
Affiliations
[1] Max Planck Inst Biochem, Martinsried, Germany
Keywords
Data mining; significant pattern mining; false discovery rate
DOI
10.1109/ICDM58522.2023.00159
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In significant pattern mining, i.e., the task of discovering structures in data that exhibit a statistically significant association with class labels, it is often necessary to have guarantees on the number of patterns that the testing procedure erroneously deems statistically significant. A desirable property, whose study in the context of pattern mining has been limited, is control of the expected proportion of false positives among the reported patterns, often called the false discovery rate (FDR). In this paper, we develop two novel algorithms for mining statistically significant patterns under FDR control. The first one, FASM, builds upon the Benjamini-Yekutieli procedure and exploits the discrete nature of the test statistics to increase its computational efficiency and statistical power. The second one, FAST-YB, incorporates the Yekutieli-Benjamini permutation testing procedure to account for interdependencies among patterns, which allows for a further increase in statistical power. We performed an experimental evaluation on both synthetic and real-world datasets, and the comparisons with state-of-the-art algorithms show that the gains in statistical power are substantial.
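For readers unfamiliar with the Benjamini-Yekutieli procedure that the abstract names as the basis of FASM, the following is a minimal sketch of the generic step-up procedure only. It is not FASM or FAST-YB: it does not exploit the discreteness of the test statistics and performs no permutation testing. The function name `benjamini_yekutieli` and the example p-values are illustrative assumptions, not taken from the paper.

```python
from typing import List


def benjamini_yekutieli(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Generic Benjamini-Yekutieli step-up procedure (illustrative sketch).

    Returns one flag per input p-value: True if the hypothesis is rejected.
    Controls the FDR at level alpha under arbitrary dependence between tests.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Harmonic correction factor c(m) = 1 + 1/2 + ... + 1/m.
    c_m = sum(1.0 / i for i in range(1, m + 1))

    # Indices of the hypotheses, sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * alpha / (m * c(m)).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four candidate patterns.
example_flags = benjamini_yekutieli([0.001, 0.5, 0.004, 0.9], alpha=0.05)
```

In this sketch the threshold for rank k is k * alpha / (m * c(m)), so with four hypothetical p-values and alpha = 0.05 the two smallest p-values fall below their thresholds and are rejected.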
Pages: 1265-1270
Page count: 6