FASM and FAST-YB: Significant Pattern Mining with False Discovery Rate Control

Times cited: 0
Authors
Pellizzoni, Paolo [1]
Borgwardt, Karsten [1]
Affiliation
[1] Max Planck Inst Biochem, Martinsried, Germany
Source
23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023 | 2023
Keywords
Data mining; significant pattern mining; false discovery rate
DOI
10.1109/ICDM58522.2023.00159
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In significant pattern mining, i.e. the task of discovering structures in data that exhibit a statistically significant association with class labels, guarantees are often needed on the number of patterns that the testing procedure erroneously deems statistically significant. A desirable property, whose study in the context of pattern mining has been limited, is control of the expected proportion of false positives among the reported patterns, often called the false discovery rate (FDR). In this paper, we develop two novel algorithms for mining statistically significant patterns under FDR control. The first, FASM, builds upon the Benjamini-Yekutieli procedure and exploits the discrete nature of the test statistics to increase its computational efficiency and statistical power. The second, FAST-YB, incorporates the Yekutieli-Benjamini permutation testing procedure to account for interdependencies among patterns, which allows for a further increase in statistical power. We performed an experimental evaluation on both synthetic and real-world datasets, and comparisons with state-of-the-art algorithms show that the gains in statistical power are substantial.
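For readers unfamiliar with the baseline that FASM builds on, the following is a minimal sketch of the classical Benjamini-Yekutieli (BY) step-up rule in Python, using only the standard library. It assumes one p-value per candidate pattern has already been computed. The function name `benjamini_yekutieli` and the example p-values are hypothetical and purely illustrative. This is not FASM itself: it includes neither FASM's discreteness-based refinements nor FAST-YB's permutation procedure.

```python
from typing import List


def benjamini_yekutieli(pvalues: List[float], alpha: float = 0.05) -> List[int]:
    """Return the (sorted) indices of hypotheses rejected by the classical
    Benjamini-Yekutieli step-up procedure at FDR level `alpha`.

    Illustrative sketch only (hypothetical helper, not FASM or FAST-YB).
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Harmonic correction factor c(m) = sum_{i=1}^{m} 1/i; it makes the
    # procedure valid under arbitrary dependence among the p-values.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Rank hypotheses by increasing p-value, keeping their original indices.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / (m * c(m)).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / (m * c_m):
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


if __name__ == "__main__":
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(benjamini_yekutieli(example))  # prints [0]
```

In the example, m = 8 and c(8) ≈ 2.718, so the thresholds are roughly 0.0023·k; only the smallest p-value passes, since 0.008 already exceeds the k = 2 threshold of about 0.0046. The factor c(m) is what buys validity under arbitrary dependence, and it also makes BY more conservative than the Benjamini-Hochberg rule, which uses the same thresholds without c(m).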
Pages: 1265-1270
Number of pages: 6