Mining combined causes in large data sets

Cited by: 13
Authors
Ma, Saisai [1]
Li, Jiuyong [1]
Liu, Lin [1]
Le, Thuc Duy [1]
Affiliations
[1] Univ S Australia, Sch Informat Technol & Math Sci, Mawson Lakes, SA 5095, Australia
Funding
Australian Research Council
Keywords
Causal discovery; Combined causes; Local causal discovery; HITON-PC; Multi-level HITON-PC; LEARNING BAYESIAN NETWORKS; ASSOCIATION; DISCOVERY; CAUSATION; MODELS
DOI
10.1016/j.knosys.2015.10.018
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, many methods have been developed for detecting causal relationships in observational data. Some of them have the potential to tackle large data sets. However, these methods fail to discover a combined cause, i.e., a multi-factor cause consisting of two or more component variables which individually are not causes. A straightforward approach to uncovering a combined cause is to include both individual and combined variables in the causal discovery using existing methods, but this scheme is computationally infeasible due to the huge number of combined variables. In this paper, we propose a novel approach to address this practical causal discovery problem, i.e., mining combined causes in large data sets. The experiments with both synthetic and real-world data sets show that the proposed method can obtain high-quality causal discoveries with high computational efficiency. © 2015 Elsevier B.V. All rights reserved.
Pages: 104-111
Number of pages: 8