Toward intelligent assistance for a data mining process: An ontology-based approach for cost-sensitive classification

Cited by: 87
Authors
Bernstein, A
Provost, F
Hill, S
Affiliations
[1] Univ Zurich, Dept Informat, CH-8057 Zurich, Switzerland
[2] NYU, Stern Sch Business, New York, NY 10012 USA
Keywords
cost-sensitive learning; data mining; data mining process; intelligent assistants; knowledge discovery; knowledge discovery process; machine learning; metalearning
DOI
10.1109/TKDE.2005.67
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and nontrivial interactions, both novices and data mining specialists need assistance in composing and selecting DM processes. Extending notions developed for statistical expert systems, we present a prototype Intelligent Discovery Assistant (IDA), which provides users with 1) systematic enumerations of valid DM processes, so that important, potentially fruitful options are not overlooked, and 2) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use the prototype to show that an IDA can indeed provide useful enumerations and effective rankings in the context of simple classification processes. We discuss how an IDA could be an important tool for knowledge sharing among a team of data miners. Finally, we illustrate the claims with a demonstration of cost-sensitive classification using a more complicated process and data from the 1998 KDDCUP competition.
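The two services named in the abstract, enumerating valid processes and ranking them by a chosen criterion, can be illustrated with a small sketch. This is not the authors' IDA or its ontology: the operator names, the validity rules, and every numeric estimate below are hypothetical, chosen only to show the idea of filtering stage combinations by preconditions and then sorting the survivors.

```python
from itertools import product
from math import prod

# Hypothetical operator catalogue (illustrative names and numbers only).
# "accuracy" values are additive estimates; "speed" values are multiplicative.
PRE = {
    "none": {"outputs_categorical": False, "accuracy": 0.00, "speed": 1.00},
    "discretize": {"outputs_categorical": True, "accuracy": -0.01, "speed": 0.90},
}
LEARNERS = {
    "decision_tree": {"needs_categorical": False, "is_tree": True,
                      "accuracy": 0.80, "speed": 0.80},
    "naive_bayes": {"needs_categorical": True, "is_tree": False,
                    "accuracy": 0.78, "speed": 0.95},
}
POST = {
    "none": {"needs_tree": False, "accuracy": 0.00, "speed": 1.00},
    "prune": {"needs_tree": True, "accuracy": 0.02, "speed": 0.90},
}


def enumerate_valid(data_is_categorical):
    """List every (pre, learner, post) triple whose preconditions hold."""
    valid = []
    for pre, learner, post in product(PRE, LEARNERS, POST):
        categorical = data_is_categorical or PRE[pre]["outputs_categorical"]
        # Assumed rule: this toy learner only accepts categorical input.
        if LEARNERS[learner]["needs_categorical"] and not categorical:
            continue
        # Assumed rule: pruning only applies to tree models.
        if POST[post]["needs_tree"] and not LEARNERS[learner]["is_tree"]:
            continue
        valid.append((pre, learner, post))
    return valid


def score(process, criterion):
    """Combine per-stage estimates: sum for accuracy, product for speed."""
    pre, learner, post = process
    parts = [PRE[pre][criterion], LEARNERS[learner][criterion], POST[post][criterion]]
    return sum(parts) if criterion == "accuracy" else prod(parts)


def rank(processes, criterion):
    """Order processes from best to worst on the chosen criterion."""
    return sorted(processes, key=lambda p: score(p, criterion), reverse=True)


valid = enumerate_valid(data_is_categorical=False)
for criterion in ("accuracy", "speed"):
    print(criterion, rank(valid, criterion)[0])
```

In this toy catalogue, three of the eight stage combinations are rejected by the precondition checks, and the two criteria favor different processes, which is the kind of trade-off a ranking by multiple criteria is meant to expose.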
Pages: 503-518
Number of pages: 16