Learning from Batched Data: Model Combination Versus Data Combination

Authors
Kai Ming Ting
Boon Toh Low
Ian H. Witten
Affiliations
[1] Deakin University, School of Computing and Mathematics
[2] Chinese University of Hong Kong, Department of Systems Engineering and Engineering Management
[3] University of Waikato, Department of Computer Science
Keywords
Model combination; data combination; empirical evaluation; learning curve; near-asymptotic performance
DOI
10.1007/BF03325092
Abstract
Combining models learned from multiple batches of data provides an alternative to the common practice of learning one model from all the available data (i.e., the data combination approach). This paper empirically examines the baseline behavior of the model combination approach in this multiple-data-batches scenario. We find that model combination can lead to better performance even if the disjoint batches of data are drawn randomly from a larger sample, and we relate the relative performance of the two approaches to the learning curve of the classifier used. At the beginning of the curve, model combination has higher bias and variance than data combination and thus a higher error rate. As the amount of training data increases, model combination has either a lower error rate than data combination or comparable performance, because it achieves a larger variance reduction. We also show that this result is not sensitive to the method of model combination employed. Finally, we show empirically that the near-asymptotic performance of a single model in some classification tasks can be significantly improved by combining multiple models (derived from the same algorithm) in the multiple-data-batches scenario.
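The two approaches contrasted in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's experimental setup: the synthetic two-class data, the nearest-centroid learner, the majority-vote combiner, and all function names are hypothetical. It trains one model on all the data (data combination) and one model per disjoint batch, combined by voting (model combination), then reports both test error rates.

```python
import random
from collections import Counter


def make_data(n, rng):
    """Synthetic two-class 2-D data (illustrative assumption, not from the paper)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 1.5
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


def train_centroids(data):
    """Nearest-centroid learner: returns {label: (mean_x, mean_y)}."""
    sums = {}
    for (x, y), label in data:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(model, point):
    """Predict the label whose centroid is closest to the point."""
    def sq_dist(label):
        cx, cy = model[label]
        return (point[0] - cx) ** 2 + (point[1] - cy) ** 2
    return min(model, key=sq_dist)


def predict_vote(models, point):
    """Model combination by simple majority vote over the batch models."""
    votes = Counter(predict(m, point) for m in models)
    return votes.most_common(1)[0][0]


def error_rate(predict_fn, test):
    """Fraction of test examples that predict_fn labels incorrectly."""
    wrong = sum(1 for point, label in test if predict_fn(point) != label)
    return wrong / len(test)


def compare(n_train=500, n_batches=5, n_test=1000, seed=0):
    """Return (data-combination error, model-combination error)."""
    rng = random.Random(seed)
    train = make_data(n_train, rng)
    test = make_data(n_test, rng)
    # Disjoint batches: every training example lands in exactly one batch.
    batches = [train[i::n_batches] for i in range(n_batches)]
    single_model = train_centroids(train)               # data combination
    batch_models = [train_centroids(b) for b in batches]  # model combination
    err_data = error_rate(lambda p: predict(single_model, p), test)
    err_model = error_rate(lambda p: predict_vote(batch_models, p), test)
    return err_data, err_model


if __name__ == "__main__":
    err_data, err_model = compare()
    print(f"data combination error:  {err_data:.3f}")
    print(f"model combination error: {err_model:.3f}")
```

Varying `n_train` while holding `n_batches` fixed gives a rough way to trace how the two error rates move along a learning curve. The sketch makes no claim about which approach wins for any particular learner or dataset.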
Pages: 83-106
Number of pages: 23
Related papers
50 in total
  • [1] Learning from Batched Data: Model Combination Versus Data Combination
    Ting, Kai Ming
    Low, Boon Toh
    Witten, Ian H.
    Knowledge and Information Systems, 1999, 1 (01): 83 - 106
  • [2] Learning from Combination of Data Chunks for Multi-class Imbalanced Data
    Liu, Xu-Ying
    Li, Qian-Qian
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014: 1680 - 1687
  • [3] The combination of data
    Mather, K
    ANNALS OF EUGENICS, 1935, 6 : 399 - 410
  • [4] Combination Model of Heterogeneous Data for Security Measurement
    Dong, Xiuze
    Guo, Yunchuan
    Li, Fenghua
    Dong, Liju
    Khan, Arshad
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2019, 25 (03) : 270 - 281
  • [5] The big data mining forecasting model based on combination of improved manifold learning and deep learning
    Chen, Xiurong
    Tian, Yixiang
    INTERNATIONAL JOURNAL OF GRID AND UTILITY COMPUTING, 2019, 10 (02) : 119 - 131
  • [6] A SIMPLE LAYER MODEL OF GEOPOTENTIAL FROM A COMBINATION OF SATELLITE AND GRAVITY DATA
    KOCH, KR
    MORRISON, F
    JOURNAL OF GEOPHYSICAL RESEARCH, 1970, 75 (08): 1483 - +
  • [7] A SIMPLE LAYER MODEL OF GEOPOTENTIAL FROM A COMBINATION OF SATELLITE AND GRAVITY DATA
    KOCH, KR
    MORRISON, F
    TRANSACTIONS-AMERICAN GEOPHYSICAL UNION, 1969, 50 (11): 604 - &
  • [8] Pairwise Combination of Classifiers for Ensemble Learning on Data Streams
    Gomes, Heitor Murilo
    Barddal, Jean Paul
    Enembreck, Fabricio
    30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015: 941 - 946
  • [9] On the combination of observational data
    Levy, H
    Gascoigne, JC
    PROCEEDINGS OF THE PHYSICAL SOCIETY, 1936, 48 : 79 - 84
  • [10] Cancer detection from textual data using a combination of machine learning approach
    Salmanpoursohi, Bita
    Daneshvar, Amir
    Salmanpoursohi, Shakiba
    Chobar, Adel Pourghader
    Salahi, Fariba
    INTERDISCIPLINARY JOURNAL OF MANAGEMENT STUDIES, 2024, 17 (03): 1001 - 1014