A Bayesian information theoretic model of learning to learn via multiple task sampling

Cited by: 298
Authors
Baxter, J
Affiliations
[1] London School of Economics and Political Science, University of London, Department of Mathematics, London WC2A 2AE, England
[2] Royal Holloway and Bedford New College, University of London, Department of Computer Science, Egham TW20 0EX, Surrey, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
hierarchical Bayesian inference; bias learning; feature learning; neural networks; information theory;
DOI
10.1023/A:1007327622663
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
A Bayesian model of learning to learn by sampling from multiple tasks is presented. The multiple tasks are themselves generated by sampling from a distribution over an environment of related tasks. Such an environment is shown to be naturally modelled within a Bayesian context by the concept of an objective prior distribution. It is argued that for many common machine learning problems, although in general we do not know the true (objective) prior for the problem, we do have some idea of a set of possible priors to which the true prior belongs. It is shown that under these circumstances a learner can use Bayesian inference to learn the true prior by learning sufficiently many tasks from the environment. In addition, bounds are given on the amount of information required to learn a task when it is simultaneously learnt with several other tasks. The bounds show that if the learner has little knowledge of the true prior, but the dimensionality of the true prior is small, then sampling multiple tasks is highly advantageous. The theory is applied to the problem of learning a common feature set, or equivalently a low-dimensional representation (LDR), for an environment of related tasks.
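The paper itself is theoretical (it derives information bounds rather than an algorithm), but the shared low-dimensional-representation idea in the abstract can be illustrated with a small sketch. The snippet below is an assumption-laden toy, not the paper's method: tasks are sampled from an environment that shares a feature map B (d × k with k ≪ d), each task keeps its own head w_t, and alternating least squares recovers the shared map from the pooled tasks. All names and sizes (d, k, n_tasks, B_hat, and so on) are illustrative choices.

```python
# Toy illustration (assumed setup, not Baxter's algorithm): tasks drawn from a
# common environment share a feature map B (d x k, k << d); each task t keeps
# its own head w_t. Alternating least squares estimates the shared map.
import numpy as np

rng = np.random.default_rng(0)
d, k, n_tasks, n_per_task = 10, 2, 8, 40

# "Environment": a true shared feature map; each sampled task draws its own head.
B_true = rng.normal(size=(d, k))
tasks = []
for _ in range(n_tasks):
    w_t = rng.normal(size=k)                       # task-specific parameters
    X = rng.normal(size=(n_per_task, d))
    y = X @ B_true @ w_t + 0.1 * rng.normal(size=n_per_task)
    tasks.append((X, y))

# Jointly estimate the shared map B_hat and the per-task heads W_hat.
B_hat = rng.normal(size=(d, k))
W_hat = [np.zeros(k) for _ in range(n_tasks)]
for _ in range(50):
    # (1) Fix B_hat; solve each task's head by ordinary least squares.
    for t, (X, y) in enumerate(tasks):
        W_hat[t], *_ = np.linalg.lstsq(X @ B_hat, y, rcond=None)
    # (2) Fix the heads; solve for vec(B_hat) over all tasks at once, using
    #     y_t = X_t B w_t  =>  y_t = (w_t^T kron X_t) vec(B)  (column-major vec).
    Z = np.vstack([np.kron(w[None, :], X) for (X, _), w in zip(tasks, W_hat)])
    y_all = np.concatenate([y for _, y in tasks])
    b_vec, *_ = np.linalg.lstsq(Z, y_all, rcond=None)
    B_hat = b_vec.reshape(d, k, order="F")

def col_proj(M):
    # Orthogonal projector onto the column space of M.
    return M @ np.linalg.pinv(M)

# The estimated column space should approach the true shared subspace.
print("subspace gap:", np.linalg.norm(col_proj(B_true) - col_proj(B_hat)))
```

The paper's bounds concern exactly this kind of sharing: with enough sampled tasks, the cost of identifying the shared representation (the "prior") is amortised, so each new task requires far less information to learn.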
Pages: 7-39
Number of pages: 33