The Unreasonable Effectiveness of Data

Cited by: 923
Authors
Halevy, Alon
Norvig, Peter
Pereira, Fernando
Affiliation
[1] Google, United States
DOI
10.1109/MIS.2009.36
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Natural language processing problems can be addressed by exploiting the unreasonable effectiveness of data. The biggest successes in natural-language-related machine learning are statistical speech recognition and statistical machine translation. The first lesson of Web-scale learning is to use the large-scale data that is available rather than hoping for annotated data that is not. The statistical language models used in speech recognition and machine translation consist of a huge database of probabilities of short sequences of consecutive words. Natural language processing requires choosing a representation language, encoding a model in that language, and performing inference on the model. Semantic interpretation deals with imprecise, ambiguous natural languages, whereas service interoperability deals with making data precise enough that the programs operating on it function effectively.
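The abstract describes a statistical language model as a database of probabilities of short sequences of consecutive words (n-grams). The following is a minimal illustrative sketch of the bigram (n = 2) case, not the authors' method. It assumes a toy whitespace-tokenized corpus, maximum-likelihood estimates, and no smoothing. The function name `bigram_probabilities` and the sample corpus are hypothetical. Web-scale systems use longer sequences, far larger corpora, and smoothing for unseen sequences.

```python
from collections import Counter


def bigram_probabilities(tokens):
    """Hypothetical helper: estimate P(next word | previous word) by
    maximum likelihood from counts of consecutive word pairs (no smoothing)."""
    # Count every adjacent (previous, next) pair in the token stream.
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Count how often each word appears in the "previous" position.
    prev_counts = Counter(tokens[:-1])
    # The resulting dict is the "database of probabilities" for 2-word sequences.
    return {
        (prev, nxt): count / prev_counts[prev]
        for (prev, nxt), count in pair_counts.items()
    }


# Toy corpus (illustrative only).
corpus = "the cat sat on the mat the cat ran".split()
probs = bigram_probabilities(corpus)
print(probs[("the", "cat")])  # "the" is followed by "cat" in 2 of its 3 occurrences
```

With more data, the table simply grows. That is the sense in which such models get better by collecting more text rather than by adding hand-built structure.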
Pages: 8-12
Number of pages: 5