History of the Tesseract OCR Engine: What Worked and What Didn't. How to Build a World-Class OCR Engine in Less Than 20 Years

Cited by: 19
Authors
Smith, Ray [1]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
Source
Keywords
OCR; Machine learning; Structural pattern recognition; Multi-language OCR; Recognition; Models
DOI
10.1117/12.2010051
Chinese Library Classification number
O43 [Optics]
Discipline classification code
070207; 0803
Abstract
This paper describes the development history of the Tesseract OCR engine, and compares the methods to general changes in the field over a similar time period. Emphasis is placed on the lessons learned with the goal of providing a primer for those interested in OCR research.
Pages: 12