AI-ASSISTED DIGITALISATION OF HISTORICAL DOCUMENTS
Cited by: 3
Authors:
Ferro, S. [1,2]
Pelillo, M. [1,2]
Traviglia, A. [1,2]
Affiliations:
[1] Ca Foscari Univ Venice, DAIS, Via Torino 155, I-30172 Venice, Italy
[2] Ist Italiano Tecnol, Ctr Cultural Heritage Technol, Via Torino 155, I-30172 Venice, Italy
Keywords: Historical Documents; Handwriting; Digitisation; Digitalisation; Cultural Heritage; Preservation
DOI:
10.5194/isprs-archives-XLVIII-M-2-2023-557-2023
Chinese Library Classification (CLC):
K85 [Cultural Relics and Archaeology];
Subject classification code:
0601;
Abstract:
Preserving historical archival heritage requires not only physical measures to safeguard these valuable texts but also provision for their digital preservation. However, merely digitising manuscripts and codices is not enough. A further step is needed: the digitalisation of their content, i.e. the verbatim transcription of the scanned texts. This step preserves the textual content accurately and makes it easier to search for information and conduct further analyses. Artificial intelligence, particularly Deep Neural Networks (DNNs), makes automatic handwriting recognition possible. In this study, we employed a Convolutional Recurrent Neural Network (CRNN), an established type of DNN, to determine the minimum amount of labelled data required to automatically transcribe five historical datasets that differ in language and time period. The results show that a Character Error Rate (CER) below 10% can be achieved with just a few hundred labelled text lines in almost all cases.
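
The abstract reports results as a Character Error Rate (CER) but does not say how it was computed. As a minimal sketch, assuming the commonly used definition (character-level Levenshtein edit distance between the reference transcription and the model output, divided by the number of reference characters, pooled over all text lines), the metric could be computed as follows. The function names and the example strings are hypothetical illustrations and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning reference into hypothesis."""
    # previous[j] is the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """CER pooled over all lines: total edits / total reference characters.

    Pooling over lines (rather than averaging per-line rates) is an
    assumption of this sketch, not something the abstract specifies.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical ground-truth lines and model transcriptions.
    truth = ["in nomine domini", "anno millesimo"]
    predicted = ["in nomine domni", "anno milesimo"]
    print(f"CER = {character_error_rate(truth, predicted):.3f}")
```

Under this definition, a CER below 10% means fewer than one character edit per ten reference characters on average. Note that the value can exceed 100% when the output contains many spurious insertions.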