2024 | 31 | 2 | 447-467

Article title

Digitální studovna Ministerstva obrany ČR: využití technologií na pokročilou indexaci obsahu historických dokumentů

Content

Title variants

EN
The Digital Reading Room of the Ministry of Defence of the Czech Republic: using technology for advanced indexing of historical documents

Languages of publication

CS

Abstracts

EN
Developments in information technology and artificial intelligence are providing tools with considerable potential to facilitate and enrich research in history and related disciplines. A prerequisite for their effective use, however, is the most accurate possible conversion of analogue historical sources into machine-readable form, so that searching, classifying and extracting the information they contain is as efficient as it is for born-digital sources. In their study, Kykal and Fišer first give an overview of the development of digital libraries in the Czech Republic and of how the results of digitization are made available, taking into account the different strategies and technological backgrounds of libraries and archives. They reflect on the limitations of full-text search and point out a surprising systemic deficit in current digital libraries: the absence of any diagnostics of the quality of machine transcription produced by Optical Character Recognition (OCR) programs. They then present in detail the parameters and capabilities of the Digital Reading Room of the Ministry of Defence of the Czech Republic (Digitální studovna Ministerstva obrany ČR, DSMO), which is based on the Kramerius Digital Library system. Because it aggregates the digitization output of the memory institutions of the Ministry of Defence, the Reading Room makes available library documents as well as digitized items from archive and museum collections.
Using the example of a printed periodical of the Austro-Hungarian Army from the First World War, the study presents the process of subsequently enhancing OCR results with the PERO tool (Czech abbreviation for pokročilá extrakce a rozpoznávání obsahu, Advanced Extraction and Recognition of Content). This includes enrichment with a metadata schema that captures the layout of graphic and text objects (Analyzed Layout and Text Object, ALTO) and allows the searched text to be located precisely on the digitized image. With this program, the textual content of printed and typewritten texts, and also of handwritten texts, can be retrieved much more efficiently and with noticeably higher quality. Moreover, the data in the ALTO schema could be used to monitor the quality of OCR results automatically. Such monitoring would significantly increase the usability of semantic search, machine translation, summarization and many other artificial intelligence tools that are yet to be fully deployed in the Czech digital library environment.
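
The two uses of ALTO data mentioned in the abstract, locating a searched word on the page image and monitoring OCR quality, can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use PERO or Kramerius. It assumes only that each ALTO String element carries the standard CONTENT, HPOS, VPOS, WIDTH and HEIGHT attributes and that the OCR engine also wrote the optional WC (word confidence, 0 to 1) attribute. The sample page, the helper names (iter_words, locate, quality_report) and the 0.8 threshold are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample page (not taken from the DSMO); real ALTO files are much larger.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="p1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="b1">
          <TextLine ID="l1">
            <String CONTENT="historical" HPOS="100" VPOS="200" WIDTH="400" HEIGHT="50" WC="0.98"/>
            <String CONTENT="document" HPOS="520" VPOS="200" WIDTH="300" HEIGHT="50" WC="0.91"/>
            <String CONTENT="collcction" HPOS="840" VPOS="200" WIDTH="380" HEIGHT="50" WC="0.42"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def iter_words(alto_xml):
    """Yield every String (word) element, ignoring the ALTO namespace version."""
    root = ET.fromstring(alto_xml)
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] == "String":
            yield element


def locate(alto_xml, term):
    """Return (HPOS, VPOS, WIDTH, HEIGHT) for each word equal to term, ignoring case."""
    boxes = []
    for word in iter_words(alto_xml):
        if word.get("CONTENT", "").lower() == term.lower():
            boxes.append(tuple(float(word.get(key))
                               for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT")))
    return boxes


def quality_report(alto_xml, threshold=0.8):
    """Summarise word confidence; threshold is an arbitrary, hypothetical cut-off."""
    scores = [float(word.get("WC")) for word in iter_words(alto_xml)
              if word.get("WC") is not None]
    if not scores:
        return None  # WC is optional in ALTO, so a page may carry no confidence data
    low = sum(1 for score in scores if score < threshold)
    return {
        "words": len(scores),
        "mean_wc": sum(scores) / len(scores),
        "low_share": low / len(scores),
    }


if __name__ == "__main__":
    print(locate(SAMPLE_ALTO, "document"))
    print(quality_report(SAMPLE_ALTO))
```

A page whose mean confidence falls below a chosen limit, or whose share of low-confidence words is high, could then be flagged for re-processing. Confidence values are the OCR engine's own estimate, so they indicate likely transcription quality rather than measure it.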

Year

2024

Volume

31

Issue

2

Pages

447-467

Document type

ARTICLE

Contributors

author
  • Soudobé dějiny, redakce, Ústav pro soudobé dějiny AV ČR, v.v.i., Vlašská 9, 118 40 Praha 1, Czech Republic
  • Soudobé dějiny, redakce, Ústav pro soudobé dějiny AV ČR, v.v.i., Vlašská 9, 118 40 Praha 1, Czech Republic

Identifiers

YADDA identifier

bwmeta1.element.81143b35-229d-408e-a4b3-b94e38a3f633