
EN
In the humanities, the analysis of primary and secondary literature is an important area of research work. Besides language corpora, digital libraries have in recent years become a suitable source of written texts; in the Czech Republic, approximately 98.7 million pages were digitized between 1992 and 2022. The article presents an example from abroad and gives a brief overview of data sources in the Czech environment. It focuses on the recently completed DL4DH project, which aims to give researchers access to large volumes of data from the Kramerius digital library in standardized formats (plain text, ALTO, CSV/TSV, TEI, JSON), both through a new web application and through a REST API. To make subsequent analysis of the publications as easy as possible, the downloaded data can include enrichment data from the UDPipe and NameTag tools, which are developed and operated by the LINDAT/CLARIAH-CZ research infrastructure.
EN
The paper discusses what kinds of content and annotation should be included in the diachronic corpus of Old Czech. Based on an analysis of the current state of DIAKORP and the Old Czech Text Bank, the author proposes how to treat the critical apparatus, foreign words in historical Czech texts, and contemporaneous or later marginal or interlinear notes. He also discusses some aspects of the methodology for computing statistics in the diachronic corpus.
EN
This paper introduces a description of Old Czech common nouns that was developed for, and is used in, a tool for tagging and lemmatizing common nouns in transcribed digital editions of Old Czech texts. The description consists of four parts: the first gives an overview of the endings of all declension types (approx. 100 declension patterns); the second analyses alternations in the morphological base that accompany declension (approx. 120 types of alternations); the third deals with formal changes connected mainly with the historical development of the language (approx. 100 formal changes); and the fourth contains a list of lemmas extracted from modern dictionaries of Old Czech (approx. 29 000 lemmas). Furthermore, the paper introduces the software developed and used for this purpose, namely i) a tool that makes it possible a) to generate word forms and then search the texts for multiple word forms at once, and b) to create lists of word forms filtered by the character sequences at the end of the word forms, ii) a tool for assigning a declension pattern to a lemma, and iii) a tool for working with large databases. Finally, the paper describes two applications built on the description of Old Czech common nouns: i) a database of Old Czech common noun declension patterns linked to Old Czech dictionaries and the Old Czech Text Bank, and ii) a tool for generating word forms, which is used for the lemmatization and tagging of Old Czech texts.