Results found: 5

Search results

Search:
in the keywords:  diachronic corpus

EN
The use of terms in discourse is characterized by the presence of definitions that explain and clarify their meaning in a given field, both for specialists and for non-specialists. This defining activity accompanies terminological units, considered here in their textual dimension, over time, from their first appearance in the language as neologisms. Although definitions can be explored from different angles in general language and in specialized language, this paper is concerned with the automatic identification and analysis of definitions in a diachronic corpus. We reflect on the data these definitions provide, with the objective of offering a terminographic description focused on the diachronic aspect of terms. After automatic identification, we observe the definitions at two phases in the life of terms: the moment of their appearance in the French language and the moment of their dissemination and lexicalization. Our attention is focused on the definitions characterizing French terms in the commercial field, from a diachronic perspective, within the DIACOM corpus.
EN
The paper discusses what kind of content and annotation should be included in the diachronic corpus of Old Czech. Based on his analysis of the current state of DIAKORP and the Old Czech Text Bank, the author suggests how to treat the critical apparatus, foreign words in historical Czech texts, and contemporaneous or later marginal or interlinear notes. He also discusses some aspects of the methodology for computing statistics in the diachronic corpus.
EN
The paper reviews the present state of the diachronic part of the Czech National Corpus, focusing on the two-million-word unannotated pivotal corpus Diakorp and its limitations for corpus-based research into the history of Czech. Growth of at least 1,000,000 tokens, lemmatization and morphological tagging are cited as near-future enhancements to the corpus. A series of thoroughly structured monitoring diachronic corpora, to be built from 2017 onward, is considered as a future basis for research into long-term trends in the history of Czech, complementing the quantity-oriented Diakorp.
EN
The paper describes the principles and structure of the one-million-word DIA1900 Corpus built at the Institute of the Czech National Corpus (CNC) in Prague, focused on the language of Czech texts published in the years 1851 to 1900. The DIA1900, planned for publication by June 2020 and to be followed by the DIA1850 (a corpus built on the same principles, focused on the first half of the 19th century), observes both the balanced representation of the three major text types (belles lettres, journalistic texts, technical/scientific texts) and the system of morphological tagging implemented in the synchronic corpora included in the CNC project, thus facilitating the diachronic comparison of two stages in the development of Czech. A brief description is given of the structure of the morphological terminology used in the lemmatisation and tagging of the corpus, and of two tools designed to help search the 19th century texts, whose fluctuating orthographic consistency is combined with the phonological and morphological variation characteristic of the language of the period: (1) a multiple select/suggest feature (reminding the user, before the lemma search is started, of the non-standard orthographic and phonological variants of the lemma found in the corpus) and (2) the position attribute (informing the user of the ambiguous status of a word in the text, resulting from a misprint or misspelling, a damaged page, etc.).
EN
The objective of the paper is to describe the principles for building the one-million-word DIA1900 Corpus consisting of Czech texts published between 1851 and 1900, designed to be both balanced and representative. Two main goals determine the methods of corpus building and the decision to develop new tools tailored to the special needs of 19th century Czech: 1) to present the variability of Czech in the second half of the 19th century (including spelling, morphology and word formation) and 2) to link the detected variants to the appropriate lemmas. The paper presents the phases of text processing, including transcription and manual pre-annotation, as well as the construction of a large morphological dictionary and the selection of a suitable set of paradigms. Further sections focus on annotation, morphological tagging and manual disambiguation. The objective was to create a gold standard intended for use in the automatic annotation of both the DIA1900 corpus and the planned corpus of Czech texts from the years 1800–1850.
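
The second goal in the abstract above, linking detected variants to the appropriate lemmas, can be pictured with a minimal Python sketch. This is not the DIA1900 tooling: the table `VARIANT_TO_LEMMA` and the functions `lemmatize` and `variants_by_lemma` are hypothetical names introduced here, and the word pairs are illustrative older Czech spellings, not data drawn from the corpus. A real morphological dictionary would also have to handle inflected forms and ambiguity, which this lookup ignores.

```python
from collections import defaultdict

# Toy variant-to-lemma table (assumption). The entries are illustrative
# older Czech spellings chosen for this sketch, not DIA1900 corpus data.
VARIANT_TO_LEMMA = {
    "president": "prezident",
    "prezident": "prezident",
    "gymnasium": "gymnázium",
    "gymnázium": "gymnázium",
    "filosofie": "filozofie",
    "filozofie": "filozofie",
}


def lemmatize(tokens, table=VARIANT_TO_LEMMA):
    """Pair each token with its lemma, or None if the form is unknown."""
    return [(tok, table.get(tok.lower())) for tok in tokens]


def variants_by_lemma(table=VARIANT_TO_LEMMA):
    """Invert the table so each lemma lists all of its recorded forms."""
    grouped = defaultdict(set)
    for form, lemma in table.items():
        grouped[lemma].add(form)
    return {lemma: sorted(forms) for lemma, forms in grouped.items()}
```

Calling `lemmatize(["President", "gymnasium"])` pairs both spellings with their modern lemmas, while `variants_by_lemma()` gives the reverse view, the set of spellings recorded under each lemma, which is the kind of information a variant-aware search interface could show to the user.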