Full-text resources of CEJSH and other databases are now available in the new Library of Science.
Visit https://bibliotekanauki.pl

Results found: 3


Search results

Search in keywords: morphological tagging
The paper describes the principles and structure of the one-million-word DIA1900 Corpus, built at the Institute of the Czech National Corpus (CNC) in Prague and focused on the language of Czech texts published between 1851 and 1900. The DIA1900, planned for publication by June 2020 and to be followed by the DIA1850 (a corpus built on the same principles but covering the first half of the 19th century), maintains both a balanced representation of the three major text types (belles lettres, journalistic texts, and technical/scientific texts) and the system of morphological tagging used in the synchronic corpora of the CNC project, thus facilitating diachronic comparison of two stages in the development of Czech. The paper briefly describes the structure of the morphological terminology used in lemmatising and tagging the corpus, as well as two tools designed to help users search 19th-century texts, whose inconsistent orthography is compounded by the phonological and morphological variation characteristic of the language of the period: (1) a multiple select/suggest feature, which, before a lemma search is started, reminds the user that non-standard orthographic and phonological variants of the lemma occur in the corpus; and (2) the position attribute, which informs the user that the status of a word in the text is ambiguous, for example because of a misprint, a misspelling, or a damaged page.
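The two search aids described above can be illustrated with a minimal Python sketch. It assumes a hypothetical table of variant groups (the spelling pairs are illustrative examples of 19th-century fluctuation, not entries from the DIA1900 lexicon), a hypothetical `status` field standing in for the position attribute, and a CQL-style lemma query whose value is read as a regular expression. The names `VARIANT_GROUPS`, `suggest`, `to_cql`, `Token` and `flagged` are invented for illustration and are not the CNC interface.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional, Set

# Hypothetical variant groups: each set collects spellings of one lemma.
# Illustrative only; not taken from the DIA1900 lexicon.
VARIANT_GROUPS: List[Set[str]] = [
    {"filozofie", "filosofie"},
    {"teorie", "theorie"},
]


def build_index(groups: Iterable[Set[str]]) -> Dict[str, FrozenSet[str]]:
    """Map every spelling to the whole group of spellings it belongs to."""
    index: Dict[str, FrozenSet[str]] = {}
    for group in groups:
        frozen = frozenset(group)
        for lemma in frozen:
            index[lemma] = frozen
    return index


def suggest(query: str, index: Dict[str, FrozenSet[str]]) -> List[str]:
    """Spellings to offer the user before the search starts.

    Returns just the query itself when no variants are known.
    """
    return sorted(index.get(query, frozenset({query})))


def to_cql(selected: Iterable[str]) -> str:
    """Join the spellings the user kept into one CQL-style lemma query.

    Assumes the spellings contain no regex metacharacters, because the
    attribute value is treated as a regular expression.
    """
    return '[lemma="' + "|".join(selected) + '"]'


@dataclass(frozen=True)
class Token:
    word: str
    lemma: str
    # Hypothetical stand-in for the position attribute: None for an
    # unproblematic reading, else a reason such as "misprint".
    status: Optional[str] = None


def flagged(tokens: Iterable[Token]) -> List[Token]:
    """Return the tokens whose reading is marked as uncertain."""
    return [t for t in tokens if t.status is not None]


if __name__ == "__main__":
    index = build_index(VARIANT_GROUPS)
    options = suggest("filosofie", index)  # ['filosofie', 'filozofie']
    print(to_cql(options))                 # [lemma="filosofie|filozofie"]
```

Mapping every spelling to its whole group means the user gets the same suggestions whichever variant they type first.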
Deep learning methods based on artificial neural networks have seen significant uptake in recent years and have substantially advanced the state of the art in automatically solving tasks in many fields. Computational linguistics and its applied offshoot, natural language processing, with classic tasks such as morphological tagging, dependency parsing, named entity recognition and machine translation, are no exception. This paper provides an overview of recent advances in these tasks for Czech and presents completely new results in morphological tagging and named entity recognition for Czech, along with a detailed error analysis.
Until recently, the full form of the n-/t-participle was tagged in the Czech National Corpus as a common adjective; a special tag was introduced only with the new corpus SYN2020. This makes it possible to study the role of both the short and the full form of the n-/t-participle in resultative constructions in written Standard Czech. The results show that in most contexts the full form of the participle is significantly more frequent than the short form. The only exceptions are subject and object resultatives without a subject (Je zataženo 'it is overcast' / Je otevřeno 'it is open') and possessive resultatives without an object (Mají zavřeno 'they are closed'), in both cases with the participle in the neuter singular. In these constructions, full forms rarely occur in written Czech. Using the new tag in corpora other than SYN2020 will enable better research on full forms of the n-/t-participle in Czech, not only in resultative constructions.
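Why a dedicated tag matters can be shown with a small Python sketch that measures the relative frequency of short and full participle forms in tagged tokens. The labels `PART_SHORT` and `PART_FULL` are hypothetical placeholders, not the actual SYN2020 tags, and the sample sentences are toy data, not corpus results. Under the older annotation the full forms would carry the common-adjective tag, so they would not be counted here at all.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical labels for a tag that separates the short and full forms
# of the n-/t-participle; they are NOT the real SYN2020 tags.
SHORT, FULL = "PART_SHORT", "PART_FULL"


def participle_shares(tokens: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Relative frequency of short vs. full participle forms.

    tokens: (word, tag) pairs; tokens with any other tag are ignored.
    """
    counts = Counter(tag for _, tag in tokens if tag in (SHORT, FULL))
    total = counts[SHORT] + counts[FULL]
    if total == 0:
        return {"short": 0.0, "full": 0.0}
    return {"short": counts[SHORT] / total, "full": counts[FULL] / total}


if __name__ == "__main__":
    # Toy sample for illustration only, not corpus data.
    sample = [
        ("Je", "VERB"), ("zavřeno", SHORT),
        ("Dveře", "NOUN"), ("jsou", "VERB"), ("zavřené", FULL),
        ("Okno", "NOUN"), ("je", "VERB"), ("otevřené", FULL),
    ]
    print(participle_shares(sample))
```

A real study would compute such shares separately for each construction type (for instance, resultatives with and without a subject) and add a significance test. The sketch shows only the counting step.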