Full-text resources of CEJSH and other databases are now available in the new Library of Science.
Visit https://bibliotekanauki.pl

Results found: 3

Search results

Search:
in the keywords:  lemmatisation

EN
The aim of this paper is to provide a corpus-based analysis of one type of Czech proper noun (the Zubří type). We argue that adequate annotation (lemmatisation and morphological tagging) of proper nouns of the Zubří type depends on several conditions: 1) the coverage of the dictionary of the automatic analyser; 2) an accurate description of the variability of inflectional forms; 3) the non-trivial disambiguation of numerous homonymous word forms. We believe that while meeting the first two conditions is possible, adequate disambiguation goes beyond the possibilities of automatic morphological analysis.
EN
The paper describes the principles and structure of the one-million-word DIA1900 Corpus built at the Institute of the Czech National Corpus (CNC) in Prague, focused on the language of Czech texts published in the years 1851 to 1900. The DIA1900, planned for publication by June 2020 and to be followed by the DIA1850 (a corpus built around the same principles, with the focus on the first half of the 19th century), observes both the balanced representation of the three major text types (belles lettres, journalistic texts, technical/scientific texts) and the system of morphological tagging implemented in the synchronic corpora included in the CNC project, thus facilitating the diachronic comparison of two stages in the development of Czech. A brief description is given of the structure of the morphological terminology used in the lemmatisation and tagging of the corpus, and of two tools designed to help search the 19th century texts, whose fluctuating orthographic consistency is combined with the phonological and morphological variation characteristic of the language of the period: (1) a multiple select/suggest feature (reminding the user of the existence of non-standard orthographic and phonological variants of the lemma found in the corpus before the lemma search is started) and (2) the position attribute (informing the user of the ambiguous status of a word in the text, resulting from a misprint or misspelling, damaged page etc.).
EN
This paper deals with the lexicographic description of phraseological units in online dictionaries. Bilingual electronic dictionaries are not limited by space capacity, unlike paper dictionaries. Besides, they offer the possibility of searching, selecting, and complementing lexicographic data in several ways, linking to other information sources such as electronic corpora. These factors influence the format of the presentation of phrasemes and remove many problems with multiword expressions, which print dictionaries have to tackle. The discussion first involves the selection of phrasemes. It makes sense to find fixed expressions useful for foreign language learners and distinguish between the ones relevant for active usage and those for reception. Lexicographers could lemmatise phrasemes both as headwords of their own and under each component, as a part of their microstructure. The second question is what the appropriate citation form to be entered is. The current lexicographic practice tends to lemmatise verbal phrases in the infinitive form; however, this may be confusing because of several morphological and syntactic restrictions. Therefore it seems more adequate if verbal phrases occur in a form indicating the slots in the valency, like jd. zieht jdn. über den Tisch. On the other hand, the citation form should indicate grammatical, lexical and pragmatic variation by means of typography and proper examples, e.g. from a corpus. Obligatory, facultative, and alternative components should be marked as well. The different lexicographic practices are discussed using the example of phrasemes with the component Tisch from bilingual German-Polish dictionaries such as Pons, Leo, Bab.la, and Glosbe as well as two phraseological dictionaries.