Results found: 2

Search results

Search in the keywords: morphological annotation

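Změny v morfologické anotaci korpusů řady SYN: nové možnosti zkoumání české gramatiky a lexikonu (Changes in the morphological annotation of the SYN series corpora: New possibilities for exploring Czech grammar and lexicon)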

100%

Křivan J., Šindlerová J.

Slovo a slovesnost: časopis pro otázky teorie a kultury jazyka (Slovo a slovesnost: A journal for the theory of language and language cultivation)

2022

vol. 83

issue 2

122-145

This paper introduces some major conceptual enhancements to the morphological annotation of the SYN series corpora of the Czech National Corpus. Apart from minor changes in tokenization and in the positional tagset, three major conceptual changes have been applied which affect the representation of various lexical and grammatical patterns. In the paper, we present the actual impact of the changes in linguistic data and search for possibilities in three linguistic areas. First, the treatment of phonic, graphemic, and morphological variants via a two-tier lemma structure is discussed; second, a new approach to periphrastic verb forms, auxiliaries, participles and the interpretation of verbal grammatical categories through a new attribute, called verbtag, is explained; and third, a complex multi-value treatment of multiword tokens is introduced.

Korpus DIA1900: jeho koncepce a vytváření (The DIA1900 Corpus: Its conception and creation)

80%

Benešová L., Kučera K., Najbrtová K., Pivoňková K., Stluka M.

Časopis pro moderní filologii (Journal for Modern Philology)

2023

vol. 105

issue 1

121-140

The objective of the paper is to describe the principles for building the one-million-word DIA1900 Corpus, which consists of Czech texts published between 1851 and 1900 and is designed to be both balanced and representative. Two main goals determine the methods of corpus building and the decision to develop new tools tailored to the special needs of 19th-century Czech: 1) to present the variability of Czech in the second half of the 19th century (including spelling, morphology, and word formation) and 2) to link the detected variants to the appropriate lemmas. The paper presents the phases of text processing, including transcription and manual pre-annotation, as well as the construction of a large morphological dictionary and the selection of a suitable set of paradigms. Further sections focus on annotation, morphological tagging, and manual disambiguation. The objective was to create a gold standard intended for use in the automatic annotation of both the DIA1900 corpus and the planned corpus of Czech texts from the years 1800–1850.
