Proměna Českého akademického korpusu

Hladká, Barbora; Králík, Jan

Article details

Journal

Slovo a slovesnost: časopis pro otázky teorie a kultury jazyka (Slovo a slovesnost: A journal for the theory of language and language cultivation)

2006 | 67 | 3 | 179-194

Article title

Proměna Českého akademického korpusu

Authors

Barbora Hladká, Jan Králík

Content

Full texts:

http://kramerius.lib.cas.cz/search/handle/uuid:b6d852e5-ea0a-44ce-8869-c14bcdffee15 [remote]

Title variants

EN

The transformation of the Czech Academic Corpus

Languages of publication

CS

Abstracts

EN

The Czech Academic Corpus was created during the 1970s and 1980s at the Czech Language Institute under the supervision of Marie Těšitelová. The main motivation for building it (540 thousand word tokens in total) was to obtain quantitative characteristics of contemporary Czech. The corpus is structurally annotated on two levels: the morphological level and the syntactic-analytical level. The first stochastic experiments in morphological tagging of Czech were performed on the corpus at the beginning of the 1990s, and they launched corpus-based processing of Czech. At the end of the 1990s, work on the Prague Dependency Treebank started (independently of the corpus), and its first edition was published in 2001. With future releases of the treebank in mind, we decided to convert the corpus into a treebank-like format. This article focuses on the twenty-year history of the Czech Academic Corpus. Special attention is devoted to previously unpublished facts about the corpus annotation. The conversion steps resulting in the first version of the Czech Academic Corpus are described in detail.

Keywords

CS

anotovaný korpus; konverze anotačního schématu; zpracování přirozeného jazyka

EN

annotated corpus; annotation scheme conversion; natural language processing

Year

2006

Volume

67

Issue

3

Pages

179-194

Document type

ARTICLE

Contributors

author

Barbora Hladká

Slovo a slovesnost, redakce, Ústav pro jazyk český AV ČR, v.v.i., Letenská 4, 118 51 Praha 1, Czech Republic

author

Jan Králík

Slovo a slovesnost, redakce, Ústav pro jazyk český AV ČR, v.v.i., Letenská 4, 118 51 Praha 1, Czech Republic
