2011 | 72 | 4 | 268-286

Article title

Syntaktická proměna Českého akademického korpusu

Content

Title variants

EN
The syntactic transformation of the Czech Academic Corpus

Languages of publication

CS

Abstracts

EN
The Czech Academic Corpus (CAC) was conceived in 1971 at the Department of Mathematical Linguistics of the Czech Language Institute. By the mid-1980s, a total of 540,000 words had been manually annotated, both morphologically and syntactically. After the Prague Dependency Treebank (PDT), the largest annotated treebank of written Czech texts, was built, conversion of the CAC to the PDT format began. The main goal was to make the CAC and the PDT compatible, and thus to enable the integration of the CAC into the PDT. The second version of the CAC is therefore a complete conversion of both the internal format and the annotation schemes. The conversion of the syntactic annotation began three years after the syntactic annotation of the PDT was finished. This situation is exceptional because, to our knowledge, there is no other language for which such a significant amount of data has been annotated in two successive projects. This article summarizes the experience acquired during the conversion of the CAC syntactic annotation.

Contributors

  • Slovo a slovesnost, redakce, Ústav pro jazyk český AV ČR, v.v.i., Letenská 4, 118 51 Praha 1, Czech Republic

Identifiers

YADDA identifier

bwmeta1.element.1ca9c933-73d7-423e-8b67-ac8429897f17