

2011 | vol. 72 | no. 4 | pp. 268-287
Article title

Syntaktická proměna Českého akademického korpusu

Title variants
EN
The Syntactic Transformation of the Czech Academic Corpus
Languages of publication
CS
Abstracts
EN
The Czech Academic Corpus (CAC) originated in 1971 at the Department of Mathematical Linguistics of the Czech Language Institute. By the mid-1980s, a total of 540,000 words had been manually annotated, both morphologically and syntactically. After the Prague Dependency Treebank (PDT), the largest annotated treebank of written Czech, was built, the conversion of the CAC into the PDT format began. The main goal was to make the CAC compatible with the PDT and thus to enable its integration into the PDT. The second version of the CAC is therefore a complete conversion of both its internal format and its annotation schemes. The conversion of the syntactic annotation began three years after the syntactic annotation of the PDT was finished. This situation is exceptional: to our knowledge, there is no other language for which such a large amount of data has been annotated in two successive projects. This article summarizes the experience gained while converting the syntactic annotation of the CAC.
Contributors
author
  • ÚFAL MFF UK, Malostranské nám. 25, Praha 1, 118 00, Czech Republic
Identifiers
YADDA identifier
bwmeta1.element.cejsh-357cf6f5-bb9a-4d47-ae2d-e2e1da03dde2