Slovak National Corpus and corpus linguistics in Slovakia after 2002
(Slovak title: Slovenský národný korpus a korpusová lingvistika na Slovensku po roku 2002)
In the early 1990s, corpus linguistics had not yet become part of Slovak linguistics. J. Mistrík (Encyklopédia jazykovedy, 1993) predicted that various factors would substantially influence the Slovak language and its research. External factors such as computerization and the development of corpora and language technologies also shaped the formation of the Slovak National Corpus. Internal factors, namely the need for material for language research and for compiling a new monolingual dictionary of Slovak, also had a great impact on its establishment. The Slovak National Corpus (SNC) was founded in 2002 with the support of the Ministry of Education, the Ministry of Culture and the Slovak Academy of Sciences. The SNC comprises several projects focused primarily on linguistic research and language teaching. Currently, the corpus prim-6.1 contains 829 million tokens. The documents (texts) in the corpus carry a rich metadata description, including detailed style and genre annotation, as well as morphological annotation. There are many related corpora, e.g. a manually morphologically annotated corpus, the Slovak WebCorpus, the Corpus of Legal Texts, the Corpus of Spoken Slovak, and several parallel corpora (Slovak-French, Slovak-Russian, Slovak-Czech, Slovak-English, Slovak-Latin). Separate projects include the Slovak morphological analyser, the Slovak Terminology Database, the Slovak WordNet and the Corpus of Historical Slovak. Slovak language resources are largely sufficient for basic language research, but NLP for Slovak requires further support.
Pages 354-367