In this paper, basic statistical data on Slovak linguistic units (lemmas, word forms, collocations, and parts of speech, or rather word classes) are analysed using various language resources. The frequency of lemmas and word forms in contemporary Slovak according to the dictionary Frekvencia slov v slovenčine (Frequency of Words in the Slovak Language) by Jozef Mistrík (1969) is compared with data obtained from several versions of the Slovak National Corpus and some of its subcorpora. This provides an overview of the stability and dynamics of the Slovak language system in various areas of its usage over the last fifty-five years. The statistical information summarized in frequency dictionaries and lists helps us understand the functioning of linguistic units in communication better and more objectively. It also helps to characterize both typologically distant and closely related languages. This contribution demonstrates the possibilities of statistical analysis and will serve as a basis for preparing a new frequency dictionary of Slovak based on Slovak National Corpus material.
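Comparing a 1969 frequency dictionary with corpus versions of very different sizes only makes sense once raw counts are normalized. A minimal Python sketch, assuming simple whitespace-separated tokens (real corpus data would first need lemmatization and morphological annotation), converts counts to instances per million tokens (ipm) and compares ranks for items attested in both samples. The function names and the toy token samples are hypothetical illustrations, not the authors' method or data.

```python
from collections import Counter

def frequency_list(tokens):
    """Count how often each item (word form or lemma) occurs."""
    return Counter(tokens)

def per_million(freqs, corpus_size):
    """Convert raw counts to instances per million tokens (ipm)."""
    return {item: count * 1_000_000 / corpus_size for item, count in freqs.items()}

def ranks(freqs):
    """Assign 1-based ranks by descending frequency; ties are ordered alphabetically."""
    ordered = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return {item: position for position, (item, _) in enumerate(ordered, start=1)}

def compare(old_tokens, new_tokens):
    """For items attested in both samples, return
    (item, old ipm, new ipm, old rank, new rank)."""
    old_freqs, new_freqs = frequency_list(old_tokens), frequency_list(new_tokens)
    old_ipm = per_million(old_freqs, len(old_tokens))
    new_ipm = per_million(new_freqs, len(new_tokens))
    old_rank, new_rank = ranks(old_freqs), ranks(new_freqs)
    shared = sorted(set(old_freqs) & set(new_freqs))
    return [(w, old_ipm[w], new_ipm[w], old_rank[w], new_rank[w]) for w in shared]

if __name__ == "__main__":
    # Toy, made-up token samples standing in for an older and a newer corpus.
    older = "a a a b b c".split()
    newer = "a b b b c c d".split()
    for row in compare(older, newer):
        print(row)
```

Normalizing to ipm lets a small historical sample and a corpus of hundreds of millions of tokens be read on the same scale, while the rank columns show shifts in an item's relative position that raw counts would hide.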
In the early 1990s, corpus linguistics was not yet established within Slovak linguistics. J. Mistrík (Encyklopédia jazykovedy, 1993) predicted that various factors would substantially influence the Slovak language and its research. External factors such as computerization and the development of corpora and language technologies also shaped the formation of the Slovak National Corpus. Internal needs, namely the demand for material for language research and for compiling a new monolingual dictionary of Slovak, also had a great impact on its establishment. The Slovak National Corpus (SNC) was founded in 2002 with the support of the Ministry of Education, the Ministry of Culture and the Slovak Academy of Sciences. The SNC comprises several projects primarily focused on linguistic research and language teaching. Currently, the corpus prim-6.1 contains 829 million tokens. The documents (texts) in the corpus carry a rich metadata description, including detailed style and genre annotation, as well as morphological annotation. There are many related corpora, e.g. a manually morphologically annotated corpus, the Slovak WebCorpus, a corpus of legal texts, the Corpus of Spoken Slovak, and several parallel corpora (Slovak-French, Slovak-Russian, Slovak-Czech, Slovak-English, Slovak-Latin). Separate projects include the Slovak morphological analyser, the Slovak Terminology Database, Slovak WordNet and the Corpus of Historical Slovak. Slovak language resources are adequate for basic language research, but NLP for Slovak requires further support.