
Results found: 15


Search results

Search: in the keywords: CORPUS LINGUISTICS
EN
A new specialized, freely accessible sub-corpus of the Slovak National Corpus, the Corpus of Copywriting Texts (cw-2014-all), was created in 2014 by the Department of the Slovak National Corpus of the Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences. It consists of 1,648,229 tokens and 54,617 unique lemmas. The corpus contains 1,441 pages from 339 websites of commercial companies and public institutions, focused on advertising and self-presentation. The corpus is lemmatized and morphologically annotated. It includes three sub-corpora: copywriting texts of larger commercial companies (cw-2014-v), copywriting texts of smaller commercial companies (cw-2014-m), and copywriting texts of public institutions (cw-2014-inst). The article describes the methodology used to build the corpus and presents its basic quantitative and qualitative characteristics.

Možnosti a meze korpusové lingvistiky (Possibilities and Limits of Corpus Linguistics)

EN
This paper addresses the two most common criticisms of corpus linguistics: (1) that a corpus is merely a card index in electronic form, and (2) that corpus linguistics covers only corpus construction and linguistic annotation. We argue that a corpus is a far more complex resource and can be exploited in unprecedented ways. In response to the second objection, we show that corpus linguistics is an independent linguistic discipline that makes substantial contributions to linguistic theory and language description.
EN
This response paper addresses M. Šimková's article, published in the last 2013 issue of Slovenská reč (Slovak Speech), which reviewed the activities of the Department of the Slovak National Corpus at the Ľ. Štúr Institute of Linguistics over the past 10 years. We supply the missing references, comment on some points made in the article, and offer a different view of the procedures involved in building and morphosyntactically annotating the Slovak National Corpus.
Slavica Slovaca | 2022 | vol. 57 | issue 2 | pp. 149-155
EN
This study examines the semantic features of a specific group of lexical units of English origin, namely those ending in the suffix -ing, that function in contemporary Slovak. The aim of the research is to investigate their semantic adaptation in contemporary journalistic genres. The research sample is drawn from the journalistic texts of the Slovak National Corpus database, using the Sketch Engine search tool.
Slavica Slovaca | 2017 | vol. 52 | issue 2 | pp. 110-121
EN
This paper explores how zhalost′ ('pity') is expressed in Russian, in order to show how this emotional concept varies in relation to the main values involved in typical pity situations.
EN
The aim of this paper is to examine language data for potential differences between male and female registers. Corpus linguistics (mostly quantitative) serves as the basic methodological approach, and the Hanku corpus, specifically its sub-corpus Litchi, is the primary source of language data. The data are presented in tables together with a brief analysis. The results indicate considerable variation between male and female registers in some areas, such as lexicon and part-of-speech proportions. In other areas (e.g. prosody), however, no deviation was found at all. These indicators of variation will be the subject of further, more detailed research.
EN
The Czech passive participle is often considered a bookish form whose use is confined to written language, especially technical and specialized literature. Czech speakers often use 'the long form' of the corresponding deverbative adjective instead of the passive participle, and they do so ever more frequently, not only in speech but often also when writing in Literary Czech. Czech linguists have discussed this situation for decades. The article presents the findings of research based on the DIALOG corpus, a large corpus of contemporary spoken Czech containing transcripts of TV political debates and talk shows. The research reveals that passive participle forms are comparatively frequent in the analysed corpus, whereas the alternative long deverbative adjective forms used in a way that can be classified as non-standard or non-literary are rare. The results also partly confirm that the passive voice of perfective verbs is considerably more common than that of imperfective verbs, while the instances of imperfective passive forms found in the corpus show that their use is fully appropriate. The final section examines how the use of passive participles relates to code-switching between Literary and Colloquial Czech.
EN
We analyse the reflexivity of Slovak verbs on the basis of the Slovak National Corpus. From the distribution of reflexive pronouns in the left and right context of a verb in the corpus, we derive a simple parameter describing that verb's degree of reflexivity, and we use this parameter to rank the verbs in the Slovak National Corpus. The method allows verbs to be classified automatically by reflexivity, provided they occur often enough in the corpus.
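The abstract does not state the exact formula for the reflexivity parameter, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It scores each verb lemma by the share of its occurrences that have a reflexive pronoun within a fixed window to the left or right. The reflexive forms (assumed here to be sa and si), the input format (sentences as lists of hypothetical (form, lemma, POS) triples with a "VERB" tag), the window size, and the min_count threshold are all illustrative assumptions, not properties of the Slovak National Corpus.

```python
from collections import Counter
from collections.abc import Iterable, Sequence

# Assumption: the Slovak reflexive pronoun forms considered are "sa" and "si".
REFLEXIVES = {"sa", "si"}


def reflexivity_scores(
    sentences: Iterable[Sequence[tuple[str, str, str]]],
    window: int = 3,
    min_count: int = 5,
) -> dict[str, float]:
    """Hypothetical reflexivity parameter: for each verb lemma, the fraction
    of its occurrences with a reflexive pronoun within `window` tokens to the
    left or right. Tokens are (form, lemma, pos) triples; a token is treated
    as a verb when pos == "VERB" (an assumed tagset)."""
    total = Counter()      # all occurrences of each verb lemma
    with_refl = Counter()  # occurrences with a reflexive pronoun nearby
    for sent in sentences:
        forms = [form.lower() for form, _, _ in sent]
        for i, (_, lemma, pos) in enumerate(sent):
            if pos != "VERB":
                continue
            total[lemma] += 1
            left = forms[max(0, i - window):i]
            right = forms[i + 1:i + 1 + window]
            if REFLEXIVES.intersection(left) or REFLEXIVES.intersection(right):
                with_refl[lemma] += 1
    # Score only verbs with enough occurrences, mirroring the caveat that the
    # method needs a sufficient number of corpus occurrences.
    return {
        lemma: with_refl[lemma] / n
        for lemma, n in total.items()
        if n >= min_count
    }


def rank_verbs(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort verbs from most to least reflexive (ties broken by lemma)."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

On a toy input where smiať always appears with sa and písať appears once with si and once without a reflexive pronoun, calling reflexivity_scores with min_count=1 yields 1.0 for smiať and 0.5 for písať. A real study would presumably need a larger window analysis and a baseline for how often sa and si occur near non-reflexive verbs by chance; this sketch omits both.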
EN
In the early 1990s, corpus linguistics was not yet part of Slovak linguistics. J. Mistrík (Encyklopédia jazykovedy, 1993) predicted that various factors would have a substantial influence on the Slovak language and its study. External factors such as computerization and the development of corpora and language technologies also shaped the formation of the Slovak National Corpus. Internal conditions, namely the need for material for language research and for compiling a new monolingual dictionary of Slovak, also had a great impact on its establishment. The Slovak National Corpus was founded in 2002 with the support of the Ministry of Education, the Ministry of Culture and the Slovak Academy of Sciences. The SNC comprises several projects focused primarily on linguistic research and language teaching. At present, the corpus prim-6.1 contains 829 million tokens. The documents (texts) in the corpus carry a rich metadata description, including detailed style and genre annotation, as well as morphological annotation. There are many related corpora, e.g. a manually morphologically annotated corpus, the Slovak WebCorpus, the Corpus of Legal Texts, the Corpus of Spoken Slovak, and several parallel corpora (Slovak-French, Slovak-Russian, Slovak-Czech, Slovak-English, Slovak-Latin). Separate projects include the Slovak morphological analyser, the Slovak Terminology Database, the Slovak WordNet and the Corpus of Historical Slovak. Slovak language resources are quite sufficient for basic language research, but NLP for Slovak requires further support.
Slavica Slovaca | 2017 | vol. 52 | issue 1 | pp. 16-26
EN
This publication explores how guilt is represented in Russian linguistic consciousness. According to the National Corpus of the Russian Language, guilt is understood in different contexts as a cause, as a bad act, and as a feeling. As the cause of something wrong, it is related to coincidence, national mentality, curiosity, love and fate; as a bad act, it is related to fault, adultery, alcoholism and crime; as a feeling of responsibility for such wrongs, whether real or imagined, it is related to pangs of remorse, conscience, shame and fear. The degree of guilt as such can be assessed only with reference to a social norm, such as moral or criminal law. Thus the analysed emotional concept is both constant and variable: constant as an object of understanding, variable as a contextual meaning.
Bohemistyka | 2015 | vol. 15 | issue 2 | pp. 126-138
EN
The article describes the role that national corpora can and should play in compiling the great Polish-Czech/Czech-Polish dictionary. It analyses the functions of language corpora in preparing single-word and multi-word entries in an electronic dictionary. Concrete examples are used for a detailed analysis of polysemy and homonymy in the process of creating dictionary entries. The article also shows how the methodology of monolingual lexicography (e.g. identifiers and definitions) is applied in describing dictionary entries (words or phrasemes). Particular focus is placed on the structure of dictionary entries and on the use of the national corpora. Further examples illustrate the selectivity and incompleteness of the meaning descriptions of lexical units in dictionaries that are not based on corpus texts. The article emphasises the importance of language corpora for creating entries, not least as a useful source of frequency data. It draws on data from the National Corpus of Polish (NKJP) and the Czech National Corpus (ČNK).
Bohemistyka | 2014 | vol. 14 | issue 1 | pp. 64-77
EN
This article presents the results of a study that aimed to find and logically group the Czech verbs that retain the length of the thematic vowel before the past-tense marker. The research material comes from the SYN2010 corpus of the Czech National Corpus. The ten most frequent verbs are included in the list of words examined by the author. Attention is also paid to the least frequent verbs, which most native speakers of Czech either consider incorrect or do not know.
EN
The adjective derived from the present active participle of imperfective verbs (called 'přechodník' in Czech; e.g. the adjective 'dělající' derived from the imperfective participle 'dělajíc') is very common in present-day written Czech. By contrast, the adjective derived from the present active participle of perfective verbs (e.g. 'udělající' derived from 'udělajíc') is extremely rare in written texts and perhaps non-existent for native speakers of Czech. For this reason, all grammars of Czech either state that this kind of verbal adjective does not exist or do not mention it at all. In this paper, the author tries to show that this claim (or implicit assumption) is false. This type of adjective cannot be declared non-existent, because dozens (perhaps hundreds) of different adjectives of this type (udělající) are used in standard texts on the internet. The paper also shows that this adjective cannot be rejected as non-systemic. The last grammaticality criterion is met as well: the general textual function of the verbal adjective is the nominalization of a potential attributive clause (e.g. 'vytvořící' for 'který vytvoří', i.e. 'creating' (perfective) for 'who will create' (perfective)).
EN
The aim of the present paper is twofold. First, it outlines the current state of German-Czech linguistic comparison, which in recent years has been significantly shaped by the growing importance of corpus linguistics (very large corpora, parallel corpora). Above all, we argue that the theoretical framework of contrastive linguistics needs to be rethought in the light of empirical data from corpus analysis and, conversely, that corpus linguistics cannot be reduced to mere analytical methods without any theoretical background. In addition, we posit that German-Czech contrastive linguistics urgently needs a large learner corpus of texts produced by Czech learners of German and by German-speaking learners of Czech. Second, the paper presents a new online bibliography of German-Czech linguistic comparison, a project pursued at the Institute of Germanic Studies, Faculty of Arts, Charles University in Prague. The new bibliography exploits the properties of the electronic medium, which allows its contents to be updated continually.
EN
This contribution describes which dictionaries were discussed in the journal Slovenská reč over its ninety years of existence, and how they were treated. It also considers the theoretical, methodological and conceptual issues in lexicography that the journal addressed during this period. By analysing the papers published in the journal, the contribution aims to present the history and the prominent figures of Slovak lexicography. Particular attention is given to explanatory and translation (bilingual) dictionaries of the 20th and 21st centuries, which developed in contact with the Czech and Russian (Soviet) schools of lexicography. Specialized dictionaries that cover vocabulary according to its functional attributes, the individual parameters of words, or developmental and territorial criteria are also examined. Limited space is devoted to corpus linguistics, which, in close collaboration with computational lexicography, provides the material resources needed for lexicographic work.