

Results found: 98


Search results

Search:
in the keywords:  corpus linguistics
EN
This paper presents a corpus-based study in lexical semantics. Our topic is the general scientific lexicon, the cross-disciplinary lexicon peculiar to the academic genre. We show how the use of a large corpus makes it possible to develop an inventory of this vocabulary, and we present the first semantic treatments performed with the help of the corpus, together with a first experiment in natural language processing.
EN
Strelica je pogodila grješnika – a certain issue of the Croatian orthography in light of the Croatian National Corpus data
Based on the Croatian National Corpus data, the author presents inconsistencies in the spelling rules for words like str(j)elica and gr(j)ešnik as described in Croatian orthographic dictionaries. The paper also addresses the discrepancies between the orthographic norm and its reflection in real Croatian texts, as well as the ideological reasons for these differences.
Mäetagused | 2017 | vol. 69 | 217-242
EN
Linking Estonian linguistic proficiency to the reference levels of the CEFR and to different educational stages does not rely on research but is based on deep-rooted perceptions. More reliable data can be obtained by comparing a native speaker’s language usage patterns to the morphological and lexical preferences characteristic of speakers at every language level. For this purpose, tools for automatic text processing (which are mainly created on the basis of English) and different techniques for data analysis are needed. The article introduces an original computer program called Cluster Catcher, developed at Tallinn University for finding usage patterns in Estonian written language texts.
EN
Based on a mega corpus, The Corpus of Contemporary American English (COCA), this study aims to determine the most frequent adjectives used in academic texts and to investigate whether these adjectives differ in frequency and function in social sciences, technology, and medical sciences. It also identifies evaluative adjectives from a list of the hundred most frequently used adjectives. A total of 839 adjectives, which make up the list of frequently used adjectives in COCA, were searched using a search engine. Of these, 334 were found to appear more frequently in the academic sub-corpus than in the other sub-corpora (spoken, fiction, magazine, and newspaper). Only one adjective was used more frequently in technology and medical sciences than in social sciences. Some adjectives were very dominant in a specific discipline of academic texts. The frequency of evaluative adjectives among the 100 most frequently used adjectives was also listed. Almost 40% of these adjectives were found to be evaluative. The results of the study are discussed in terms of frequency effects in language learning and writing in the foreign language, as providing learners with corpus data may improve language knowledge and the correct use of adjectives.
EN
Advances in spoken corpora analysis have brought about new insights into language pedagogy and have led to an awareness of the characteristics of spoken language. Current findings have shown that the grammar of spoken language differs from that of written language. However, most listening and speaking materials are constructed on the basis of written grammar and lack core spoken language features. The aim of the present study was to explore whether awareness of spoken grammar features could affect learners’ comprehension of real-life conversations. To this end, 45 university students in two intact classes participated in a listening course employing corpus-based materials. The spoken grammar features were taught to the experimental group overtly through awareness-raising tasks, whereas the control group, though exposed to the same materials, was not provided with such tasks for learning the features. The results of the independent samples t tests revealed that the learners in the experimental group comprehended everyday conversations much better than those in the control group. Additionally, the highly positive views of spoken grammar held by the learners, which were elicited by means of a retrospective questionnaire, were generally comparable to those reported in the literature.
EN
This paper deals with debates about political correctness as they can be observed in the comment sections of the website “Zeit Online”. Under articles on the topic of political correctness, numerous critical comments can be found, which are in turn answered with counter speech. On the basis of a corpus of 4791 comments on nine articles, in which the thread structures are also marked up, typical linguistic features of counter speech, summarized as characteristics of counterness, are determined with quantitative corpus linguistic methods. Selected findings are then further enriched in qualitative fine analyses. It will be shown that epistemic positioning, i.e., the indexing of one’s own and other people’s knowledge, and the associated acts of demarcation play an important role in the articulation of counter speech.
Zdvojená slovesa v současné češtině (Doubled verbs in contemporary Czech)
EN
This paper presents an analysis of so-called double-paradigm verbs (muset – musit, bydlet – bydlit, myslet – myslit, šílet – šílit, kvílet – kvílit and hanět – hanit) in contemporary Czech, based on data from two Czech corpora: SYN2010 and SYN2009PUB. There is a common assumption in the literature that these verbs are classified as having two distinct paradigms: a “prosit-paradigm” and a “sázet-paradigm” (or in some cases a “trpět-paradigm”). The analysis shows that this assumption is false for contemporary Czech. It is shown that these verbs behave differently: muset, kvílet and šílet are used according to the “sázet-paradigm”, myslet and bydlet according to the “trpět-paradigm”, and the verb hanět is even more specific (present forms are used according to the “prosit-paradigm” and infinitive forms vary between the stem suffixes -e- and -i-). It is thus demonstrated that these verbs do not form a distinct category in contemporary Czech.
EN
This paper will present a corpus-based study on the translated language of tourism, focusing in particular on the stylistics of tourist landscapes. Through a comparative analysis of a specifically designed corpus of travel articles originally written in English (namely the TourEC – Tourism English Corpus) and a corpus of tourist texts translated from a variety of languages into English (namely the T-TourEC – Translational Tourism English Corpus), the study will investigate a selection of collocates, concordances and keywords related to the description and representation of tourist settings in both corpora. The aim is to identify differences, aspects or practices with potential for improvement that characterize the translated language of tourism with respect to tourist texts originally written in English. Results will show that the discursive patterns of translated texts differ from the stylistic strategies typically employed in native English for the linguistic representation of landscape and settings due to phenomena of translation universals, and that these differences may affect the related communicative functions, properties and persuasive effects of tourist promotional discourse.
Uniwersalia przekładowe (Translation universals)
EN
Synaesthesia thus turns out to be a strategy for linguistic pleasure, representing a somatic impulse to engage with texts. Barthes, Nabokov and Robinson, daring to reveal their scandalously pleasurable literary habits, point to synaesthetic engagement with language as the source of translators’ intuitions, readers’ sensitivities, as well as – inseparably – textual pleasures, understood as an integral component of the experiential dimension of lecture and translation.
PL
The article presents the issue of so-called translation universals, which emerged in connection with the development of corpus linguistics. The hypothesis put forward by Mona Baker that such universals exist is controversial among researchers of translation phenomena, and this controversy is also briefly reviewed in the article.
EN
The Upper Sorbian text corpus and further sources of information with regard to Upper Sorbian in the Internet
In the present era of globalisation and the omnipresence of the Internet, Sorbian linguistics faces new challenges along the lines “What is not in the Internet, does not exist”. The demand for digital sources of information with regard to Upper and Lower Sorbian and those accessible online as working tools and reference points for language practice and as a source for academic research increases. As a result of this ongoing development, the Foundation for the Sorbian People established a workgroup called “Sorbian in the new media” at the end of 2012, which has pointed out the creation of an online German-Upper Sorbian dictionary as the major task in this field of activities. The focus of this article, however, is the Upper Sorbian text corpus HoTKo, which has been created by the Sorbian Institute and which has been made available in co-operation with the Institute of the Czech National Corpus at the Charles University in Prague. The article presents the history and development of the corpus, its extent and shape as well as its link to or incorporation into further planned digital projects of the Sorbian Institute with regard to the Upper Sorbian language.
EN
This paper presents a comparison of the largest contemporary corpus of spoken Czech, ORAL2013, and a different source, data gathered in the project “Sociolinguistic Analysis of the Use of Prothetic v- in Bohemia” (SAUP). Both of these data sources consist of informal interviews with Czech speakers, but their design is different. ORAL2013 is based on shorter recordings of many speakers, whereas the SAUP data is based on longer recordings of fewer speakers. It is assumed that these two data sources should yield similar results since they aim to represent the same population. The comparison is based on the use of two features of spoken Czech in the Bohemia region: prothetic v- and the conditional verb forms bych/bysem and bychom/bysme. Based on the analysis, it is concluded that (1) more information about the speakers should be added to future corpora like ORAL2013; (2) the corpus ORAL2013 is useful for conducting a sociolinguistic pilot study, which should then be followed by a full-scale research project based on a different sample constructed strictly for the purposes of the particular research; (3) the ratio between the number of speakers in the corpus and the amount of their speech is an important (and often underestimated) aspect of corpus design which should be given careful consideration.
EN
Formulaic competence is a hotly debated issue in teaching circles, not only because of its role in L2 communication but also due to the inherent complexity of the identification criteria for formulaic strings. While the mixed approach, combining meaning-based and corpus-based identification measures, remains a natural solution, the subjective character of the criteria, together with the required involvement of native experts, diminishes its attractiveness for everyday pedagogical purposes. We would like to explore the potential of “corpus-only” identification tools. Specifically, our objective is to show that meaningless n-grams (of the, in a, etc.) generated by frequency searches contain useful pedagogical data, and that, coupled with MI scores, frequency-based measures accurately characterize learners’ formulaic competence. Because of the relative simplicity of the identification procedure and the free availability of corpus tools, frequency-based and distribution-based measures may become an important new pedagogical tool at the disposal of language teachers.
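To illustrate the kind of measure the abstract above refers to, the following is a minimal sketch of bigram frequency combined with a mutual information (MI) score. It assumes the common corpus-linguistic definition MI = log2(observed / expected), with the expected frequency estimated from the unigram counts; the function name `bigram_mi` and the toy token list are hypothetical and are not taken from the paper, whose exact procedure is not given here.

```python
import math
from collections import Counter


def bigram_mi(tokens):
    """Return {(w1, w2): (frequency, MI score)} for adjacent word pairs.

    Assumed definition: MI = log2(observed / expected), where
    expected = f(w1) * f(w2) / N and N is the number of tokens.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), freq in bigrams.items():
        expected = unigrams[w1] * unigrams[w2] / n
        scores[(w1, w2)] = (freq, math.log2(freq / expected))
    return scores


# Toy data, invented for illustration only.
tokens = "of the cat in a hat of the dog in a box".split()
scores = bigram_mi(tokens)
freq, mi = scores[("of", "the")]
print(f"of the: frequency={freq}, MI={mi:.3f}")
```

In practice both values would be read together: raw frequency favours grammatical strings such as "of the", while MI favours pairs that co-occur more often than their individual frequencies predict.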
EN
In the present paper we examine the extent to which age, gender, and education affect the use of the Spisz regional dialect. It is widely assumed that only elderly speakers use pure dialect with no influences of the standard variety of Polish, whereas other generations mix the dialectal with the standard grammar. The data are drawn from the Spisz Corpus. Eight features were chosen, six of them pertaining to inflection and two to syntax. Though the number of non-dialectal features increases with each generation, it remains quite limited. This is not true, however, of the syntactic idiosyncrasies of the regional dialect, which are almost entirely abandoned by younger generations. Also, women are more prone to use dialectal forms than men. Finally, the higher the education of the speaker, the greater the number of non-dialectal forms, with the notable exception of academic degree holders, who master code-switching better. In general, however, the Spisz regional dialect is well preserved by its speakers.
EN
This paper analyses the adverbs certainly and generally as stancetaking markers. These adverbial devices are said to show authorial stance and to communicate the author’s commitment or detachment towards the information presented, and so they are classified as epistemic adverbs (Alonso-Almeida 2015). For this study, I have selected a corpus of history texts from the Modern English period (1700-1900), as compiled in The Corpus of History English Texts (Crespo and Moskowich 2015), on the basis of which the two evidential adverbs are examined using computer corpus tools, although manual inspection is also employed to assess the meaning of the items in context. The findings suggest that, in this type of scientific article, the two adverbs are used with differing pragmatic functions: certainly functions mostly as a booster, whereas generally seems primarily to serve a hedging purpose (Hyland 2005a).
EN
The present paper examines the construal of the verb myśleć ‘think’ in Polish from the perspective of Cognitive Grammar and Functional Linguistics. Cognitive corpus-driven and quantitative methodology (e.g., Glynn and Fischer 2010) is applied to reveal the formal and semantic correlations obtaining between a set of unprefixed and prefixed verb forms of myśleć ‘think’, instantiating and profiling various aspects of the category in question. The quantitative configurational method (Geeraerts et al. 1994) reveals the “behavioral profiles” (Gries 2006) of the verb, based on the “usage features” (Glynn 2009) associated with it. The notion of subjective and objective construal, as developed by Langacker (1990, 2006), is further elaborated on by more functional dimensions of perspective-taking, as put forward by Nuyts (2001), Verhagen (2008) and Traugott (1995, 2010).
EN
The paper reports on the results of an examination of changes in Polish lexis over the past decade. Two different, multi-million corpora spanning the years 2011–2022 were contrasted with a subset of the balanced National Corpus of Polish, which covers the period until 2010. To this end, keyword analysis was employed, and words that are particularly characteristic of the more recent set of texts, compared to the older corpus, were automatically extracted. This allowed us to identify the most salient lexical trends which differentiate the language of the last decade from the one recorded in the National Corpus of Polish, and which point to significant extralinguistic socio-cultural, economic, and political shifts across time.
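As an illustration of the keyword analysis mentioned in the abstract above, the sketch below ranks words that are over-represented in a newer corpus relative to an older reference corpus. The abstract does not say which keyness statistic was used; the log-likelihood (G2) score here is an assumption, chosen because it is widely used for this purpose, and the function names and toy word lists are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 keyness for a word seen a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens (assumed statistic)."""
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(study_tokens, reference_tokens, top=5):
    """Return the top words over-represented in the study corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref[word]
        if a / c > b / d:  # keep only positive keywords
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


# Toy data, invented for illustration only.
newer = "pandemic lockdown pandemic remote work pandemic".split()
older = "work office work meeting office travel".split()
for word, score in keywords(newer, older):
    print(word, round(score, 3))
```

Real studies of this kind typically add a minimum-frequency threshold and lemmatisation before scoring; both are omitted here to keep the sketch short.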
EN
This article examines the discursive construction of Scottish and British-English national identities in the printed press within the context of the planned Scottish independence referendum. Using Critical Discourse Analysis and informed by sociological and anthropological research, the study uses a Corpus Linguistics approach to analyse newspaper texts from the Scottish and British printed media to define the strategies used in the construction and disarticulation of these identities and the ideologies behind them. The results of the analysis will show that the Scottish broadsheets use a staunchly Scottish rhetoric with frequent examples of nation flagging, showing the palpable struggle for power and a certain sense of inferiority. Inadvertently or otherwise, these newspapers engender a sense of separateness by employing techniques of positive in-group identification. The Scottish editions of UK broadsheets, by contrast, hold a more Anglocentric perspective and their treatment of the referendum is more political than ideological, frequently attributing negative evaluations to the independence issue and engaging in the practice of "tartanisation". To conclude, the UK broadsheets tend to provide a more balanced and objective point of view, thus being at the political centre of the social debate enacted by the referendum and the subsequent possible independence of Scotland.
EN
Based on my earlier work on the conceptualization of emotions, I wish to emphasize a number of points in this paper. First, I suggest that emotion concepts are largely metaphorical and metonymic in nature. Second, I propose that several of the conceptual metaphors and metonymies are tightly connected. Third, in line with a large body of recent results, I maintain that many of our emotion concepts have a bodily basis, i.e. that they are embodied. Fourth, I concur with many others that our emotion concepts can be seen to have a frame-like structure, i.e. that they can be represented as cognitive-cultural models in the mind. Fifth, on the methodological side, I claim that the description and analysis of emotion concepts require both a qualitative and a quantitative methodology. Though most of these suggestions have been accepted and embraced by a number of scholars working on the emotions, several other scholars have challenged them. In response to such challenges, I have revised and modified the ideas above over the past 25 years. The present paper is concerned with these more recent developments.