Full-text resources of CEJSH and other databases are now available in the new Library of Science.
Visit https://bibliotekanauki.pl

Results found: 9

Search results

Search:
in the keywords:  language corpora
EN
Developing free morphological data for Polish
A limiting factor in construction of Natural Language Processing (NLP) systems is often the availability of morphological resources. This indeed happens for Polish: the freely available corpus with manual morpho-syntactic annotation (part of the IPI PAN Corpus) is not coupled with any free morphological analyser. There exists a very large morphological dictionary of Polish available under a free licence – Morfologik. Unfortunately, its tagset differs significantly from the tagset of the corpus and, what is more, its morphological description lacks desired rigour. We amend this situation by performing a massive conversion of the dictionary into the tagset compliant with the corpus. The conversion results in a free dictionary containing entries for almost 3.5 million different word forms. In this article we report on our methodology, discuss some morphological and syntactic issues related to both tagsets and present the characteristics of the resulting dictionary.
EN
On the Benefits of Foreign Language Learning Based on Parallel Language Corpus
A recently observed strong interest in language corpora, which can be defined as collections of texts in electronic format, together with my work on ‘The Parallel Polish-Bulgarian-Russian Corpus’ within the European project Clarin, prompted this text on the use of a parallel language corpus for learning a foreign language. The article discusses the benefits of using such a corpus in foreign language learning, describes selected corpus tools that support the learning process, and points out some risks arising from misuse of the corpus.
EN
The purpose of this article is to analyse the resources of the National Photocorpus of the Polish Language (NFJP) in terms of presence of verbs of maximum and excessive effectiveness. The author endeavours to answer the following questions: 1) which verbs of maximum and excessive effectiveness are recorded by the NFJP, 2) to what extent the verbs recorded in the NFJP are present in the networks of entries in the 19th-century dictionaries of Polish, 3) whether words not recorded in dictionaries of Polish are present in the resources of the National Corpus of Polish (NKJP). The conducted examinations showed that 279 verbs of maximum and excessive effectiveness, including lexemes with prefixes do-, na-, nad(e)-, o-||ob(e)-, prze-, roz(e)-, u-, wy- and za- in their morphological structures and aspectual derivatives, can be found in the NFJP. The analysis evidences that over 15% of verbs have not been recorded in the examined dictionaries of Polish. However, out of 42 verbs, which are unique to the NFJP, 13 words have been found in the NKJP resources. The findings of the study lead to the conclusion that the NFJP could serve as a valuable source for the research on the 20th-century lexis of Polish, e.g. by complementing the knowledge of the vocabulary that is “distributed” in various types of texts and has not been covered by research to this date.
EN
Stereotypical uses of language, especially collocational combinations, play a decisive role in language teaching and learning. This type of lexical relationship is difficult for non-native learners to acquire because of its complexity, not only in terms of lexical use but also in terms of the particular linguistic awareness it requires. Learners’ collective corpora can be revealing in describing their transitional competence. Diagnosing interlanguage-specific difficulties makes it possible to evaluate progress in the target language, to describe it, to identify its hegemonic variety and to create the most effective activities. In this article, we discuss the issue of interlanguage in learner corpora and the use of language corpora for lexical learning in French as a foreign language lessons. We address questions that learners of L2 French have and that are sometimes difficult to answer using scholarly grammars or L2 French workbooks.
EN
Contemporary Contrastive Studies of Polish, Bulgarian and Russian Neologisms versus Language Corpora
In the field of Slavonic linguistics, contrastive studies of neologisms occupy little space, and the newest words are insufficiently described and classified. The aim of this article is to draw attention to the need for a contrastive description of the newest lexis and to test just one of the many ways of obtaining Polish, Bulgarian and Russian neologisms. Language corpora, the way in question, are not the only source from which the author obtains her research material, yet a growing interest in corpora has inspired her to use this method as well. The author wants to show the reader to what degree language corpora can help in building a thesaurus of Polish, Bulgarian and Russian neologisms. In attempting to compare collections of neologisms from contemporary Polish, Bulgarian and Russian, the author points out the need to standardize the description (making it identical for each of the analysed languages), which she intends to propose in further publications on neologisms in Polish, Bulgarian and Russian. The application of the contrastive method to three different but related Slavonic languages will help, in her opinion, to discover more mechanisms by which new words come into existence and to examine the newest derivational processes and their productivity.
PL (English translation)
The quantity and variety of data available to linguists today is incomparably greater than in the past. The new data fit into a long-established schema that covers systemic, metalinguistic, elicited, introspective and textual data, as well as other cultural artefacts relevant to the study of language, such as iconographic data. New data sources and new tools for analysing them nevertheless significantly enrich the linguist's research toolkit. This brings benefits but also raises problems, including the problem of an “excess” of data that cannot be interpreted with traditional methods, and the problem of interpreting results when data drawn from different sources lead to different conclusions. The abundance of data, especially digital data, is also a challenge for academic teaching. Some of the skills now required for research on language depart considerably from the traditional model of training philologists.
EN
The amount and variety of data that linguists have at their disposal are incomparably vaster now than in the past. New data fit into the schema known for a long time: this encompasses systemic data, metalinguistic data, elicitation data, introspective data, textual data, and other cultural messages relevant to the study of language, e.g. iconographic data. However, new data sources and new analytic tools significantly enrich linguistic research methodology. This brings benefits but also causes problems: the problem of data “overload” where traditional analytic techniques prove inadequate, and the problem of data interpretation in cases where data drawn from different sources yield different results. The abundance of data is also a challenge to academic training: some research skills now necessary in linguistic studies are far from the traditional model of philological training.
EN
The research undertaken aims to shed more light on the acquisition of the interjection oh on the basis of the Abigail files in the CHILDES database. The CHILDES database is a collection of transcripts of spoken interactions between the target child and his/her surroundings. The Abigail files comprise data collected over four years, in which the child, Abigail, was recorded at home at three-monthly intervals, a total of ten times. In this paper, some approaches to interjections are sketched in the first part. Then, the general statistics of the use of oh are presented with reference to the functions it is used for by the participants of the interactions. Next, the three most frequent functions of oh are presented as calculated for the target child, the caregiver and a person of unknown age. Finally, all functions of oh are displayed as percentages reflecting the frequency of their occurrence.
PL (English translation)
The article aims to analyse the interjection oh on the basis of the Abigail files in the CHILDES language database. The CHILDES database is a collection of conversation transcripts recorded in the CHAT format. The Abigail files are a series of conversations between a child and the people around the child, recorded over four years at three-month intervals. The article first presents theoretical considerations concerning interjections. It then presents general statistics on the use of oh according to the functions it performs in the utterances of the participants studied. Next, it identifies the most frequent functions that oh performs in the utterances of the target child, the caregiver and third parties. Finally, it presents all the functions that oh performs in the participants' utterances in terms of their frequency.
EN
The paper deals with the development of specialized terminology in the field of cultural and creative industries. The first part, based on the Theory of Language Management and Critical Discourse Analysis, surveys how speakers in the field reflect the terminology and problems imposed by its use. The analysis focuses on a particular controversy in the nature of cultural and creative industries and its implementation in the Czech Republic (Gajdoš, 2010; Cikánek, 2011; Gajdoš, 2011). The second part scrutinizes the key collocation term kulturní a kreativní průmysly and its terminological variants. It investigates how the lexical components of the terms are used in Czech, what their common collocations are and what connotations they induce. The study shows how these properties affect the overall process of terminologization.
Bohemistyka | 2017 | issue 4 | 347-358
EN
This article is concerned with the pronouns ní, jí, naší and vaší in an accusative singular form that is non-standard for this case. I try to point out a possible connection between the pronunciation of these pronouns and their written forms in the Czech National Corpus. I focus especially on the reasons why a language user (writer) chooses a long final vowel in the accusative singular even in contexts where it is not in accordance with the paradigm of the given pronoun as described in grammars of Czech. I explore differences across functional styles in the corpus SYN2015 and regional differences in the corpus SCHOLA2010. I also observe which prepositions are associated with non-standard accusative pronouns.
CS (English translation)
This paper deals with the non-standard use of vowel length in the accusative singular of the pronouns ní, jí, naší and vaší. I attempt to point out a possible connection between the pronunciation of these forms and their written form in the Czech National Corpus. In particular, I focus on the contexts in which a language user (writer) chooses a long final vowel in the accusative singular even where this does not conform to the paradigm of the given pronoun as recorded in grammars of Czech. I examine differences across functional styles in the corpus SYN2015 and differences by regional affiliation in the corpus SCHOLA2010. I also track which prepositions combine with the non-standard accusative forms of the pronouns.