
Results found: 7


Search results

EN
The article presents the structure of the Corpus of Historical Slovak, a diachronic corpus of written Slovak texts predating attempts at language standardization (texts from the 15th to the 18th century). The content of the corpus is based predominantly on existing published transcriptions of manuscripts; in this sense it is an opportunistic corpus that aims primarily to collect existing texts. We also collect and transcribe some documents directly in order to improve the chronological balance of the corpus. The corpus aims to capture historical orthography accurately, but given existing standards in transcribing historical Slovak, complete accuracy was not always possible.
EN
Lemmatization, morphological (or morphosyntactic) annotation (MSD) and disambiguation are basic and indispensable steps in Natural Language Processing of languages with a moderate level of inflection. We present a web interface demonstrating the de facto default lemmatization and MSD for Slovak, as used in major Slovak corpora (with several enhancements yet to be applied in the corpora). The interface can be used chiefly for presentation or pedagogical purposes, with the morphological tags expanded and explained in plain language in several languages, including two different terminological registers of Slovak (a professional linguistic one and a “common” one).
EN
The article describes a method for analysing contemporary Slovak vocabulary with regard to the origin of the words. By combining statistical data from a representative corpus of modern written language with etymological information, we arrive at a reasonably confident estimate of the ratio of loanwords in common Slovak vocabulary and of the provenance of lexical borrowings. We demonstrate some of the findings in tables and charts, providing information that is of interest to members of the Slovak population who are not linguistically oriented (and who are sometimes vocal in expressing their attitudes to the perceived number of loanwords in the Slovak language), but that can also inspire further research in philology or linguistics.
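A loanword ratio of this kind can be computed either over distinct lemmas (types) or over corpus occurrences (tokens). The following is a minimal sketch of that distinction, not the article's actual procedure: the function name `loanword_ratios`, the frequency counts and the origin labels are all hypothetical and serve only as an illustration.

```python
from typing import Dict, Tuple


def loanword_ratios(freq: Dict[str, int], origin: Dict[str, str]) -> Tuple[float, float]:
    """Return (type_ratio, token_ratio) of loanwords.

    freq   maps each lemma to its corpus frequency (hypothetical counts).
    origin maps each lemma to a source language, or "native" (hypothetical labels).
    """
    if not freq:
        raise ValueError("frequency list is empty")
    # Every lemma whose origin is not marked as native counts as a loanword.
    loans = [lemma for lemma in freq if origin.get(lemma, "native") != "native"]
    type_ratio = len(loans) / len(freq)
    token_ratio = sum(freq[lemma] for lemma in loans) / sum(freq.values())
    return type_ratio, token_ratio


# Illustrative data only; not taken from any real corpus.
freq = {"dom": 500, "škola": 300, "počítač": 150, "víkend": 50}
origin = {"dom": "native", "škola": "Latin", "počítač": "native", "víkend": "English"}

type_ratio, token_ratio = loanword_ratios(freq, origin)
print(f"types: {type_ratio:.2f}, tokens: {token_ratio:.2f}")
```

The two ratios can differ considerably, because a few very frequent words dominate the token count while rare borrowings inflate the type count.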
EN
This article is the second part of a study describing problems encountered in the use of non-Slovak anthroponyms in the contemporary Slovak language. In the inter-lingual context we evaluate current tendencies of these onymic units in texts, especially the degree and forms of the transformation of non-Slovak female names according to the traditional anthropomodel of Slovak female surnames. The fundamental question is the adaptation of foreign female anthroponyms when they appear in Slovak texts, especially the feminization of surnames (an explicit indication of the female gender of the person via the -ová suffix added to foreign surnames), but also other related modifications (e.g. inverse word order of Asian names). The analysis has been carried out using the Aranea family of web corpora. This part describes the adaptation and feminization of Hispanic, Hungarian and Polish female anthroponyms in the Slovak language. We detected trends towards simplification of Hispanic multi-surname anthroponyms into single-surname forms, and tendencies towards domestication and regularization of feminine forms of Polish surnames conforming to adjective paradigms.
EN
Lemmatization and morphological tagging are an indispensable step in Slovak corpus linguistics. In this article, we evaluate two state-of-the-art Slovak lemmatizers and MSD taggers, one based on MorphoDiTa and the other on spaCy. We measured accuracy on the test subset of a manually lemmatized and MSD-annotated corpus and found that the combination of lemma and tag achieved 93.5% accuracy with MorphoDiTa and 95.6% accuracy with spaCy. Most of the errors occurred in disambiguating MSD tags for homonymous uninflected parts of speech such as particles, conjunctions, and adverbs, and in disambiguating the nominative and accusative of singular masculine inanimate nouns. In these cases, spaCy shows a noticeable improvement over MorphoDiTa, likely due to better exploitation of the context of the words.
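Accuracy of "the combination of lemma and tag" is read here as the share of tokens for which both the lemma and the MSD tag match the manual annotation. The sketch below illustrates that reading only; it does not call MorphoDiTa or spaCy, the function name `joint_accuracy` is hypothetical, and the tag strings in the toy data are illustrative rather than taken from the evaluated corpus.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (lemma, MSD tag)


def joint_accuracy(gold: List[Token], predicted: List[Token]) -> float:
    """Share of tokens whose lemma AND tag both equal the gold annotation."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same tokenization")
    if not gold:
        raise ValueError("no tokens to evaluate")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Toy data; the tags are illustrative placeholders.
gold = [("hrad", "SSis4"), ("byť", "VKesc+"), ("ale", "O"), ("starý", "AAis1x")]
pred = [("hrad", "SSis1"), ("byť", "VKesc+"), ("ale", "O"), ("starý", "AAis1x")]

accuracy = joint_accuracy(gold, pred)
print(f"lemma+tag accuracy: {accuracy:.2%}")
```

In the toy data the single error is a nominative/accusative confusion on a masculine inanimate noun, the kind of ambiguity the abstract names as a frequent error source.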
EN
In this paper we describe our automatic analysis of several parallel Bulgarian-Slovak texts with the goal of obtaining useful information about Slovak translation equivalents of (definite) articles and demonstrative pronouns in Bulgarian. Rather than focusing on individual translation equivalents, we present a method for automatic extraction and visualization of the translations. This can serve as a guide for pinpointing interesting features in specific translated documents and could be extended to other parts of speech or otherwise identifiable textual units.
EN
We analyse the reflexivity of Slovak verbs based on the Slovak National Corpus. We derive a simple parameter describing the degree of reflexivity of a given verb, based on the distribution of reflexive pronouns in the left and right context of the verb in the corpus, and apply the method to sort the verbs in the Slovak National Corpus according to this parameter. The method allows us to classify verbs automatically according to their reflexivity, provided they occur often enough in the corpus.
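The abstract does not define the parameter itself. One simple possibility, assumed here purely for illustration, is the fraction of a verb's occurrences that have a reflexive pronoun (sa or si) within a few tokens to the left or right. The function name `reflexivity_scores`, the window size, the `min_count` threshold and the `"VERB"` label are hypothetical choices, not the authors' definition.

```python
from collections import Counter
from typing import Dict, List, Tuple

Token = Tuple[str, str, str]  # (word form, lemma, part of speech)
REFLEXIVE = {"sa", "si"}      # Slovak reflexive pronoun forms


def reflexivity_scores(sentences: List[List[Token]],
                       window: int = 3,
                       min_count: int = 1) -> Dict[str, float]:
    """For each verb lemma, return the fraction of its occurrences that have
    a reflexive pronoun within `window` tokens on either side."""
    total: Counter = Counter()
    reflexive: Counter = Counter()
    for sent in sentences:
        words = [form.lower() for form, _, _ in sent]
        for i, (_, lemma, pos) in enumerate(sent):
            if pos != "VERB":
                continue
            total[lemma] += 1
            context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            if REFLEXIVE.intersection(context):
                reflexive[lemma] += 1
    # Verbs seen fewer than min_count times are skipped as unreliable.
    return {lemma: reflexive[lemma] / n for lemma, n in total.items() if n >= min_count}


# Toy annotated sentences; not taken from the Slovak National Corpus.
sentences = [
    [("Umýva", "umývať", "VERB"), ("sa", "sa", "PRON"), ("ráno", "ráno", "ADV")],
    [("Umýva", "umývať", "VERB"), ("auto", "auto", "NOUN")],
    [("Smeje", "smiať", "VERB"), ("sa", "sa", "PRON")],
]

scores = reflexivity_scores(sentences)
for lemma, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(lemma, score)
```

A score near 1 suggests a verb that is almost always reflexive, a score near 0 one that rarely is; raising `min_count` corresponds to the abstract's caveat that the classification needs enough occurrences in the corpus.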