Results found: 3

Search results

Víceslovné lexémy v syntaktickém kontextu [Multiword lexemes in syntactic context]

100%

Rosen A., Skoumalová H., Znamenáček J.

Studie z aplikované lingvistiky - Studies in Applied Linguistics

2020

vol. 11

issue 2

63-84

We start with the assumption that (i) a corpus represents the use of language, i.e. linguistic performance, (ii) a rule-based grammar represents language as a system, i.e. linguistic competence, and (iii) corpus annotation represents the interface between the two. To detect and diagnose mismatches between language use and the language system, we run a constraint-based grammar as a constraint solver on texts tagged and dependency-parsed by stochastic tools. Before the grammar is applied, MWEs (multi-word expressions) in the texts are identified and the texts are transformed into a constituency-based format. We describe the role and results of the grammar, and its use to check texts annotated with morphosyntactic categories, syntactic structure and information about the status of relevant expressions as MWEs. The grammar also employs lexical resources such as a valency lexicon and a database of MWEs to make the checking more accurate and the annotation more informative. The results are represented as typed feature structures in which MWE-related information can be shared by lexical and phrasal nodes. This allows MWEs to be annotated as lexical units, independently of their analysis in terms of syntactic structure. Focusing on the interplay of MWEs with their syntactic context, we analyse a number of representative examples, pointing out the pros and cons of specific solutions and of the approach as a whole.

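Možnosti využití anotace syntaktické komplexity v paralelním korpusu: příklad francouzských tvarů na -ant v konverbální funkci a jejich českých protějšků [Possible uses of syntactic complexity annotation in a parallel corpus: the example of French -ant forms in converbal function and their Czech counterparts]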

100%

Nádvorníková O., Rosen A., Karolína P.

Časopis pro moderní filologii (Journal for Modern Philology)

2025

vol. 107

issue 1

80-101

This study explores new research opportunities offered by the InterCorp v16ud parallel corpus, annotated using the Universal Dependencies scheme and enriched with syntactic complexity (SC) measures. The analysis focuses on French sentences containing -ant forms (gerund and present participle) and their Czech translations, with participles restricted to adverbial (converbal) usage for comparability. The results show significant SC variation in literary texts, with Czech translations displaying lower values than French originals. Coefficient of variation and correlation analyses suggest that participles, unlike gerunds, may function as stylistic markers. At the sentence level, participles are associated with higher SC than gerunds, though the differences are moderate. The contrastive analysis reveals substantial reductions in clausal SC measures in the Czech translations, probably due to the replacement of subordination by coordination. These shifts affect SC, information hierarchy, and occasionally temporal relations. The study underscores the potential of InterCorp v16ud for syntactic research in contrastive linguistics and beyond, while emphasizing the multidimensional nature of SC.

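Typologie víceslovných jednotek v češtině a frekvenční zastoupení jejich hlavních vlastností v žánrově vyváženém korpusu [A typology of multiword units in Czech and the frequency of their main properties in a genre-balanced corpus]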

64%

Petkevič V., Kopřivová M., Hnátková M., Jelínek T., Kopřiva P., Rosen A., Skoumalová H., Vondřička P.

Studie z aplikované lingvistiky - Studies in Applied Linguistics

2020

vol. 11

issue 2

37-62

The paper consists of two main parts: (a) The first part describes in detail a typology of multiword expressions (MWEs) in Czech. This typology is part of the description of MWE entries in the lexical database LEMUR, which contained more than 10,500 MWE entries as of June 2020. MWE properties reflected in this typology are accounted for by categories and their values. Each MWE is identified by a unique lemma; a group of related MWEs is assigned a “superlemma”. An MWE is described by the following properties: a definition, characteristic examples, lemmas and morphological features of the MWE components (words), as well as the following key categories: MWE style/register, type of usage, syntactic structure (including its representation by a dependency tree and a phrase-structure tree), aspects of flexibility (variants and fragments, internal modifiability of individual MWE components, possible syntactic transformations of the main MWE components, and morphological constraints) and types of idiomaticity on the lexical, morphological, syntactic, semantic and pragmatic levels. (b) In the second part of the paper, the authors focus on the frequency of the main features of the adopted typology in real language material represented by the genre-balanced SYN2015 corpus, which contains 100 million word forms (excluding punctuation): type of usage correlated with syntactic type, and the frequency of various kinds of idiomaticity. Our paper seems to be the first attempt at approaching MWE properties from the point of view of MWE frequencies as types rather than tokens (i.e. frequencies of occurrences of a given MWE).

The paper has two main parts: (a) The first part describes in detail the typology (properties) of multiword lexical units (hereafter MWLUs) in Czech; this typology is part of the description of the entries for these units in the lexical database LEMUR, which contained more than 10,500 entries as of June 2020 [2]. The individual properties of these units are captured by categories and their values. For each unit we give its identifying lemma and its so-called superlemma, a definition, and typical examples; we then describe the lemmas and morphological properties of the individual components (words), followed by characteristics such as the style/variety of the MWLU, its type of usage, its syntactic structure (including its representation as a dependency tree and a phrase-structure tree), aspects of fixedness/flexibility (including variants and fragments of the MWLU, internal modifiability of its individual components, possible syntactic transformations of its main components, and morphological constraints), and finally types of idiomaticity on the lexical, morphological, syntactic, semantic and pragmatic levels.
(b) In the second, main part of the paper we examine the frequency of the main aspects of this typology among the MWLUs processed so far: type of usage in correlation with syntactic type, and the representation of various kinds of idiomaticity, in real language material represented by the genre-balanced SYN2015 corpus (which contains one hundred million word forms, excluding punctuation). This is probably the very first attempt to examine the properties of multiword lexical units in terms of their frequency as types rather than tokens (i.e. the frequencies of occurrences of a given unit).

[1] The paper was written as part of the project Mezi slovníkem a gramatikou (Between Lexicon and Grammar), supported by the Grant Agency of the Czech Republic (Grantová agentura České republiky), reg. no. 16-07473S.
[2] The LEMUR database is described in detail in Vondřička (2019). It was created at the Institute of the Czech National Corpus, Faculty of Arts, Charles University (Ústav Českého národního korpusu FF UK), and is expected to be made available to users. It will also be gradually linked to the corpus, in which multiword lexical units will be annotated, so that it will be possible to search by the annotated properties. The database can, however, already be made available for viewing on request at the Institute.
