
Results found: 7


Search results

Search:
in the keywords:  coreference
EN
This paper addresses Czech light verb constructions, partly revising the principles of their syntactic structure formation formulated within the Functional Generative Description. It argues that the obligatoriness of valency complementations should be reflected in these principles. In particular, the role played in this process by optional valency complementations of light verbs has been analyzed. This analysis has shown that where light verbs do not provide a sufficient number of valency complementations for the surface expression of the semantic participants of predicative nouns, these participants make use of optional verbal complementations; namely, ORIGin, LOCative and BENefactor have been attested in the VALLEX lexicon. In such cases, semantic participants can be expressed on the surface either as an optional verbal complementation or as a nominal complementation. The distribution of verbal and nominal complementations has been observed in 1,600 light verb constructions extracted from the Czech National Corpus, with the result that the surface expression of these participants through optional verbal complementations is strongly preferred (88% verbal complementations versus 12% nominal ones). The semantic analysis has indicated that the optional verbal complementations are overrepresented because they cover broader semantic contexts than the corresponding nominal ones.
EN
A preliminary study in zero anaphora coreference resolution for Polish (Polish title: Wstępne studium rozwiązywania problemu koreferencji anafory zerowej w języku polskim)
Zero anaphora is an element of the coreference resolution task that has not yet been directly addressed for Polish; most studies have left it, as the most challenging aspect, for further investigation. This article presents an initial study of the problem. It discusses the preparation of a machine learning approach, together with feature engineering based on a linguistic study of the KPWr corpus. The study uses existing tools for Polish coreference resolution as sources of partial coreferential clusters containing pronoun, noun and named entity mentions. The same tools also serve as baseline zero coreference resolution systems for comparison with our system. The evaluation focuses not only on clustering correctness irrespective of mention type, measured with the standard CoNLL-2012 metrics, but also on the informativeness of the resulting relations. According to the coreference annotation approach used in the KPWr corpus, only named entities are treated as mentions informative enough to constitute a link to real-world objects. Consequently, we provide an evaluation of informativeness based on the links found between zero anaphoras and named entities. For the same reason, we restrict coreference resolution in this study to mention clusters built around named entities.
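The CoNLL-2012 measures mentioned in the abstract above are conventionally reported as the average F1 of three metrics: MUC, B³ and CEAF-e. As an illustration only, the following minimal sketch computes B³ for clusters given as sets of mention identifiers. The function name `b_cubed`, the toy mention labels, and the treatment of a mention missing from one side as a singleton are assumptions made for this example; it is not the authors' evaluation code and does not replace the official scorer.

```python
def b_cubed(key, response):
    """Return (precision, recall, F1) of B-cubed for two clusterings.

    key, response: lists of sets of hashable mention ids (gold and system).
    Assumption: a mention absent from one side counts as a singleton there.
    """
    def average_overlap(clusters_a, clusters_b):
        # Map every mention to the cluster that contains it on the other side.
        mention_to_b = {m: c for c in clusters_b for m in c}
        total, count = 0.0, 0
        for a in clusters_a:
            for m in a:
                b = mention_to_b.get(m, {m})
                # Share of m's own cluster that the other side also groups with m.
                total += len(a & b) / len(a)
                count += 1
        return total / count if count else 0.0

    precision = average_overlap(response, key)
    recall = average_overlap(key, response)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: the gold standard groups a, b, c; the system splits c off.
gold = [{"a", "b", "c"}, {"d"}]
system = [{"a", "b"}, {"c", "d"}]
p, r, f = b_cubed(gold, system)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

A full CoNLL-2012 score would additionally require MUC and CEAF-e, which are omitted here for brevity.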
EN
The contribution presents the complex text annotations prepared in the Prague Dependency Treebank (topic-focus articulation, coreference and bridging anaphora, discourse relations) and proposes solutions to the following theoretical and practical questions that arise from the interplay of syntactic and discourse structure: the advantages and disadvantages of annotating linear text compared to tectogrammatical trees (the importance of syntactic information for the interpretation of discourse structure), the discrepancy between syntactic and discourse structure concerning the surface position of a discourse connective, and an unexpressed thought or assumption as a discourse argument.
EN
Annotation schemes for language corpora nowadays cover various layers of sentence description, from morphology to semantics. Annotation projects concerning phenomena beyond the sentence boundary, however, have only recently started to attract the attention of corpus linguists. In the present contribution, we describe a unified approach to the analysis of discourse phenomena, designed and developed for a large-scale annotation of Czech empirical data in the Prague Dependency Treebank. This approach is based on two fundamental pillars: (i) it exploits the results of one of the first complex schemes for discourse annotation, proposed and realized in the Penn Discourse Treebank for English; (ii) it follows the Praguian Functional Generative Description and treebanking tradition, taking advantage of the tectogrammatical (underlying) layer of sentence analysis and extending it to a full discourse-level description. Our analysis concentrates on two major aspects of discourse coherence: (i) discourse relations (semantic relations between discourse segments) and discourse connectives as their lexical anchors; and (ii) coreference and the so-called bridging anaphora. We present a detailed description of the annotation scheme and procedure, address individual problematic issues and offer basic corpus statistics and annotation evaluation.
EN
Although the Czech language has two forms of the infinitive (active and passive) in its morphological paradigm, there are infinitive constructions in which this opposition is not employed and the active vs. passive interpretation depends on the context. This article focuses on active infinitives which convey meanings primarily expressed by passive infinitives. The verbs which govern such infinitives are divided into several classes: (a) potřebovat [to need], zasluhovat [to deserve], (b) verbs of movement, such as poslat [to send], přinést [to bring], odvézt [to take away], (c) žádat [to require], odmítat [to refuse]. The article analyzes the competition between the active forms of the infinitive and the analytic passive forms, as well as the coreference between the valency members of the governing verbs and the hidden valency member of the infinitive. Quantitative insights into how the phenomenon under study is represented in the Czech National Corpus are provided. The article concludes with a terminological proposal to complete the system of functions in the domain of infinitive constructions.
EN
The article deals with coreference relations in Czech and their classification based on the classification of reference types. First, we define the notion of coreference, stressing the distinction between coreference and endophora. We then present the scheme of reference and coreference used in the annotation of coreferential relations in the Prague Dependency Treebank (PDT), which applies two types of coreferential relations: the type SPEC and the type GEN. Analyzing several examples, we argue that this scheme cannot describe more complicated reference relations, mainly because the definition of the type GEN is too broad. Hence we suggest an alternative classification of the types of reference that employs two independent criteria, genericity and specificity, so that four types of reference are established: specific individual, non-specific individual, specific generic and non-specific generic. We deal with each of these four types in detail, and finally, we show how this classification can be used when capturing coreference relations in text.
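The four reference types named in the abstract above arise from crossing two independent binary criteria. A minimal sketch of that cross-classification follows; the class `ReferenceType`, its field names and the list `ALL_TYPES` are hypothetical names chosen for this illustration and do not come from the PDT annotation scheme.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ReferenceType:
    """One cell of the two-criteria classification (hypothetical encoding)."""
    specific: bool  # criterion of specificity
    generic: bool   # criterion of genericity

    def label(self) -> str:
        specificity = "specific" if self.specific else "non-specific"
        genericity = "generic" if self.generic else "individual"
        return f"{specificity} {genericity}"


# Cross the two criteria; the order follows the enumeration in the abstract.
ALL_TYPES = [
    ReferenceType(specific=s, generic=g)
    for g, s in product((False, True), (True, False))
]

for reference_type in ALL_TYPES:
    print(reference_type.label())
```

Keeping the two criteria as separate fields, rather than as a single four-valued tag, mirrors the abstract's point that genericity and specificity are independent of each other.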
EN
The present contribution is a theoretical and methodological study of the possibilities of processing discourse through the use of corpus methods. Despite the descriptive complexity of phenomena “beyond the sentence boundary”, we argue that even more ways of systematic analysis are possible. Taking into account the various attempts made during the last decade to create discourse-annotated corpora, we show that a reliable way to proceed in any such analysis is to distinguish between different layers of discourse analysis (in particular between “semantic” and “pragmatic” aspects) and to adhere to the linguistic form rather than classifying phenomena with no surface realization.