
Results found: 4


Search results

Search:
in the keywords:  information extraction
EN
Towards an event annotated corpus of Polish
The paper presents a typology of events built on the basis of the TimeML specification adapted to the Polish language. Some changes were introduced to the definition of the event categories and a motivation for event categorization was formulated. The event annotation task is presented on two levels: the ontology level (language independent) and the level of text mentions (language dependent). The various types of event mentions in Polish text are discussed. A procedure for annotating event mentions in Polish texts is presented and evaluated. In the evaluation, a randomly selected set of documents from the Corpus of Wrocław University of Technology (KPWr) was annotated by two linguists and the inter-annotator agreement was calculated. The evaluation was done in two iterations. After the first evaluation we revised and improved the annotation procedure. The second evaluation showed a significant improvement in the agreement between annotators. The current work focused on the annotation and categorisation of event mentions in text. Future work will focus on describing events with a set of attributes, arguments and relations.
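The abstract above reports agreement between two annotators but does not name the measure used. As an illustration only, the following minimal sketch assumes Cohen's kappa, a common chance-corrected agreement score for two annotators. The function name `cohen_kappa` and the category labels `action` and `state` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    if len(a) != len(b) or not a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four event mentions (not data from the paper).
annotator_1 = ["action", "action", "state", "state"]
annotator_2 = ["action", "state", "state", "state"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

Comparing the score before and after a revision of the annotation guidelines is one way to quantify the kind of improvement the abstract describes.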
EN
Towards Recognition of Spatial Relations between Entities for Polish
In this paper, the problem of spatial relation recognition in Polish is examined. We present the different ways of distributing spatial information throughout a sentence by reviewing the lexical and grammatical signals of various relations between objects. We focus on the spatial usage of prepositions and their meaning, determined by the ‘conceptual’ schemes they constitute. We also discuss the feasibility of a comprehensive recognition of spatial relations between objects expressed in different ways by reviewing the existing tools and resources for text processing in Polish. As a result, we propose a heuristic method for the recognition of spatial relations expressed in various phrase structures called spatial expressions. We propose a definition of spatial expressions by taking into account the limitations of the available tools for the Polish language. A set of rules is used to generate candidates of spatial expressions, which are later tested against a set of semantic constraints.
The results of our work on recognition of spatial expressions in Polish texts were partially presented in (Marcińczuk, Oleksy, & Wieczorek, 2016). In that paper we focused on a detailed analysis of errors obtained using a set of basic morphosyntactic patterns for generating spatial expression candidates: we identified and described the most common sources of errors, i.e. incorrectly recognized or unrecognized expressions. In this paper we focused mainly on the preliminary stages of spatial expression recognition. We presented an extensive review of how spatial information can be encoded in text, the types of spatial triggers in Polish, and a detailed evaluation of morphosyntactic patterns which can be used to generate spatial expression candidates.
Rozpoznawanie relacji przestrzennych między obiektami fizycznymi w języku polskim (Recognition of spatial relations between physical objects in Polish)
The article addresses the recognition of spatial relations in Polish. The authors present the various ways in which texts convey information about spatial relations between physical objects, taking into account lexical and grammatical signals. A substantial part of the article discusses the meaning of prepositions used to express spatial relations. This meaning is shaped by the conceptual schemes that individual prepositions co-create. The possibilities for comprehensive recognition of spatial relations expressed by various linguistic means are also discussed, supported by a review of existing resources and tools for processing Polish.
As a result, the authors propose a heuristic method for recognizing spatial relations realized linguistically through syntactic structures referred to as spatial expressions. The article presents a definition of spatial expressions that takes into account the specifics of the tools available for processing Polish. A set of syntactic rules makes it possible to select phrases that are candidates for spatial expressions, which are then checked against an appropriate set of semantic constraints.
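The two-stage idea in this abstract, rules that generate candidate spatial expressions followed by a semantic filter, can be sketched roughly as follows. This is an illustrative toy, not the authors' method: the single noun-preposition-noun pattern, the coarse tags `noun` and `prep`, and the small lexicons `SPATIAL_PREPOSITIONS` and `PHYSICAL_OBJECTS` are all assumptions made for the example.

```python
from typing import List, Tuple

Token = Tuple[str, str]            # (lemma, coarse part-of-speech tag)
Candidate = Tuple[str, str, str]   # (located object, preposition, reference object)

# Hypothetical toy lexicons standing in for real lexical resources.
SPATIAL_PREPOSITIONS = {"na", "w", "pod", "obok"}
PHYSICAL_OBJECTS = {"książka", "stół", "kot", "łóżko"}


def generate_candidates(tokens: List[Token]) -> List[Candidate]:
    """Stage 1: one morphosyntactic pattern, noun + preposition + noun."""
    candidates = []
    for i in range(len(tokens) - 2):
        (w1, t1), (w2, t2), (w3, t3) = tokens[i : i + 3]
        if (t1, t2, t3) == ("noun", "prep", "noun"):
            candidates.append((w1, w2, w3))
    return candidates


def passes_constraints(candidate: Candidate) -> bool:
    """Stage 2: keep a candidate only if both nouns denote physical objects."""
    located, preposition, reference = candidate
    return (
        preposition in SPATIAL_PREPOSITIONS
        and located in PHYSICAL_OBJECTS
        and reference in PHYSICAL_OBJECTS
    )


def recognize(tokens: List[Token]) -> List[Candidate]:
    return [c for c in generate_candidates(tokens) if passes_constraints(c)]


# "książka na stole" (a book on the table), given as lemmas.
spatial = [("książka", "noun"), ("na", "prep"), ("stół", "noun")]
# "spotkanie w piątek" (a meeting on Friday): same pattern, but not spatial.
temporal = [("spotkanie", "noun"), ("w", "prep"), ("piątek", "noun")]

print(recognize(spatial))
print(recognize(temporal))
```

The second phrase matches the syntactic pattern but is rejected by the filter, which is the kind of case the semantic constraints are meant to handle.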
EN
Temporal Expressions in Polish Corpus KPWr
This article presents the results of recent research on the interpretation of Polish expressions that refer to time. These expressions are a source of information about when something happens, how often something occurs, or how long something lasts. Temporal information, which can be extracted from text automatically, plays a significant role in many information extraction systems, such as question answering, discourse analysis, event recognition and many more. We prepared PLIMEX, a broad description of Polish temporal expressions with annotation guidelines, based on state-of-the-art solutions for English, mainly the TimeML specification. We also adapted the solution to capture the local semantics of temporal expressions, called LTIMEX. Temporal description also supports further event identification and extends the event description model, focusing on anchoring events in time, ordering events, and reasoning about the persistence of events. We prepared a specification designed to address these issues, and we annotated all documents in the Polish Corpus of Wrocław University of Technology (KPWr) using our annotation guidelines.
EN
Objective: The objective of the paper is to analyse publicly available government policy documents of the United Arab Emirates (UAE) and the Kingdom of Saudi Arabia (KSA) in order to identify key topics and themes for these two countries in relation to the COVID-19 response.
Research Design & Methods: Given the availability of large volumes of documents and advances in computing systems, text mining has emerged as a significant tool for analysing large volumes of unstructured data. For this paper, we have applied latent semantic analysis and Singular Value Decomposition (SVD) for text clustering.
Findings: The results of the analysis of terms indicate similarities of key themes around health and the pandemic for the UAE and the KSA. However, the results of text clustering indicate that the focus of the UAE’s documents is on ‘Digital’-related terms, whereas for the KSA it is on ‘International Travel’-related terms. Further analysis using topic modelling demonstrates that topics such as ‘Vaccine Trial’, ‘Economic Recovery’, ‘Health Ministry’, and ‘Digital Platforms’ are common across both the UAE and the KSA.
Contribution / Value Added: The study contributes to the text-mining literature by providing a framework for analysing public policy documents at the country level. This can help to understand the key themes in government policies and can potentially aid the identification of the success and failure of various policy measures in certain cases by comparing the outcomes.
Implications / Recommendations: The results of this study show that text clustering of unstructured data such as policy documents can be very useful for understanding the themes and orientation of the policies.
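As a rough illustration of the technique named under Research Design & Methods, the sketch below applies latent semantic analysis through a truncated SVD of a term-document count matrix and then groups documents by their dominant latent dimension. It assumes NumPy is available. The four short documents, the function names, and the dominant-dimension grouping rule are hypothetical choices for this example and do not reproduce the study's data or pipeline.

```python
import numpy as np

# Hypothetical mini-corpus (not the policy documents analysed in the study).
docs = [
    "vaccine trial health ministry",
    "health ministry vaccine rollout",
    "digital platforms economic recovery",
    "economic recovery digital services platforms",
]


def lsa_document_vectors(documents, k):
    """Return one k-dimensional latent vector per document."""
    vocab = sorted({word for doc in documents for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    # Term-document matrix of raw counts: rows are terms, columns are documents.
    counts = np.zeros((len(vocab), len(documents)))
    for j, doc in enumerate(documents):
        for word in doc.split():
            counts[index[word], j] += 1.0
    # SVD: counts = U @ diag(S) @ Vt; keep only the k largest singular values.
    _, S, Vt = np.linalg.svd(counts, full_matrices=False)
    return (S[:k, None] * Vt[:k]).T  # shape: (number of documents, k)


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


vectors = lsa_document_vectors(docs, k=2)
# Crude grouping: assign each document to its strongest latent dimension.
labels = [int(np.argmax(np.abs(v))) for v in vectors]
print(labels)
print(round(cosine(vectors[0], vectors[1]), 2), round(cosine(vectors[0], vectors[2]), 2))
```

In practice the count matrix is usually weighted (for example with TF-IDF) and the latent vectors are passed to a standard clustering algorithm. The grouping rule here only keeps the example short.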