

2011 | 11 |

Article title

Developing free morphological data for Polish

Languages of publication

EN

Abstracts

EN
A limiting factor in the construction of Natural Language Processing (NLP) systems is often the availability of morphological resources. This is the case for Polish: the freely available corpus with manual morpho-syntactic annotation (part of the IPI PAN Corpus) is not coupled with any free morphological analyser. There exists a very large morphological dictionary of Polish available under a free licence, Morfologik. Unfortunately, its tagset differs significantly from the tagset of the corpus and, what is more, its morphological description lacks the desired rigour. We remedy this situation by performing a massive conversion of the dictionary into a tagset compliant with the corpus. The conversion results in a free dictionary containing entries for almost 3.5 million different word forms. In this article we report on our methodology, discuss some morphological and syntactic issues related to both tagsets, and present the characteristics of the resulting dictionary.

Year

2011

Issue

11

Dates

published
2011
online
2015-11-24

Identifiers

YADDA identifier

bwmeta1.element.ojs-doi-10_11649_cs_2011_012