2023 | 88 | 2 | 129 – 140

Article title

ANALYSING ACCURACY OF SLOVAK LANGUAGE LEMMATIZATION AND MSD TAGGING

Languages of publication

EN

Abstracts

EN
Lemmatization and morphological tagging are indispensable steps in Slovak corpus linguistics. In this article, we evaluate two state-of-the-art lemmatizers and MSD taggers for Slovak: one based on MorphoDiTa and the other based on spaCy. We measured accuracy on the test subset of a manually lemmatized and MSD-annotated corpus and found that the combination of lemma and tag reached an accuracy of 93.5% with MorphoDiTa and 95.6% with spaCy. Most errors occurred when disambiguating the MSD tags of homonymous uninflected parts of speech, such as particles, conjunctions, and adverbs, and when distinguishing between masculine inanimate nominative and accusative singular forms. In these cases, spaCy shows a noticeable improvement over MorphoDiTa, likely because it makes better use of the words' context.
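Accuracy on the combination of lemma and tag is naturally read as counting a token as correct only when both its lemma and its MSD tag match the gold annotation. The Python sketch below illustrates that computation alongside separate lemma and tag accuracies, plus a tally of the most frequent tag confusions of the kind the error analysis describes. It is not the authors' evaluation script: the Token record, the function names evaluate and top_tag_confusions, the assumption that gold and predicted tokens are already aligned one to one, and the toy Slovak tokens and tag strings are all hypothetical (the tags are illustrative and not guaranteed to follow any particular tagset exactly).

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    """One token with its lemma and MSD tag (hypothetical record type)."""
    form: str
    lemma: str
    tag: str


def evaluate(gold: List[Token], pred: List[Token]) -> Tuple[float, float, float]:
    """Return (lemma accuracy, tag accuracy, joint lemma+tag accuracy).

    Assumes gold and pred share the same tokenization, so they can be
    compared position by position.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be aligned")
    n = len(gold)
    if n == 0:
        raise ValueError("cannot evaluate an empty sequence")
    pairs = list(zip(gold, pred))
    lemma_ok = sum(g.lemma == p.lemma for g, p in pairs)
    tag_ok = sum(g.tag == p.tag for g, p in pairs)
    # A token counts towards the joint score only if BOTH lemma and tag match.
    joint_ok = sum(g.lemma == p.lemma and g.tag == p.tag for g, p in pairs)
    return lemma_ok / n, tag_ok / n, joint_ok / n


def top_tag_confusions(
    gold: List[Token], pred: List[Token], k: int = 3
) -> List[Tuple[Tuple[str, str], int]]:
    """Most frequent (gold tag, predicted tag) pairs among tagging errors."""
    errors = Counter((g.tag, p.tag) for g, p in zip(gold, pred) if g.tag != p.tag)
    return errors.most_common(k)


# Toy, made-up data: "Vidím stôl a len čakám."
gold = [
    Token("Vidím", "vidieť", "VKesa+"),
    Token("stôl", "stôl", "SSis4"),  # accusative singular, masculine inanimate
    Token("a", "a", "O"),            # conjunction
    Token("len", "len", "T"),        # particle
    Token("čakám", "čakať", "VKesa+"),
]
pred = [
    Token("Vidím", "vidieť", "VKesa+"),
    Token("stôl", "stôl", "SSis1"),  # nominative chosen instead of accusative
    Token("a", "a", "O"),
    Token("len", "len", "Dx"),       # particle mistagged as an adverb
    Token("čakám", "čakat", "VKesa+"),  # wrong lemma, correct tag
]

lemma_acc, tag_acc, joint_acc = evaluate(gold, pred)
print(lemma_acc, tag_acc, joint_acc)    # → 0.8 0.6 0.4
print(top_tag_confusions(gold, pred))   # → [(('SSis4', 'SSis1'), 1), (('T', 'Dx'), 1)]
```

In this toy data, the nominative/accusative confusion on "stôl" and the particle/adverb confusion on "len" lower the tag accuracy, while the wrong lemma for "čakám" lowers only the lemma accuracy, so the joint score ends up below both.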

Year

2023

Volume

88

Issue

2

Pages

129 – 140

Contributors

  • Jazykovedný ústav Ľ. Štúra SAV, v. v. i., Panská 26, Bratislava, Slovak Republic
author

Identifiers

YADDA identifier

bwmeta1.element.cejsh-67be7c94-9799-4906-9d74-75108ed12646