2006 | 62 | 31-44

Article title

COMPUTATIONAL TOOLS FOR MANAGING LARGE TEXT CORPORA: THE SEARCH ENGINE 'HOLMES'

Languages of publication

PL

Abstracts

EN
Managing large text corpora requires sophisticated computational tools. For highly inflecting languages such as Polish, homonymy is a challenge that those who build such tools have to face: in Polish texts, about 42 words in every 100 are grammatically ambiguous. The search engine 'Holmes', designed by Michal Rudolf, works as a disambiguator rather than a tagger. It operates on texts that have already been morphologically annotated by dedicated programs. After the user enters a query, 'Holmes' examines the set of tags for each word and rejects as many improper interpretations as possible. 'Holmes' uses linguistic rather than statistical methods of disambiguation: it is based on a number of rules that formalize various contextual restrictions on words. Query results are available online.
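
The rule-based approach described in the abstract can be illustrated with a minimal sketch. It is not the actual 'Holmes' implementation: the tag names (`prep:gen`, `noun:gen`, and so on), the `Token` class, the `disambiguate` function and the single rule are all hypothetical, chosen only to show how contextual rules can narrow a set of candidate tags without forcing a single answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Token:
    form: str
    tags: Set[str]  # candidate interpretations from a prior morphological analyser


# A rule inspects position i and returns the tags it rejects there.
Rule = Callable[[List[Token], int], Set[str]]


def genitive_after_preposition(tokens: List[Token], i: int) -> Set[str]:
    """Toy rule (an assumption, not a Holmes rule): directly after an
    unambiguous genitive-governing preposition, reject noun readings
    in any other case."""
    if i == 0 or tokens[i - 1].tags != {"prep:gen"}:
        return set()
    return {t for t in tokens[i].tags if t.startswith("noun:") and t != "noun:gen"}


def disambiguate(tokens: List[Token], rules: List[Rule]) -> List[Token]:
    """Apply the rules repeatedly until no tag set shrinks further."""
    changed = True
    while changed:
        changed = False
        for i, tok in enumerate(tokens):
            for rule in rules:
                remaining = tok.tags - rule(tokens, i)
                # Never strip the last interpretation; leftover ambiguity is allowed.
                if remaining and remaining != tok.tags:
                    tok.tags = remaining
                    changed = True
    return tokens


# "do domu" ("to the house"): the form "domu" is ambiguous between several cases.
sentence = [
    Token("do", {"prep:gen"}),
    Token("domu", {"noun:gen", "noun:loc", "noun:voc"}),
]
result = disambiguate(sentence, [genitive_after_preposition])
```

In this sketch a word keeps every tag that no rule rejects, so some words may remain ambiguous. That matches the abstract's distinction between a disambiguator, which discards improper interpretations, and a tagger, which commits to exactly one.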

Contributors

author
  • M. Swidzinski, Uniwersytet Warszawski, Wydzial Polonistyki, Instytut Jezyka Polskiego, ul. Krakowskie Przedmiescie 26/28, 00-325 Warszawa, Poland

Identifiers

CEJSH db identifier
07PLAAAA02645412

YADDA identifier

bwmeta1.element.28593081-205f-37c1-99f2-7f6808c49a0c