

2010 | 21 | 1-16

Article title

Creating and Weighting Hunspell Dictionaries as Finite-State Automata

Languages of publication

PL

Abstracts

PL
There are numerous formats for writing spell-checkers for open-source systems, and there are many lexical descriptions for natural languages written in these formats. In this paper, we demonstrate a method for converting Hunspell and related spell-checking lexicons into finite-state automata. We also present a simple way to apply unigram corpus training in order to improve the spell-checking suggestion mechanism using weighted finite-state technology. What we propose is a generic and efficient language-independent framework of weighted finite-state automata for spell-checking in typical open-source software, e.g. Mozilla Firefox, OpenOffice and the Gnome desktop.
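The abstract describes two steps: compiling a Hunspell lexicon (a word list plus affix rules) into a finite-state automaton, and weighting that automaton with unigram corpus frequencies so that more frequent words are suggested first. The Python below is a minimal sketch of that idea under simplifying assumptions, not the authors' implementation. The rule tuples in `AFF_RULES` and the entries in `DIC_ENTRIES` are hypothetical toy data, loosely modeled on Hunspell's suffix rules (strip, add, condition), and only suffixes are handled. Word forms are stored in a trie, which is a deterministic acyclic automaton though not a minimal one, with a weight of -log P(w) on each final state. The add-one smoothing is an assumed choice. Suggestions come from ranking edit-distance-1 candidates by that weight, a swapped-in simplification in place of a proper error-model transducer.

```python
import math
import re
from collections import Counter

# Hypothetical subset of Hunspell-style suffix rules: (flag, strip, add, condition).
# "0" means the empty string; the condition is a regex matched at the end of the stem.
AFF_RULES = [
    ("S", "0", "s", "[^sy]"),
    ("S", "y", "ies", "[^aeiou]y"),
    ("D", "0", "ed", "[^ey]"),
    ("D", "0", "d", "e"),
]
# Hypothetical dictionary entries in "stem/FLAGS" form.
DIC_ENTRIES = ["cat/S", "city/S", "walk/SD", "bake/D", "the"]


def expand(dic_entries, aff_rules):
    """Expand each dictionary entry into every surface form its flags allow."""
    forms = set()
    for entry in dic_entries:
        stem, _, flags = entry.partition("/")
        forms.add(stem)
        for flag, strip, add, cond in aff_rules:
            strip = "" if strip == "0" else strip
            add = "" if add == "0" else add
            if flag in flags and stem.endswith(strip) and re.search(cond + "$", stem):
                forms.add(stem[: len(stem) - len(strip)] + add)
    return forms


class WeightedAcceptor:
    """Trie-shaped deterministic acyclic FSA with weights on final states.

    Weights follow the tropical convention: lower means more likely.
    """

    def __init__(self):
        self.transitions = [{}]  # state -> {symbol: next state}
        self.final = {}          # final state -> weight

    def add(self, word, weight):
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.final[state] = weight

    def lookup(self, word):
        """Return the word's weight, or None if the automaton rejects it."""
        state = 0
        for ch in word:
            state = self.transitions[state].get(ch)
            if state is None:
                return None
        return self.final.get(state)


def build(forms, corpus_tokens):
    """Weight each form by -log of its add-one-smoothed unigram probability."""
    counts = Counter(t for t in corpus_tokens if t in forms)
    total = sum(counts.values()) + len(forms)
    fsa = WeightedAcceptor()
    for form in sorted(forms):
        fsa.add(form, -math.log((counts[form] + 1) / total))
    return fsa


def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    subs = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + swaps + subs + inserts)


def suggest(fsa, word, n=3):
    """Return up to n accepted candidates, lowest weight first (ties alphabetical)."""
    scored = [(w, c) for c in edits1(word) if (w := fsa.lookup(c)) is not None]
    return [c for _, c in sorted(scored)[:n]]


if __name__ == "__main__":
    forms = expand(DIC_ENTRIES, AFF_RULES)
    fsa = build(forms, "the cat walked the cats walk the city".split())
    print(suggest(fsa, "walkd"))
```

With this toy corpus, `suggest(fsa, "walkd")` returns `["walk", "walked", "walks"]`. *walks* comes last because it never occurs in the corpus and so carries the largest weight. *walk* and *walked* tie and are ordered alphabetically. A real system would also minimize the automaton and cover prefixes, compounding and a richer error model.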

Year

2010

Volume

21

Pages

1-16

Dates

published
2010-06-15

Contributors

author
  • Department of Modern Languages, University of Helsinki, Finland
  • Department of Modern Languages, University of Helsinki, Finland

Identifiers

YADDA identifier

bwmeta1.element.ojs-doi-10_14746_il_2010_21_1