


2017 | 37 | 17-33

Article title

Automatic Diachronic Normalization of Polish Texts

Abstracts

EN
The paper presents a method for the automatic diachronic normalization of Polish texts, i.e. a procedure that takes a historical text and returns it in contemporary spelling. The method applies finite-state transducers defined in a sublanguage of the Thrax formalism. The paper discusses both linguistic issues, such as the evolution of Polish spelling, and implementation aspects, such as the efficiency and testing of the proposed method.
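To illustrate what a rule-based normalizer of this kind does, here is a minimal sketch. It is not the authors' Thrax grammar: Python regular expressions stand in for finite-state transducers, with lookbehind and lookahead playing the role of the left and right contexts of a context-dependent rewrite rule. The rule set is a hypothetical, deliberately tiny fragment of older Polish spelling: "x" for "ks" (xiądz → ksiądz), and word-final "y" before a vowel, which became "j" after c, s, z (komisya → komisja) and "i" after other consonants (historya → historia). The rules are applied in order, each to the output of the previous one, loosely mirroring a cascade of composed rewrite transducers; the list of historical/contemporary pairs doubles as a minimal regression test.

```python
import re

# Hypothetical, deliberately tiny rule set (an illustrative assumption, not
# the paper's grammar). Each entry is (pattern, replacement); lookbehind and
# lookahead encode the left and right contexts of a rewrite rule.
RULES = [
    # Older "x" for "ks": xiądz -> ksiądz
    (re.compile(r"x"), "ks"),
    (re.compile(r"X"), "Ks"),
    # Word-final "y" + vowel after c, s, z (but not after the digraphs
    # sz, cz, rz) becomes "j": komisya -> komisja, Francya -> Francja
    (re.compile(r"(?<=[csz])(?<![scr]z)y(?=[aąęi]\b)"), "j"),
    # The same pattern after other consonants becomes "i":
    # historya -> historia, historyi -> historii
    (re.compile(r"(?<=[bdfghklmnprtw])y(?=[aąęi]\b)"), "i"),
]


def normalize(text: str) -> str:
    """Apply the rules in order, each to the output of the previous one."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    # Hand-picked (historical, contemporary) pairs used as a regression test;
    # the last two are modern forms that must stay untouched.
    PAIRS = [
        ("xiądz", "ksiądz"),
        ("komisya", "komisja"),
        ("Francya", "Francja"),
        ("historya", "historia"),
        ("historyi", "historii"),
        ("Marya", "Maria"),
        ("szyi", "szyi"),
        ("wyobraźnia", "wyobraźnia"),
    ]
    for old, new in PAIRS:
        assert normalize(old) == new, (old, normalize(old), new)
    print(normalize("Xiądz z komisyi"))  # prints "Ksiądz z komisji"
```

On efficiency: the regex loop rescans the text once per rule, whereas finite-state rewrite rules can be composed ahead of time into a single transducer, so the input is traversed once no matter how many rules the grammar contains.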

Volume

37

Pages

17-33

Dates

published
2017

Contributors

  • ADAM MICKIEWICZ UNIVERSITY, POZNAŃ
  • ADAM MICKIEWICZ UNIVERSITY, POZNAŃ
  • ADAM MICKIEWICZ UNIVERSITY, POZNAŃ
  • ADAM MICKIEWICZ UNIVERSITY, POZNAŃ

References

  • Allauzen C., Riley M., Schalkwyk J., Skut W., Mohri M., 2007, OpenFst: A General and Efficient Weighted Finite-State Transducer Library, in: Proceedings of the Twelfth International Conference on Implementation and Application of Automata (CIAA 2007), Lecture Notes in Computer Science, Vol. 4783, pp. 11-23, Prague, Czech Republic, Springer.
  • Bronikowska R., Modrzejewski E. The enrichment of the lexical information and the corpus resources by using the results of the morphological analysis of historical texts, http://www.elexicography.eu/wp-content/uploads/2017/03/ (downloaded on 2017-06-15).
  • Graliński F., 2013, Polish digital libraries as a text corpus, in: Zygmunt Vetulani and Hans Uszkoreit (eds.), Proceedings of 6th Language & Technology Conference, pp. 509-513. Fundacja Uniwersytetu im. Adama Mickiewicza.
  • Klemensiewicz Z. (ed.), 1963, Pisownia polska. Przepisy – słowniczek, Warszawa – Kraków – Wrocław – Łódź, Zakład im. Ossolińskich.
  • Lisowski T., 2010, Economic calculation and Polish alphabetic writing, in: Sekiguchi T. (ed.), The International Academic Conference “Meetings of the Three Polish Studies Centres in Asia – China, Korea, Japan”, pp. 195-204.
  • Malinowski M., 2012, Ortografia polska od II poł. XVIII wieku do współczesności. Kodyfikacja, reformy, recepcja, doctoral dissertation, Uniwersytet Śląski w Katowicach.
  • Mykowiecka A., Rychlik P., Waszczuk J., 2012, Building an Electronic Dictionary of Old Polish on the Base of the Paper Resource, in: Petya Osenova, Stelios Piperidis, Milena Slavcheva, Cristina Vertan (eds.), Proceedings of the Workshop on Adaptation of Language Resources and Tools for Processing Cultural Heritage at LREC 2012, European Language Resources Association (ELRA), pp. 16-21.
  • Tai T., Sproat R., Skut W., 2011, Thrax: An Open Source Grammar Compiler Built on OpenFst, in: Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, IEEE, Piscataway, NJ.

Identifiers

Biblioteka Nauki
1036689

YADDA identifier

bwmeta1.element.ojs-doi-10_14746_il_2017_37_2_