

2014 | 14 | 13-20

Article title

Trilingual aligned corpus – current state and new applications





This article describes the current state of a trilingual parallel corpus consisting of texts in two Slavic languages (Bulgarian and Polish) and one Baltic language (Lithuanian). The corpus contains original literary texts (fiction, novels, and short stories) in one of the three languages with translations into the other two, as well as texts in other languages translated into Bulgarian, Polish, and Lithuanian. Some of the texts are aligned at the sentence level. The authors propose a semantic annotation of verbs appearing in these aligned texts that will facilitate contrastive studies of natural languages. The theoretical background for the proposed semantic annotation is also briefly discussed.










  • Институт по математикa и информатика, Българска академия на науките [Institute of Mathematics and Informatics, Bulgarian Academy of Sciences], София [Sofia], Bulgaria
  • Instytut Slawistyki PAN [Institute of Slavic Studies, Polish Academy of Sciences], Warszawa [Warsaw], Poland
  • Instytut Slawistyki PAN [Institute of Slavic Studies, Polish Academy of Sciences], Warszawa [Warsaw], Poland
  • Instytut Slawistyki PAN [Institute of Slavic Studies, Polish Academy of Sciences], Warszawa [Warsaw], Poland


