2014 | 14 | 13-20

Article title

Trilingual aligned corpus – current state and new applications

Content

Title variants

Languages of publication

EN

Abstracts

EN
This article describes the current state of a trilingual parallel corpus consisting of texts in two Slavic languages (Bulgarian and Polish) and one Baltic language (Lithuanian). The corpus contains original literary texts (fiction, novels, and short stories) in one of the three languages with translations into the other two, as well as texts in other languages translated into Bulgarian, Polish, and Lithuanian. Some of the texts are aligned at the sentence level. The authors propose a semantic annotation of the verbs appearing in these aligned texts that will facilitate contrastive studies of natural languages. The theoretical background for the proposed semantic annotation is also briefly discussed.

Year

2014

Issue

14

Pages

13-20

Physical description

Dates

published
2014-09-04

Contributors

  • Институт по математикa и информатика, Българска академия на науките [Institute of Mathematics and Informatics, Bulgarian Academy of Sciences], София [Sofia], Bulgaria
  • Instytut Slawistyki PAN [Institute of Slavic Studies, Polish Academy of Sciences], Warszawa [Warsaw], Poland
author
  • Instytut Slawistyki PAN [Institute of Slavic Studies, Polish Academy of Sciences], Warszawa [Warsaw], Poland
author
  • Instytut Slawistyki PAN [Institute of Slavic Studies, Polish Academy of Sciences], Warszawa [Warsaw], Poland

References

  • Dimitrova, L., Koseska, V., Roszko, D., & Roszko, R. (2009a). Bulgarian-Polish-Lithuanian Corpus - Current Development. In C. Vertan, S. Piperidis, E. Paskaleva, & M. Slavcheva (Eds.), Multilingual resources, technologies and evaluation for Central and Eastern European languages. Proc. of the International Workshop in conjunction with International Conference RANLP - 2009. Borovec, Bulgaria, 17 September 2009 (pp. 1-8). Bulgaria, Shoumen: INCOMA Ltd.
  • Dimitrova, L., Koseska, V., Roszko, D., & Roszko, R. (2009b). Bulgarian-Polish-Lithuanian Corpus - Problems of Development and Annotation. In T. Erjavec (Ed.), Research Infrastructure for Digital Lexicography. Proc. of the MONDILEX Fifth Open Workshop within International Conference Information Society’2009, 14-15 October, 2009, Ljubljana (pp. 72-86). Ljubljana: Informacijska družba.
  • Dimitrova, L., Koseska, V., Roszko, D., & Roszko, R. (2010). Application of Multilingual Corpus in Contrastive Studies (on the example of the Bulgarian-Polish-Lithuanian Parallel Corpus). Cognitive Studies | Études cognitives, 10, 217-240.
  • Dimitrova, L., Koseska-Toszewa, V., Roszko, D., & Roszko, R. (2011). Bulgarian-Polish-Lithuanian Corpus - Recent Progress and Application. In D. Majchráková, & R. Garabík (Eds.), NLP, Multilinguality. Proc. of the 6th International Conference SLOVKO’2011, Modra, Slovakia, 20-21 October 2011 (pp. 44-50). Brno: Tribun EU.
  • EMEA (n.d.). Retrieved from http://opus.lingfil.uu.se/EMEA.php
  • Koseska-Toszewa, V. (2006). Semantyczna kategoria czasu, Gramatyka konfrontatywna bułgarsko-polska (Vol. 7). Warszawa.
  • Koseska-Toszewa, V., & Mazurkiewicz, A. (1988). Net Representation of Sentences in Natural Languages. In Lecture Notes in Computer Science 340, Advances in Petri Nets (pp. 249-266). Berlin: Springer-Verlag.
  • Koseska, V., & Mazurkiewicz, A. (2010). Time flow and tenses. Warszawa: Slawistyczny Ośrodek Wydawniczy.
  • Mazurkiewicz, A. (1986). Zdarzenia i stany: elementy temporalności. In Studia gramatyczne bułgarsko-polskie (Vol. I, Temporalność, pp. 7-21). Wrocław.
  • MultiUN (n.d.). Retrieved from http://opus.lingfil.uu.se/MultiUN.php
  • OPUS corpus (n.d.). Retrieved from http://opus.lingfil.uu.se/
  • ParaSol corpus (n.d.). Retrieved from http://parasol.unibe.ch/
  • Roszko, D. (2006). Funkcjonalne odpowiedniki litewskiego perfectum w litewskiej gwarze puńskiej i w języku polskim. Warszawa: Slawistyczny Ośrodek Wydawniczy.
  • Roszko, R. (1993). Wykładniki modalności imperceptywnej w języku polskim i litewskim. Warszawa: Slawistyczny Ośrodek Wydawniczy.
  • Roszko, R. (2004). Semantyczna kategoria określoności/nieokreśloności w języku litewskim (w zestawieniu z językiem polskim). Warszawa: Slawistyczny Ośrodek Wydawniczy.
  • TextAlign (n.d.). Retrieved from http://mt2007-cat.ru/index.html
  • Tiedemann, J. (2009). News from OPUS - A Collection of Multilingual Parallel Corpora with Tools and Interfaces. In N. Nicolov, K. Bontcheva, G. Angelova, & R. Mitkov (Eds.), Recent Advances in Natural Language Processing (Vol. V: Proceedings, pp. 237-248). Amsterdam/Philadelphia: John Benjamins.

Identifiers

YADDA identifier

bwmeta1.element.desklight-7ce906ad-02ac-4954-871d-a0019ec41539