Search results

Korpus spontánní mluvené češtiny ORAL2013 (The ORAL2013 Corpus of Spontaneous Spoken Czech)

Benešová L., Křen M., Waclawičová M.

Časopis pro moderní filologii (Journal for Modern Philology), 2015, vol. 97, issue 1, pp. 42–50

The paper presents ORAL2013, a corpus of spontaneous spoken Czech, together with its design principles and the practical solutions adopted during data collection. The corpus is designed to represent contemporary spontaneous spoken language used in informal, real-life situations across the whole of the Czech Republic. It consists of audio recordings and their transcriptions, aligned with time stamps; it features manual annotation and broad regional coverage with a large variety of speakers. ORAL2013 contains 835 recordings made between 2008 and 2011 with 2,544 speakers (1,297 of them unique); the audio tracks total almost 300 hours, and the transcriptions exceed 3.28 million tokens. ORAL2013 is made publicly available by the Czech National Corpus at http://www.korpus.cz/.

Korpus DIA1900: jeho koncepce a vytváření (The DIA1900 Corpus: Its Conception and Construction)

Benešová L., Kučera K., Najbrtová K., Pivoňková K., Stluka M.

Časopis pro moderní filologii (Journal for Modern Philology), 2023, vol. 105, issue 1, pp. 121–140

The paper describes the principles for building the one-million-word DIA1900 Corpus, which consists of Czech texts published between 1851 and 1900 and is designed to be both balanced and representative. Two main goals determined the methods of corpus building and the decision to develop new tools tailored to the special needs of 19th-century Czech: 1) to capture the variability of Czech in the second half of the 19th century (including spelling, morphology, and word-formation) and 2) to link the detected variants to the appropriate lemmas. The paper presents the phases of text processing, including transcription and manual pre-annotation, as well as the construction of a large morphological dictionary and the selection of a suitable set of paradigms. Further sections focus on annotation, morphological tagging, and manual disambiguation. The objective was to create a gold standard intended for the automatic annotation of both the DIA1900 corpus and the planned corpus of Czech texts from 1800–1850.