Full-text resources of CEJSH and other databases are now available in the new Library of Science.
Visit https://bibliotekanauki.pl


2017 | 78 | 2 | 145-158

Article title

Comparison of spoken corpora from a sociolinguistic perspective



Title variants

Languages of publication



This paper presents a comparison of the largest contemporary corpus of spoken Czech ORAL2013 and a different source, data gathered in the project “Sociolinguistic Analysis of the Use of Prothetic v- in Bohemia” (SAUP). Both of these data sources consist of informal interviews with Czech speakers, but their design is different. ORAL2013 is based on shorter recordings of many speakers whereas the SAUP data is based on longer recordings of fewer speakers. It is assumed that these two data sources should yield similar results since they aim to represent the same population. The comparison is based on the use of two features of spoken Czech in the Bohemia region: prothetic v- and conditional verb forms bych/bysem and bychom/bysme. Based on the analysis, it is concluded that (1) more information about the speakers should be added to future corpora like ORAL2013; (2) the corpus ORAL2013 is useful to conduct a sociolinguistic pilot study which then should be followed by a full-scale research project based on a different sample constructed strictly for the purposes of the particular research; (3) the ratio between the number of speakers in the corpus and the amount of their speech is an important (and often underestimated) aspect of corpus design which should be given careful consideration.


  • Slovo a slovesnost, redakce, Ústav pro jazyk český AV ČR, v.v.i., Letenská 4, 118 51 Praha 1, Czech Republic


Document Type

Publication order reference


YADDA identifier

JavaScript is turned off in your web browser. Turn it on to take full advantage of this site, then refresh the page.