2017 | 78 | 2 | 145-158
Article title

Comparison of spoken corpora from a sociolinguistic perspective

Languages of publication
EN
Abstracts
EN
This paper compares ORAL2013, the largest contemporary corpus of spoken Czech, with a second source: data gathered in the project “Sociolinguistic Analysis of the Use of Prothetic v- in Bohemia” (SAUP). Both data sources consist of informal interviews with Czech speakers, but their designs differ: ORAL2013 is based on shorter recordings of many speakers, whereas the SAUP data is based on longer recordings of fewer speakers. Because both aim to represent the same population, they are assumed to yield similar results. The comparison is based on the use of two features of spoken Czech in the Bohemia region: prothetic v- and the conditional verb forms bych/bysem and bychom/bysme. The analysis leads to three conclusions: (1) more information about the speakers should be added to future corpora like ORAL2013; (2) ORAL2013 is useful for conducting a sociolinguistic pilot study, which should then be followed by a full-scale research project based on a different sample constructed strictly for the purposes of the particular research; (3) the ratio between the number of speakers in the corpus and the amount of their speech is an important (and often underestimated) aspect of corpus design that should be given careful consideration.
Contributors
author
  • Slovo a slovesnost, redakce, Ústav pro jazyk český AV ČR, v.v.i., Letenská 4, 118 51 Praha 1, Czech Republic
Identifiers
YADDA identifier
bwmeta1.element.cb7c69d6-d6ac-4803-b58a-0a340bf9902b