Aktualizace rozvržení zdrojů Českého národního korpusu s ohledem na revizi vyváženosti jeho struktury.
UPDATING THE DISTRIBUTION OF CZECH NATIONAL CORPUS SOURCES THROUGH RE-EVALUATION OF THE BALANCE IN CORPUS STRUCTURE
To support the development of balanced corpora, the notion of the 'expectations' of potential future corpus users was introduced (Kralik, 2001). Based on several statistical studies of such expectations, the textual structure of SYN2000, the synchronic part of the Czech National Corpus (CNC), was proposed and implemented. The present article discusses two new studies of expectations (Akter 2001 and CJ 2001) and draws out their implications for future work on the CNC. Table 1 and Table 2 show that expectations are stable in the categories of fiction (krásná literatura) and newspapers and magazines (noviny + časopisy). Although respondents' daily contact with administrative texts is stable (see Table 3), the distribution of these texts is closely bound to other non-fiction topics, so no special treatment of administrative texts is proposed. Expectations concerning newspapers and magazines are stable (Table 5), although they changed radically over 1996-2001 (first and last surveys, Table 6). Within the same period, a clear rise in interest in fiction was recorded (Table 6). This shift can be attributed to natural societal development. A strong reduction in the share of newspaper texts and a strong increase in the share of fiction texts are therefore proposed (Table 7 + Table 8).