Automaatse segmentimise hindamine

Meister, Einar; Meister, Lya

Article details

Journal

Mäetagused

2017 | 68 | 145-160

Title variants

EN

EVALUATION OF AUTOMATIC SPEECH SEGMENTATION

Languages of publication

ET

Abstracts

EN

The use of large speech corpora in phonetic research depends to a great extent on the availability and quality of phonetic segmentation and transcription. As a rule, the best segmentation quality is achieved by human transcribers, whose manual work is time-consuming and tedious. However, tools for automatic segmentation, typically based on HMM forced alignment, have been developed for a number of languages. In recent years, two automatic systems have become available for Estonian as free online services: (1) the system developed at Tallinn University of Technology (TUT) (https://phon.ioc.ee/dokuwiki/doku.php?id=projects:tuvastus:est-align.et), and (2) the multilingual tool WebMAUS (https://clarin.phonetik.uni-muenchen.de/BASWebServices/). In this study we evaluate the performance of the two systems against human transcribers. The test set consists of Estonian read speech produced by (1) four L1 adult subjects, (2) six L1 adolescents, and (3) four L2 adult subjects. The reference segmentation data, comprising 27 sentences from L1 subjects and 10 sentences from the other subjects, were produced manually as Praat TextGrid files with two tiers (word-level orthographic transcription and phoneme-level SAMPA transcription); the automatic systems produced similar TextGrid files. In total, 1179 word boundaries and 5050 phone boundaries were compared. The results show that both systems were more accurate for L1 adult speech and less accurate for adolescent and L2 speech. The TUT system outperformed WebMAUS on L1 adult speech, whereas WebMAUS produced more accurate results for L1 adolescent and L2 speech. Despite the deviations in phone boundaries, the durations of vowel and consonant segments measured from the automatic and manual segmentations of L1 adult speech differ only marginally. This suggests that the accuracy of both automatic systems is sufficient for speech technology needs and that both could also be used in acoustic studies of L1 adult speech. However, both systems need improvement in order to reach the accuracy of the automatic segmentation tools available for English.
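
The abstract does not say which accuracy measure was used to compare the automatic and manual boundaries. As an illustration only, the sketch below shows one common way to do such a comparison: pair each automatic boundary with its manual reference, take the deviation in milliseconds, and report the mean absolute deviation and the share of boundaries within a tolerance. The function names, the 20 ms tolerance, the index-based pairing, and the boundary times are all assumptions made for this example; they are not taken from the article.

```python
from statistics import mean


def boundary_deviations(reference_ms, automatic_ms):
    """Signed deviation (automatic minus reference) per boundary, in ms.

    Assumes both lists describe the same boundary sequence, so that
    boundaries can be paired by position (a simplifying assumption).
    """
    if len(reference_ms) != len(automatic_ms):
        raise ValueError("boundary lists must have the same length")
    return [a - r for r, a in zip(reference_ms, automatic_ms)]


def summarize(deviations_ms, tolerance_ms=20):
    """Mean absolute deviation and share of boundaries within tolerance."""
    abs_dev = [abs(d) for d in deviations_ms]
    within = sum(1 for d in abs_dev if d <= tolerance_ms)
    return {
        "mean_abs_ms": mean(abs_dev),
        "within_tolerance": within / len(abs_dev),
    }


# Hypothetical boundary times in milliseconds (not data from the study).
reference_ms = [100, 250, 400, 610]
automatic_ms = [110, 245, 440, 600]

deviations = boundary_deviations(reference_ms, automatic_ms)
summary = summarize(deviations)
print(deviations)
print(summary)
```

In practice the boundary times would be read from the two tiers of the manual and automatic TextGrid files, and the summary would be computed separately for each speaker group and each system.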

Keywords

EN

automatic segmentation; Estonian; phone boundaries; segment durations; speech corpora; word boundaries

Publisher

Folk Belief and Media Group of the Estonian Literary Museum, Estonian Institute of Folklore, EKM Teaduskirjastus

Contributors

author

Meister Einar

einar.meister@ttu.ee

Tallinn University of Technology, Ehitajate tee 5, Tallinn 19086, ESTONIA

author

Meister Lya

lya.meister@ttu.ee

Tallinn University of Technology, Ehitajate tee 5, Tallinn 19086, ESTONIA

Identifiers

YADDA identifier

bwmeta1.element.cejsh-be34f90c-c134-49ab-ad28-efa92f2f41a0
