Natural Word-level Emphasis Corpus based on the CMU ARCTIC Database

Natural Word-level Emphasis Corpus

This corpus contains emphasis annotations for the 597 utterances of set A of the CMU ARCTIC dataset, based on the intonation of the awb speaker (male, Scottish accent). The judge labelled the word(s) perceived as the focus of each utterance, based on the speaker's natural emphasis. The ARCTIC speech database consists of sentences read aloud from out-of-copyright texts, so it does not contain strong emphatic variation. The judge labelled naturally emphasised words (e.g., content words) as well as involuntary fluctuations of the speaker. The corpus contains an average of 2.32 emphasised words per utterance (26.3% of the words).
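As a rough sanity check, the quoted figures can be combined to estimate the overall size of the annotated material. The sketch below uses only the three numbers given above; the function name is hypothetical, and it assumes the 26.3% figure is the ratio of emphasised words to all words across the whole corpus (rather than a per-utterance average), which the description does not state explicitly.

```python
# Figures quoted in the corpus description.
NUM_UTTERANCES = 597
AVG_EMPHASISED_PER_UTTERANCE = 2.32
EMPHASISED_FRACTION = 0.263  # assumed to be a corpus-wide word ratio


def implied_corpus_size(num_utterances, avg_emphasised, emphasised_fraction):
    """Derive approximate corpus totals from the published summary statistics.

    Returns (average words per utterance, total emphasised words, total words).
    All values are estimates, since the inputs are rounded.
    """
    avg_words = avg_emphasised / emphasised_fraction
    total_emphasised = num_utterances * avg_emphasised
    total_words = num_utterances * avg_words
    return avg_words, total_emphasised, total_words


avg_words, total_emphasised, total_words = implied_corpus_size(
    NUM_UTTERANCES, AVG_EMPHASISED_PER_UTTERANCE, EMPHASISED_FRACTION
)

print(f"Average words per utterance: {avg_words:.1f}")
print(f"Estimated emphasised words:  {total_emphasised:.0f}")
print(f"Estimated total words:       {total_words:.0f}")
```

Under that assumption, the figures imply utterances of roughly 8.8 words on average and about 1,385 emphasised words in total. These are estimates derived from rounded statistics, not counts taken from the annotation files.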

Download the Natural Word-level Emphasis Corpus (30 KB).

The sound files on which the annotations are based can be found in the CMU ARCTIC database (speaker awb).


Francois Mairesse, 2009