Subjective quality ratings and physiological correlates of synthesized speech

Published in 5th Int. Workshop on Quality of Multimedia Experience (QoMEX 2013), 2013

Arndt, S., Antons, J.-N., Gupta, R., Laghari, K., Schleicher, R., Möller, S. & Falk, T. H.


The quality of text-to-speech (TTS) systems is usually evaluated with subjective methods in which participants rate a stimulus on multiple scales, such as naturalness, prosody, and overall quality. In the present study, we aim to evaluate TTS system quality using not only conventional subjective methods but also a neurophysiological approach that obtains neural correlates of TTS quality perception using electroencephalography (EEG). Such an approach gives better insight into the perceptual processes involved in human quality judgement, and may open doors to innovative subjective testing methods and/or objective measurement tools. Our experiments show an inverse relationship between TTS speech quality and the amplitude of an EEG evoked response called the ‘P300,’ suggesting an increase in cognitive load as TTS quality decreases, likely due to reduced speech intelligibility.

Recommended citation: Arndt, S., Antons, J.-N., Gupta, R., Laghari, K., Schleicher, R., Möller, S. & Falk, T. H. (2013, July). Subjective quality ratings and physiological correlates of synthesized speech. Poster presented at the 5th Int. Workshop on Quality of Multimedia Experience (QoMEX 2013), Klagenfurt, Austria. https://doi.org/10.1109/QoMEX.2013.6603229