Characterization of human emotions and preferences for text-to-speech systems using multimodal neuroimaging methods

Published in IEEE Canadian Conference on Electrical and Computer Engineering (CCECE 2014), 2014

Rehman Laghari, K., Gupta, R., Arndt, S., Antons, J.-N., Möller, S. & Falk, T. H.

Download publication here.

Voice user interfaces and speech quality are normally assessed using subjective user-experience testing methods and/or objective instrumental techniques. Recent advances in neurophysiological tools, however, allow useful human behavioral constructs, such as emotion, perception, preference, and task performance, to be measured in real time. Electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) are well-established neuroimaging tools used across a variety of domains, including health science, neuromarketing, user experience (UX) research, and multimedia quality of experience (QoE). This paper therefore describes the impact of natural and text-to-speech (TTS) speech signals on users' affective states (valence and arousal) and preferences, measured with neuroimaging tools (EEG and fNIRS) and a subjective user study. The EEG results showed that natural and high-quality TTS speech generate "positive valence", inferred from higher asymmetric EEG activation over the frontal region. The fNIRS results showed increased activation in the orbitofrontal cortex (OFC) during decision making in favor of natural and high-quality TTS speech signals. Natural and TTS signals, however, elicited significantly different arousal levels.

Recommended citation: Rehman Laghari, K., Gupta, R., Arndt, S., Antons, J.-N., Möller, S. & Falk, T. H. (2014, May). Characterization of Human Emotions and Preferences for Text-to-Speech Systems Using Multimodal Neuroimaging Methods. Paper presented at the IEEE Canadian Conference on Electrical and Computer Engineering (CCECE 2014), Toronto, ON, Canada. https://doi.org/10.1109/CCECE.2014.6901142