Transfer Learning Emotion Manifestation Across Music and Speech



Coutinho, Eduardo (ORCID: 0000-0001-5234-1497), Deng, Jun and Schuller, Bjoern (2014) Transfer Learning Emotion Manifestation Across Music and Speech. In: 2014 International Joint Conference on Neural Networks (IJCNN), 6-11 July 2014.

Full text: TLMusicSpeech_final-NEW.pdf (987 kB)

Abstract

In this article, we focus on time-continuous predictions of emotion in music and speech, and the transfer of learning from one domain to the other. First, we compare the use of Recurrent Neural Networks (RNN) with standard hidden units (Simple Recurrent Network, SRN) and Long Short-Term Memory (LSTM) blocks for intra-domain acoustic emotion recognition. We show that LSTM networks outperform SRN and explain, on average, 74%/59% (music) and 42%/29% (speech) of the variance in Arousal/Valence. Next, we evaluate whether cross-domain predictions of emotion are a viable option for acoustic emotion recognition, and we test the use of Transfer Learning (TL) for feature space adaptation. On average, our models are able to explain 70%/43% (music) and 28%/11% (speech) of the variance in Arousal/Valence. Overall, the results indicate good cross-domain generalization performance, particularly for the model trained on speech and tested on music without pre-encoding of the input features. To the best of our knowledge, this is the first demonstration of cross-modal time-continuous predictions of emotion in the acoustic domain.
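
To make the setup concrete, below is a minimal sketch (in Python/PyTorch; the paper does not specify an implementation) of a recurrent regressor that maps a sequence of acoustic feature vectors to time-continuous Arousal/Valence values, as in the intra-domain experiments above. The class name, feature dimensionality, hidden size, and all hyper-parameters are illustrative assumptions, not the authors' configuration. An SRN baseline would simply swap nn.LSTM for nn.RNN, which is what lets the comparison isolate the gating mechanism of the LSTM blocks.

    # A minimal sketch, not the authors' implementation: an LSTM regressor
    # producing one (Arousal, Valence) pair per input frame. All sizes are
    # illustrative assumptions.
    import torch
    import torch.nn as nn

    class EmotionLSTM(nn.Module):
        def __init__(self, n_features: int = 65, hidden_size: int = 128):
            super().__init__()
            # For an SRN baseline, replace nn.LSTM with nn.RNN.
            self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
            self.head = nn.Linear(hidden_size, 2)  # -> (Arousal, Valence)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, time, n_features) -> (batch, time, 2)
            out, _ = self.lstm(x)
            return self.head(out)

    # Toy usage: 4 sequences of 100 frames, 65 acoustic features per frame.
    model = EmotionLSTM()
    frames = torch.randn(4, 100, 65)
    preds = model(frames)              # shape: (4, 100, 2)
    loss = nn.MSELoss()(preds, torch.zeros_like(preds))
    loss.backward()

Cross-domain evaluation then amounts to applying a model trained on one domain's feature sequences to the other domain's. For the pre-encoding of input features via Transfer Learning mentioned above, one common realisation (an assumption here, not a description of the paper's method) is to train an autoencoder on the target domain and feed its encoded features to the recurrent model:

    # Hypothetical feature-space adaptation: an autoencoder trained to
    # reconstruct target-domain features; its encoder output would replace
    # the raw features fed to EmotionLSTM (which would then be built with
    # n_features=n_code).
    class FeatureAutoencoder(nn.Module):
        def __init__(self, n_features: int = 65, n_code: int = 64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_features, n_code), nn.Tanh())
            self.decoder = nn.Linear(n_code, n_features)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.decoder(self.encoder(x))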

Item Type: Conference or Workshop Item (Unspecified)
Uncontrolled Keywords: Mind and Body, Clinical Research
DOI: 10.1109/ijcnn.2014.6889814
URI: https://livrepository.liverpool.ac.uk/id/eprint/3000602