Scalogram vs Spectrogram as Speech Representation Inputs for Speech Emotion Recognition Using CNN

Marc Dominic Enriquez, Crisron Rudolf Lucas, Angelina Aquino

Research output: Chapter in Book/Report/Conference proceeding › Conference paper published in proceedings › peer-review

Abstract

Speech Emotion Recognition (SER) focuses on understanding the human emotion in a given speech utterance using its acoustic and/or linguistic features. This paper presents a comparison between two speech representation inputs for SER: spectrograms and scalograms. Speech signals from four databases (Emo-DB, RAVDESS, SAVEE, and a mix of all three) were converted into each type of representation and used to train variations of a convolutional neural network (CNN), VGG16 Model-3. Results show that the scalogram-based models have a higher mean F1-score than the spectrogram-based models; however, further analysis indicates that the difference is not statistically significant at a 95% confidence level. In conclusion, spectrograms and scalograms show no statistically significant difference in performance on the systems presented.

Original language: English
Title of host publication: 2023 34th Irish Signals and Systems Conference, ISSC 2023
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9798350340570
Publication status: Published - 2023
Event: 34th Irish Signals and Systems Conference, ISSC 2023 - Dublin, Ireland
Duration: 13 Jun 2023 to 14 Jun 2023

Publication series

Name: 2023 34th Irish Signals and Systems Conference, ISSC 2023

Conference

Conference: 34th Irish Signals and Systems Conference, ISSC 2023
Country/Territory: Ireland
City: Dublin
Period: 13/06/23 to 14/06/23
