TY - GEN
T1 - Scalogram vs Spectrogram as Speech Representation Inputs for Speech Emotion Recognition Using CNN
AU - Enriquez, Marc Dominic
AU - Lucas, Crisron Rudolf
AU - Aquino, Angelina
N1 - Funding Information:
This research was conducted with the financial support of Science Foundation Ireland under Grant Agreement No. 13/RC/2106 P2 at the ADAPT SFI Research Centre at University College Dublin. ADAPT, the SFI Research Centre for AI-Driven Digital Content Technology, is funded by Science Foundation Ireland through the SFI Research Centres Programme. For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Speech Emotion Recognition (SER) aims to identify the emotion expressed in a speech utterance from its acoustic and/or linguistic features. This paper compares two speech representation inputs for SER: spectrograms and scalograms. Speech signals from four datasets (Emo-DB, RAVDESS, SAVEE, and a combination of all three) were converted into each type of representation and used to train variations of the convolutional neural network (CNN) VGG16 Model-3. Results show that the scalogram-based models achieve a higher mean F1-score than the spectrogram-based models; however, further analysis indicates that the difference is not statistically significant at the 95% confidence level. In conclusion, spectrograms and scalograms show no statistically significant difference in performance on the systems presented.
KW - CNN
KW - Fourier Transform
KW - Scalogram
KW - SER
KW - Spectrogram
KW - Wavelet Transform
UR - http://www.scopus.com/inward/record.url?scp=85166020806&partnerID=8YFLogxK
U2 - 10.1109/ISSC59246.2023.10162085
DO - 10.1109/ISSC59246.2023.10162085
M3 - Conference Paper published in Proceedings
AN - SCOPUS:85166020806
T3 - 2023 34th Irish Signals and Systems Conference, ISSC 2023
SP - 1
EP - 6
BT - 2023 34th Irish Signals and Systems Conference, ISSC 2023
PB - IEEE, Institute of Electrical and Electronics Engineers
T2 - 34th Irish Signals and Systems Conference, ISSC 2023
Y2 - 13 June 2023 through 14 June 2023
ER -