Posted on 2024-07-12, 20:45, authored by Felicia Andayani
Speech Emotion Recognition (SER) is the task of recognizing emotions by learning features extracted from speech signals. This research focuses on designing and developing an LSTM-Transformer hybrid model for SER that learns the long-term dependencies in speech signals, and on investigating the model's impact on SER classification performance. The resulting recognition accuracy showed that the LSTM-Transformer hybrid model could learn temporal information from the frequency distributions, as represented by the Mel-Frequency Cepstral Coefficients (MFCCs) of each emotion, on both language-independent and language-dependent datasets.
History
Thesis type
Thesis (Masters by research)
Thesis note
A thesis submitted in fulfilment of the requirements for the degree of Master of Science (Research) performed at Swinburne University of Technology, Sarawak, June 2022.