A New Spatio-Temporal Neural Architecture with Bi-LSTM for Multimodal Emotion Recognition

S. Jothimani, S. N. Sangeethaa, K. Premalatha, R. Sathishkannan

2023 · DOI: 10.1109/ICCES57224.2023.10192713
International Conference on Communication and Electronics Systems · 6 Citations

Abstract

Emotion recognition uses implicit annotation to detect a user's emotional response to multimedia, which helps create efficient user-centric services. Researchers increasingly rely on physiologically based signals to convey emotions objectively. Traditional approaches to emotion recognition have mostly concentrated on extracting various kinds of hand-crafted features. However, hand-crafted features always require domain expertise for the task at hand, and designing appropriate features can be time-consuming. As a result, determining which physiologically based temporal feature representation is most useful for emotion identification has become the primary focus of most recent research. Many applications require emotion recognition in conversations as a first step, including opinion mining over chat history, social media threads, debates, argumentation mining, and interpreting customer feedback in live discussions. Current systems do not adapt to the different participants in a debate, treating each comment as if it came from an unrelated individual. Integrating information from multiple modalities requires careful selection and fusion of relevant features; this is a complex task, since different modalities may have different temporal and spatial characteristics and may provide redundant or conflicting information. The objective of this study is to describe a novel deep-learning strategy, based on Convolutional Neural Networks and Bidirectional Long Short-Term Memory, that keeps track of the individual parties across an entire conversation and identifies emotional responses from the information gathered about those parties. On two separate datasets, RAVDESS and SAVEE, our model achieves substantially better results (100%) than the current state-of-the-art solution.
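
The abstract names a CNN front end for spatial features followed by a Bi-LSTM for temporal modelling. The snippet below is a minimal illustrative sketch of that general pattern, not the authors' exact architecture: it assumes log-mel spectrogram inputs of shape (batch, 1, n_mels, time) and 8 emotion classes (as in RAVDESS); all layer sizes and the class name CNNBiLSTM are hypothetical.

```python
# Hypothetical CNN + Bi-LSTM spatio-temporal classifier sketch (PyTorch).
# Layer sizes and input shapes are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    def __init__(self, n_mels=64, n_classes=8, hidden=128):
        super().__init__()
        # Spatial feature extractor: two conv blocks over the spectrogram.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Temporal model: bidirectional LSTM over the time axis of the CNN maps.
        self.lstm = nn.LSTM(input_size=64 * (n_mels // 4), hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                       # x: (batch, 1, n_mels, time)
        f = self.cnn(x)                         # (batch, 64, n_mels/4, time/4)
        f = f.permute(0, 3, 1, 2).flatten(2)    # (batch, time/4, 64 * n_mels/4)
        out, _ = self.lstm(f)                   # (batch, time/4, 2 * hidden)
        return self.fc(out[:, -1])              # logits from the last time step

if __name__ == "__main__":
    model = CNNBiLSTM()
    dummy = torch.randn(4, 1, 64, 128)          # 4 clips, 64 mel bins, 128 frames
    print(model(dummy).shape)                   # torch.Size([4, 8])
```

In this kind of design, the convolutional layers capture local spatial structure in each spectrogram frame, while the Bi-LSTM reads the resulting frame sequence in both directions so that the emotion prediction can draw on past and future context within the utterance.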