Deep Dive Into Music Videos: Hierarchical Emotion Recognition With Rich Audio and Visual Features

Y. R. Pandeya, Ashim Gelal, Harish Chandra Bhandari, Priya Pandey

2025 · DOI: 10.1155/int/5621651
International Journal of Intelligent Systems

Abstract

This study aimed to address the challenges of cultural diversity and limited labeled data in music emotion classification. We introduced a benchmark dataset for music videos, featuring hierarchical emotion labels ranging from coarse to fine levels. We considered six established audio and video features, including geometric, spectral, harmonic, temporal, spatiotemporal, and visual attributes, for music emotion classification. We proposed hierarchical music video emotion classification networks and established baseline results using our dataset. Additionally, we presented a pipeline for audio processing using graph neural networks with reduced edge connections. Our convolutional neural network models for 1D, 2D, and 3D audio and video processing outperformed existing methods in various scenarios while requiring a minimal number of trainable parameters. We evaluated the results using both quantitative measures and visual analysis.