Attention-guided Spectrogram Sequence Modeling with CNNs for Music Genre Classification
Aditya Sridhar
TLDR
This work bridges the gap between technical classification tasks and the nuanced, human experience of genre by highlighting cross-genre similarities and distinctions that align closely with human musical intuition.
Abstract
Music genre classification is a critical component of music recommendation systems, generation algorithms, and cultural analytics. In this work, we present a model that classifies music genres using attention-based temporal signature modeling. Spectrogram sequences are processed by Convolutional Neural Networks (CNNs) followed by multi-head attention layers, so that the model captures the most temporally significant moments within each piece and builds from them a distinctive "signature" for genre identification. This temporal focus not only enhances classification accuracy but also reveals genre-specific characteristics that can be intuitively mapped to listener perceptions. Our findings have potential applications in personalized music recommendation systems, since they highlight cross-genre similarities and distinctions that align closely with human musical intuition. This work bridges the gap between technical classification tasks and the nuanced, human experience of genre.
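To make the idea of an attention-derived temporal signature concrete, the following is a minimal sketch, not the architecture used in this work. It assumes each spectrogram segment has already been reduced to an embedding vector by a CNN (omitted here), and it replaces full multi-head self-attention with a simpler multi-head attention pooling that uses one query vector per head. The function names, the toy embeddings, and the query values are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_signature(embeddings, queries):
    """Pool T segment embeddings into one signature vector.

    embeddings: list of T vectors of length D (assumed CNN outputs).
    queries:    list of H query vectors of length D // H (assumed learned).
    Returns (signature of length D, per-head attention weights over time).
    """
    num_heads = len(queries)
    dim = len(embeddings[0])
    head_dim = dim // num_heads
    assert head_dim * num_heads == dim, "D must be divisible by the number of heads"

    signature, weights_per_head = [], []
    for h, query in enumerate(queries):
        lo, hi = h * head_dim, (h + 1) * head_dim
        # Each head only sees its own slice of every segment embedding.
        slices = [e[lo:hi] for e in embeddings]
        # Scaled dot-product score of every time step against this head's query.
        scores = [
            sum(q * x for q, x in zip(query, s)) / math.sqrt(head_dim)
            for s in slices
        ]
        weights = softmax(scores)
        # Weighted average over time: segments with high weight dominate.
        pooled = [
            sum(w * s[i] for w, s in zip(weights, slices))
            for i in range(head_dim)
        ]
        signature.extend(pooled)
        weights_per_head.append(weights)
    return signature, weights_per_head


# Toy input: 3 time segments, 4-dimensional embeddings, 2 heads (hypothetical values).
embeddings = [
    [0.1, 0.0, 0.2, 0.1],
    [2.0, 1.5, 0.1, 0.0],
    [0.0, 0.1, 1.8, 2.2],
]
queries = [[1.0, 1.0], [1.0, 1.0]]
signature, weights = attention_signature(embeddings, queries)
```

In this sketch the per-head weights indicate which segments each head treats as most significant, and the concatenated pooled vectors play the role of the signature that a downstream classifier would consume.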
