Cosine similarity-based few-shot bioacoustic event detection with automatic frequency range identification in Mel-spectrograms.
Sheng-Lun Kao, Yi-Wen Liu
TLDR
This study introduces an automatic frequency range identification algorithm designed to handle small objects in Mel-spectrograms effectively. Once the frequency range is identified, the system detects events by computing cosine similarity in a gliding-window manner.
Abstract
Few-shot sound event detection (SED) has been an appealing idea in bioacoustics because of its potential to reduce the labor of labeling recordings to just a few positive examples. Since sounds can be represented as images in the time-frequency plane, existing few-shot SED methods often borrow techniques from object detection in image processing, such as prototypical networks. When applied to bioacoustic SED, however, prototypical networks encounter significant challenges. First, the main acoustic targets in a spectrogram are often small. Second, the background is typically noisy, leading to overlap in frequency between positive and negative events. Third, the positive events may be weak compared to the background. To overcome these difficulties, this study introduces an automatic frequency range identification algorithm, which is designed to handle small objects in Mel-spectrograms effectively. After the desired frequency range is identified, the system performs event detection by computing cosine similarity in a gliding-window manner. Overall, the system neither requires a large amount of training data nor relies on pretrained models. It achieved an F-score of 46.9% on DCASE 2024 Task 5, placing it in the top 3 and demonstrating its potential for addressing the unique challenges of bioacoustic SED.
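
To make the gliding-window idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Mel-spectrogram stored as a NumPy array of shape (mel bins, frames), a template cut from one labeled positive example, and a frequency band that has already been identified. The function name `sliding_cosine_similarity`, the `band` argument, and the synthetic data are all hypothetical and serve only as illustration; the abstract does not specify how the band is found or how scores are turned into detections.

```python
import numpy as np


def sliding_cosine_similarity(spec, template, band=None):
    """Cosine similarity between a template and every window of a spectrogram.

    spec:     array of shape (n_mels, n_frames)
    template: array of shape (n_mels, width), e.g. one labeled positive event
    band:     optional (lo, hi) Mel-bin indices; hypothetical stand-in for the
              output of a frequency range identification step
    """
    if band is not None:
        lo, hi = band
        spec = spec[lo:hi]
        template = template[lo:hi]

    width = template.shape[1]
    n_windows = spec.shape[1] - width + 1
    if n_windows <= 0:
        raise ValueError("template is longer than the spectrogram")

    t = template.ravel()
    t_norm = np.linalg.norm(t)

    scores = np.zeros(n_windows)
    for i in range(n_windows):
        w = spec[:, i:i + width].ravel()
        denom = np.linalg.norm(w) * t_norm
        # Leave the score at 0 for all-zero windows to avoid division by zero.
        if denom > 0:
            scores[i] = float(w @ t) / denom
    return scores


# Synthetic demo (assumed data): weak background noise with one event
# occupying Mel bins 20..29 and frames 40..47.
rng = np.random.default_rng(0)
spec = 0.1 * rng.random((64, 200))
spec[20:30, 40:48] += 1.0 + rng.random((10, 8))

template = spec[:, 40:48].copy()   # the single "shot"
scores = sliding_cosine_similarity(spec, template, band=(20, 30))
best_frame = int(np.argmax(scores))
```

In this toy setting the score peaks at the frame where the event starts. Restricting the comparison to the band keeps bins that contain only background from diluting the similarity, which is one plausible reason a frequency range step helps with small targets. Turning the score curve into detected events, for example by thresholding, would be a further design choice that the abstract does not describe.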
