Leveraging Self-supervised Audio Representations for Data-Efficient Acoustic Scene Classification

Cai, Yiqiang; Li, Shengchen; Shao, Xi

arXiv:2408.14862 (cs)

[Submitted on 27 Aug 2024]

Abstract: Acoustic scene classification (ASC) predominantly relies on supervised approaches. However, acquiring labeled data for training ASC models is often costly and time-consuming. Recently, self-supervised learning (SSL) has emerged as a powerful method for extracting features from unlabeled audio data, benefiting many downstream audio tasks. This paper proposes a data-efficient and low-complexity ASC system that leverages self-supervised audio representations extracted from general-purpose audio datasets. We adopt BEATs, a pre-trained audio SSL model, to obtain general representations learned from AudioSet. Extensive experiments demonstrate that these self-supervised representations help achieve high ASC accuracy with limited labeled fine-tuning data. Furthermore, we find that ensembling SSL models fine-tuned with different strategies yields a further performance improvement. To meet low-complexity requirements, we use knowledge distillation to transfer the self-supervised knowledge from large teacher models to an efficient student model. The experimental results suggest that the self-supervised teachers effectively improve the classification accuracy of the student model. Our best-performing system achieves an average accuracy of 56.7%.
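The abstract combines two generic techniques: ensembling several fine-tuned teacher models and distilling their knowledge into a small student. The snippet below is a minimal NumPy sketch of that general pattern, not the authors' implementation. It assumes teacher predictions are combined by averaging logits and that distillation uses the standard Hinton-style loss (temperature-softened KL divergence plus hard-label cross-entropy); the function names and the values `temperature=2.0`, `alpha=0.5`, and the 10-class setup are hypothetical choices for illustration.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis, with optional temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def ensemble_teacher_logits(teacher_logits):
    """Hypothetical ensembling rule: average the logits of several teachers.

    teacher_logits: list of arrays, each of shape (batch, num_classes).
    """
    return np.mean(np.stack(teacher_logits, axis=0), axis=0)

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Hinton-style distillation loss (an assumed formulation, not the paper's).

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    temperature-scaled KL divergence KL(teacher || student).
    """
    log_p_s = log_softmax(student_logits, temperature)
    log_p_t = log_softmax(teacher_logits, temperature)
    p_t = np.exp(log_p_t)
    # Multiply by T^2 so soft-target gradients keep a comparable scale across temperatures.
    kl = np.sum(p_t * (log_p_t - log_p_s), axis=-1).mean() * temperature ** 2
    # Ordinary cross-entropy of the student against the ground-truth scene labels.
    log_p_hard = log_softmax(student_logits)
    ce = -log_p_hard[np.arange(len(labels)), labels].mean()
    return alpha * ce + (1.0 - alpha) * kl

# Usage with random logits: 3 hypothetical teachers, batch of 4, 10 scene classes.
rng = np.random.default_rng(0)
teachers = [rng.normal(size=(4, 10)) for _ in range(3)]
student = rng.normal(size=(4, 10))
labels = np.array([0, 3, 7, 9])
loss = kd_loss(student, ensemble_teacher_logits(teachers), labels)
```

Averaging logits is only one possible ensembling rule; averaging softmax probabilities is an equally common alternative, and the abstract does not state which one the authors use.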

Comments:	Accepted by DCASE Workshop 2024
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2408.14862 [cs.SD]
	(or arXiv:2408.14862v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2408.14862

Submission history

From: Yiqiang Cai
[v1] Tue, 27 Aug 2024 08:33:26 UTC (2,180 KB)