Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models

Raina, Vyas; Gales, Mark

arXiv:2407.04482 (cs)

[Submitted on 5 Jul 2024 (v1), last revised 11 Oct 2024 (this version, v2)]

Abstract: Speech-enabled foundation models, either in the form of flexible speech-recognition-based systems or audio-prompted large language models (LLMs), are becoming increasingly popular. One of the interesting aspects of these models is their ability to perform tasks other than automatic speech recognition (ASR) using an appropriate prompt. For example, the OpenAI Whisper model can perform both speech transcription and speech translation. With the development of audio-prompted LLMs there is the potential for even greater control options. In this work we demonstrate that this greater flexibility can leave the systems susceptible to model-control adversarial attacks. Without any access to the model prompt, it is possible to modify the behaviour of the system by appropriately changing the audio input. To illustrate this risk, we demonstrate that it is possible to prepend a short universal adversarial acoustic segment to any input speech signal to override the prompt setting of an ASR foundation model. Specifically, we successfully use a universal adversarial acoustic segment to control Whisper to always perform speech translation, despite being set to perform speech transcription. Overall, this work demonstrates a new form of adversarial attack on multi-tasking speech-enabled foundation models that needs to be considered prior to the deployment of this form of model.
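
The prepending step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `prepend_segment`, the half-second segment length, and the amplitude range are hypothetical, and the segment here is random noise standing in for a segment that would be learned in the actual attack. The only point shown is that one fixed segment is reused, unchanged, in front of every input utterance.

```python
import random
from typing import List, Sequence

SAMPLE_RATE = 16_000  # Whisper operates on 16 kHz mono audio


def prepend_segment(segment: Sequence[float], speech: Sequence[float]) -> List[float]:
    """Return the modified waveform: the universal segment followed by the speech."""
    return list(segment) + list(speech)


# Placeholder segment (assumption): in the attack this would be optimised,
# not sampled at random. Length and amplitude are illustrative only.
rng = random.Random(0)
segment = [rng.uniform(-0.02, 0.02) for _ in range(SAMPLE_RATE // 2)]

# Two dummy utterances of different lengths (silence, for illustration).
utterances = [[0.0] * (SAMPLE_RATE * seconds) for seconds in (1, 3)]

# The same segment is prepended to every input, which is what "universal" means here.
attacked = [prepend_segment(segment, u) for u in utterances]
```

Because the segment is independent of the utterance, applying it requires no access to the model or its prompt at inference time; only the audio input changes.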

Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2407.04482 [cs.SD]
	(or arXiv:2407.04482v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2407.04482

Submission history

From: Vyas Raina
[v1] Fri, 5 Jul 2024 13:04:31 UTC (467 KB)
[v2] Fri, 11 Oct 2024 17:21:40 UTC (467 KB)
