SEED: Speaker Embedding Enhancement Diffusion Model

Nam, KiHyun; Heo, Jungwoo; Jung, Jee-weon; Park, Gangin; Jung, Chaeyoung; Yu, Ha-Jin; Chung, Joon Son

arXiv:2505.16798 (eess)

[Submitted on 22 May 2025]

Abstract: A primary challenge when deploying speaker recognition systems in real-world applications is performance degradation caused by environmental mismatch. We propose a diffusion-based method that takes speaker embeddings extracted from a pre-trained speaker recognition model and generates refined embeddings. During training, our approach progressively adds Gaussian noise to both clean and noisy speaker embeddings, extracted from clean and noisy speech respectively, via the forward process of a diffusion model, and then reconstructs them as clean embeddings in the reverse process. At inference time, all embeddings are regenerated via the diffusion process. Our method requires neither speaker labels nor any modification to the existing speaker recognition pipeline. Experiments on evaluation sets simulating environment mismatch scenarios show that our method can improve recognition accuracy by up to 19.6% over baseline models while retaining performance in conventional scenarios. We publish our code here this https URL
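
As a rough illustration of the training setup described in the abstract, the sketch below noises a clean and a noisy embedding with a standard DDPM-style forward process and pairs each with the clean embedding as the reconstruction target. It is a minimal sketch under stated assumptions, not the authors' implementation: the linear noise schedule, the 192-dimensional embedding size, the synthetic embeddings, and the function names `make_alpha_bar`, `forward_noise`, and `training_pairs` are all hypothetical. The abstract does not state the schedule, the network, or whether the model predicts the noise or the embedding directly.

```python
import math
import random


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def forward_noise(embedding, t, alpha_bar, rng):
    """DDPM forward step: x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps."""
    a_t = alpha_bar[t]
    return [
        math.sqrt(a_t) * x + math.sqrt(1.0 - a_t) * rng.gauss(0.0, 1.0)
        for x in embedding
    ]


def training_pairs(clean_emb, noisy_emb, t, alpha_bar, rng):
    """Noise both embeddings; the target is the clean embedding in both cases."""
    return [
        (forward_noise(clean_emb, t, alpha_bar, rng), clean_emb),
        (forward_noise(noisy_emb, t, alpha_bar, rng), clean_emb),
    ]


rng = random.Random(0)
alpha_bar = make_alpha_bar()
# Synthetic stand-ins for embeddings from a pre-trained speaker model (assumed size 192).
clean_emb = [rng.gauss(0.0, 1.0) for _ in range(192)]
noisy_emb = [x + 0.3 * rng.gauss(0.0, 1.0) for x in clean_emb]
pairs = training_pairs(clean_emb, noisy_emb, t=500, alpha_bar=alpha_bar, rng=rng)
```

Each element of `pairs` is an (input, target) tuple that a denoising network could be trained on. No speaker label appears anywhere in the pair, which is consistent with the abstract's statement that the method needs none.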

Comments:	Accepted to Interspeech 2025. The official code can be found at this https URL
Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2505.16798 [eess.AS]
	(or arXiv:2505.16798v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2505.16798

Submission history

From: KiHyun Nam
[v1] Thu, 22 May 2025 15:38:37 UTC (718 KB)
