Temperature-Centric Investigation of Speculative Decoding with Knowledge Distillation

Ouyang, Siru; Wang, Shuohang; Jiang, Minhao; Zhong, Ming; Yu, Donghan; Han, Jiawei; Shen, Yelong

arXiv:2410.10141 (cs)

[Submitted on 14 Oct 2024]

Abstract: Speculative decoding stands as a pivotal technique to expedite inference in autoregressive (large) language models. This method employs a smaller draft model to speculate a block of tokens, which the target model then evaluates for acceptance. Despite a wealth of studies aimed at increasing the efficiency of speculative decoding, the influence of generation configurations on the decoding process remains poorly understood, especially concerning decoding temperatures. This paper delves into the effects of decoding temperatures on speculative decoding's efficacy. Beginning with knowledge distillation (KD), we first highlight the challenge of decoding at higher temperatures, and demonstrate that KD in a consistent temperature setting could be a remedy. We also investigate the effects of out-of-domain testing sets with out-of-range temperatures. Building upon these findings, we take an initial step to further the speedup for speculative decoding, particularly in a high-temperature generation setting. Our work offers new insights into how generation configurations drastically affect the performance of speculative decoding, and underscores the need for developing methods that focus on diverse decoding configurations. Code is publicly available at this https URL.
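For readers unfamiliar with the draft-then-verify loop the abstract describes, the following is a minimal sketch of a single verification step with a temperature-scaled softmax. It assumes the standard speculative sampling acceptance rule (accept a drafted token x with probability min(1, p(x)/q(x)), otherwise resample from the normalized residual max(p - q, 0)); the abstract itself does not spell out the rule, and this is not taken from the authors' released code. The function names and the toy logits are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def acceptance_rate(target_probs, draft_probs):
    """Expected probability that one drafted token is accepted.

    Assumed rule: a token x drawn from the draft distribution q is accepted
    with probability min(1, p(x) / q(x)). Averaged over x ~ q this equals
    sum_x min(p(x), q(x)).
    """
    return sum(min(p, q) for p, q in zip(target_probs, draft_probs))


def speculative_step(target_probs, draft_probs, rng):
    """Draft one token from q, then accept it or resample from the residual."""
    vocab = range(len(draft_probs))
    token = rng.choices(vocab, weights=draft_probs)[0]
    p, q = target_probs[token], draft_probs[token]
    if rng.random() < min(1.0, p / q):
        return token, True
    # Rejected: resample from the (unnormalized) residual max(p - q, 0).
    residual = [max(pt - qd, 0.0) for pt, qd in zip(target_probs, draft_probs)]
    return rng.choices(vocab, weights=residual)[0], False


if __name__ == "__main__":
    # Hypothetical toy logits over a four-token vocabulary.
    target_logits = [3.0, 1.5, 0.5, -1.0]
    draft_logits = [2.5, 2.0, 0.0, -0.5]
    for temperature in (0.5, 1.0, 2.0):
        p = softmax_with_temperature(target_logits, temperature)
        q = softmax_with_temperature(draft_logits, temperature)
        print(temperature, round(acceptance_rate(p, q), 3))
```

Running the script prints the expected per-token acceptance rate when both models decode at the same temperature. With toy logits this says nothing about the paper's findings; it only gives a way to probe how the overlap between the two distributions shifts as the temperature changes.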

Comments:	EMNLP 2024 Findings
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2410.10141 [cs.CL]
	(or arXiv:2410.10141v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2410.10141

Submission history

From: Siru Ouyang
[v1] Mon, 14 Oct 2024 04:17:45 UTC (1,690 KB)