# Tokasaurus: An LLM Inference Engine for High-Throughput Workloads

Jordan Juravsky, Ayush Chakravarthy, Ryan Ehrlich, Sabri Eyuboglu, Bradley Brown, Joseph Shetaye, Christopher Ré, Azalia Mirhoseini (Stanford)

**TL;DR:** We're releasing Tokasaurus, a new LLM inference engine optimized for throughput-intensive workloads. With small models, Tokasaurus
