1. Introduction
Today, we’re introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times.
We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by a process of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach as DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation.
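As a quick back-of-the-envelope illustration of the sparsity figures above, the following sketch computes the per-token activation ratio from the parameter counts quoted in this section (the ratio itself is my own illustrative arithmetic, not an officially reported metric):

```python
# Figures taken from the paragraph above; the activation ratio below
# is illustrative arithmetic, not an official benchmark number.

TOTAL_PARAMS_B = 236.0   # total parameters, in billions
ACTIVE_PARAMS_B = 21.0   # parameters activated per token, in billions

# Fraction of the model's weights that participate in each forward pass:
# this sparsity is what makes MoE inference cheaper than a dense model
# of the same total size.
activation_ratio = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Activated per token: {activation_ratio:.1%}")  # ~8.9%
```

In other words, each token touches under 9% of the model's weights, which is the source of the inference-cost savings relative to a dense 236B model.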
2. News
- 2024.05.16: We released DeepSeek-V2-Lite.
- 2024.05.06: We released DeepSeek-V2.
3. Model Downloads
| Model | #Total Params | #Activated Params | Context Length | Download |
| --- | --- | --- | --- | --- |
| DeepSeek-V2-Lite | 16B | 2.4B | 32k | 🤗 HuggingFace |
| DeepSeek-V2-Lite-Chat (SFT) | 16B | 2.4B | 32k | 🤗 HuggingFace |
| DeepSeek-V2 | 236B | 21B | 128k | 🤗 HuggingFace |
| DeepSeek-V2-Chat (RL) | 236B | 21B | 128k | 🤗 HuggingFace |
Due to the constraints of HuggingFace, the open-source code currently runs slower on GPUs than our internal codebase. To facilitate efficient execution of our model, we offer a dedicated vLLM solution that optimizes performance.
4. Evaluation Results
Base Model
Standard Benchmark (Models larger than 67B)
| Benchmark | Domain | LLaMA3 70B | Mixtral 8x22B | DeepSeek-V1 (Dense-67B) | DeepSeek-V2 (MoE-236B) |
| --- | --- | --- | --- | --- | --- |
| MMLU | English | 78.9 | 77.6 | 71.3 | 78.5 |
| BBH | English | 81.0 | 78.9 | 68.7 | 78.9 |
| C-Eval | Chinese | 67.5 | 58.6 | 66.1 | 81.7 |
| CMMLU | Chinese | 69.3 | 60.0 | 70.8 | 84.0 |
| HumanEval | Code | 48.2 | 53.1 | 45.1 | |