📣 Blackwell sets a new inference speed world record: a single NVIDIA DGX B200 server with eight #NVIDIABlackwell GPUs can generate over 1,000 tokens per second (TPS) per user on Llama 4 Maverick, the largest and most powerful model in the AI at Meta Llama 4 collection. ⚡⏱️

Additionally, the same eight-GPU Blackwell system can deliver up to 72,000 tokens/second in a maximum-throughput scenario. 🏆

Blackwell is the first platform to reach this level of performance on the model, demonstrating that it delivers the best combination of throughput, output speed, and accuracy for LLM token generation. 🏎️🏁

See our tech blog for details ➡️ https://nvda.ws/4kadNB2
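The two headline numbers measure different things: per-user TPS is the decode speed a single request experiences, while the 72,000 tokens/second figure is the total across all concurrently served requests. The sketch below is a minimal illustration of that arithmetic, not NVIDIA's benchmark code; the concurrency value is an assumed round number, only the 1,000 and 72,000 figures come from the post.

```python
# Illustrative sketch of per-user token speed vs. aggregate throughput.
# Not NVIDIA's benchmark harness; concurrency below is an assumption.

def per_user_tps(tokens_generated: int, generation_seconds: float) -> float:
    """Tokens per second seen by a single user (one decode stream)."""
    return tokens_generated / generation_seconds

def aggregate_tps(per_stream_tps: float, concurrent_streams: int) -> float:
    """Total tokens per second across all concurrently served streams."""
    return per_stream_tps * concurrent_streams

if __name__ == "__main__":
    # Low-latency figure from the post: over 1,000 TPS for a single user.
    single_user = per_user_tps(tokens_generated=1_000, generation_seconds=1.0)

    # Max-throughput figure from the post: up to 72,000 TPS for the whole
    # eight-GPU system. At that operating point each stream runs slower;
    # the stream count here is hypothetical, chosen only for illustration.
    assumed_streams = 256
    per_stream_at_max = 72_000 / assumed_streams

    print(f"Single-user speed:      {single_user:,.0f} tokens/s")
    print(f"Per-stream at max load: {per_stream_at_max:,.0f} tokens/s "
          f"(assuming {assumed_streams} concurrent streams)")
```

The trade-off the sketch makes visible is the usual one for LLM serving: larger batches raise total throughput but lower the speed each individual user sees, which is why the record quotes both numbers separately.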