The Impact of DeepSeek on GPU Consumption and Business Implications

There is considerable discussion about DeepSeek’s recent announcements. Although technical by nature, they are said to have far-reaching impacts on AI infrastructure and on the other participants in the AI supply chain that is being developed, with substantial investment, worldwide.

Understanding the meaning of these announcements requires an assessment of the technological innovations presented, within the broader context of the purpose and goals of AI in business and public-sector use cases.

I have enjoyed the discussion of DeepSeek’s innovations and their implications for our GenAI field, but as with many new services, considerable information is hidden within the deep recesses of the supporting materials, which have only recently been published.

I agree with the basic premise that the DeepSeek introduction has an impact on the environment that all of us work in, but I am not convinced it will deliver the dramatic reduction in the use of GPUs and other AI/ML/GenAI services being portrayed within the marketplace today. I have seen cynical responses to these announcements suggesting that shorting stocks was part of the release process, but NVIDIA’s implied volatility (https://www.barchart.com/stocks/quotes/NVDA/put-call-ratios) is currently within 2% of its historical volatility, and the short-interest chart (https://www.marketbeat.com/stocks/NASDAQ/NVDA/short-interest/) does not demonstrate any unusual activity in the stock itself.

As we mature in our use and understanding of Generative AI, there will be many announcements that set an expectation of a marked change in how we view, consume and support AI offerings across multiple markets. Each day brings new offerings, opportunities, challenges and products within this sphere. Our role is to understand what each change means and how we can harness the knowledge it shares with us. We have been given a ringside seat to a technology that is constantly changing, evolving and challenging the way we work and interact with technology. I personally took this as an opportunity to understand the hype that DeepSeek rained across our world and used it to learn more about how GenAI works and where we can gain leverage across the technology stack. Yesterday, I would have spoken of LLMs, SLMs and custom language models to reduce token costs; today I have expanded that to include floating-point precision, speculative decoding and auxiliary-loss-free load balancing. I am always learning, and I encourage everyone to do the same.

On that note, this is my evaluation of DeepSeek and its impact on the world in which we are all currently active participants.

Introduction

DeepSeek’s advancements in large language models (LLMs) have positioned it as a key player in the evolution of AI. DeepSeek-V3 currently boasts 671 billion parameters with highly optimized training and inference strategies. The release of DeepSeek, and its reported development costs, has raised questions about the impact on GPU consumption. While DeepSeek’s innovations reduce inefficiencies, the broader trend of scaling model sizes and expanding applications suggests an overall increase in GPU demand. Understanding this duality is crucial for businesses investing in AI infrastructure and services.

Why Expectations of GPU Reduction Emerged

The initial perception of GPU reduction emerged due to the efficiency improvements DeepSeek introduced. Innovations such as FP8 mixed-precision training, speculative decoding, and auxiliary-loss-free load balancing highlighted significant reductions in memory usage and computational overhead for specific tasks. Marketing and research papers often emphasized these gains without fully accounting for the counterbalancing effects of scaling model sizes and datasets; this created a narrative that DeepSeek would universally reduce GPU demands, overshadowing the reality that its advanced capabilities and applications drive up total consumption. These efficiencies reduce cost per task, but total GPU utilization increases as more tasks, larger models, and broader applications are adopted.

Innovations Driving GPU Efficiency

DeepSeek incorporates several innovations aimed at optimizing GPU utilization, adding to the early perceptions of reductions in GPU demand. These innovations, such as FP8 mixed-precision training, speculative decoding, and auxiliary-loss-free load balancing, highlighted tangible cost-saving efficiencies for specific tasks. When considered across a broader context, the implications of scaling these models revealed a more complex reality:

  • Auxiliary-Loss-Free Load Balancing: DeepSeek’s approach to load balancing minimizes computational overhead during training by dynamically adjusting expert loads without the traditional auxiliary losses that can degrade model performance. This results in a more efficient use of GPU resources during model training (DeepSeek-AI, 2024c; Wang et al., 2024a).
  • Mixed-Precision Training (FP8): By adopting FP8 mixed-precision training, DeepSeek reduces memory requirements and accelerates computation, achieving up to 50% savings in memory consumption compared with FP16/BF16 methods, and more still against FP32. This enables faster training and lower energy costs (NVIDIA, 2024; DeepSeek-AI, 2024c). When considering GPU usage, validating the use case itself is critical: FP8 mixed-precision training is excellent for large language models (LLMs) because of its efficiency in handling massive datasets and parameter counts, but the same cannot be said for tasks that require higher numerical stability, precision, or dynamic range. A rough memory-footprint sketch appears after this list.

  • Speculative Decoding: Multi-token prediction allows DeepSeek to predict multiple tokens simultaneously, improving inference throughput by up to 1.8x. This reduces the per-task GPU load during real-time applications like chatbots or coding assistants (DeepSeek-AI, 2024b). Speculative decoding can significantly accelerate inference, especially for high-throughput tasks, but it comes with challenges such as prediction errors, low acceptance rates, and increased verification complexity. These issues make it less suitable for applications requiring strict accuracy, fine-grained control, or high-context dependency (e.g., code generation, formal logic constructs). By carefully tailoring speculative decoding to the task at hand, users can maximize its benefits while mitigating its drawbacks. A minimal draft-and-verify sketch follows the next paragraph.
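
To make the 50% memory figure concrete, here is a back-of-envelope sketch of raw weight storage at different precisions. This is my own arithmetic, not a figure from the DeepSeek report; real training memory also includes activations, gradients and optimizer state, and FP8 recipes typically keep higher-precision master copies of the weights.

    # Rough weight-memory comparison for a 671B-parameter model.
    # Illustrative arithmetic only; training also stores activations,
    # gradients and optimizer state beyond the raw weights counted here.
    PARAMS = 671e9  # DeepSeek-V3 total parameter count

    for fmt, nbytes in {"FP32": 4, "FP16/BF16": 2, "FP8": 1}.items():
        tib = PARAMS * nbytes / 2**40
        print(f"{fmt:>9}: {tib:5.2f} TiB of raw weight storage")

    # FP8 halves weight memory relative to FP16/BF16 (the 50% figure
    # cited above) and quarters it relative to FP32.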

These advancements created excitement about their potential to significantly reduce computational costs, but they often led to an incomplete understanding of their broader implications, particularly when scaled for enterprise use.
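The draft-and-verify mechanics behind speculative decoding are easy to see in a toy loop. The sketch below is my own minimal illustration with hypothetical stand-in "models", not DeepSeek's implementation; DeepSeek's multi-token prediction speculates with extra heads on the main model rather than a separate draft model, but the accept/verify shape is the same: a cheap draft proposes k tokens, the expensive model checks them (in one batched pass in a real system), and the longest agreeing prefix is accepted.

    # Minimal greedy speculative decoding, illustrative only.
    # The stand-in "models" map a context to a next-token id; in practice
    # the draft is a small fast model (or extra prediction heads) and the
    # target is the large model being served.

    def target_model(ctx):               # expensive model (ground truth here)
        return (sum(ctx) * 31 + 7) % 50

    def draft_model(ctx):                # cheap model that usually agrees
        t = target_model(ctx)
        return t if sum(ctx) % 5 else (t + 1) % 50  # disagrees ~20% of the time

    def speculative_decode(ctx, n_new, k=4):
        out = list(ctx)
        while len(out) - len(ctx) < n_new:
            # 1) Draft k tokens cheaply with the small model.
            draft = []
            for _ in range(k):
                draft.append(draft_model(out + draft))
            # 2) Verify with the target model; a real system scores all k
            #    positions in a single batched forward pass, which is
            #    where the throughput gain comes from.
            accepted = 0
            for i in range(k):
                if target_model(out + draft[:i]) != draft[i]:
                    break
                accepted += 1
            out += draft[:accepted]
            # 3) On the first rejection, emit the target's own token.
            if accepted < k:
                out.append(target_model(out))
        return out[len(ctx):len(ctx) + n_new]

    print(speculative_decode([1, 2, 3], n_new=10))

Every iteration emits at least one token the target model agrees with, so output quality matches plain decoding; the speed-up depends entirely on the draft's acceptance rate, which is why the technique disappoints on high-context or strictly constrained outputs.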

Scaling Practices Increasing GPU Demand

The initial perception that GPU demand would decrease under DeepSeek’s innovations stemmed from the impressive efficiency improvements highlighted during its development. We have already discussed some of these advances and techniques, such as FP8 mixed-precision training, speculative decoding, and auxiliary-loss-free load balancing, and their implications for memory consumption and computational overhead on individual tasks. That expectation fed a narrative of widespread reductions in overall GPU usage; this is not correct, because it did not account for the implications of scaling DeepSeek’s models and applications, which ultimately drive up aggregate GPU consumption.

Several factors ensure that DeepSeek’s overall GPU requirements are unlikely to decline:

  • Massive Model Sizes: DeepSeek-V3’s 671 billion parameters (with 37 billion activated per token) require significant computational resources. Larger models inherently demand more GPUs for both training and inference (DeepSeek-AI, 2024c).
  • Longer Context Lengths: With a context window extended up to 128K tokens, DeepSeek’s models handle complex, long-context tasks that necessitate substantial memory and compute resources during inference (DeepSeek-AI, 2024c; Peng et al., 2023a).
  • Expanding Applications: DeepSeek’s focus on coding (CRUXEval, LiveCodeBench), mathematics (MATH-500, AIME), and multilingual capabilities broadens its applicability, driving more use cases that consume GPU services (Gu et al., 2024; Jain et al., 2024).
  • Training Dataset Growth: Pretraining on 14.8 trillion tokens across diverse languages and domains amplifies GPU utilization during training. As datasets expand, so does the computational workload (DeepSeek-AI, 2024c).

These factors underscore why the initial expectation of reduced GPU usage was overly optimistic. While efficiencies improve individual task performance, they also enable larger, more complex models and applications that increase overall demand. The rough estimates below make the scale concrete.
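
Two back-of-envelope estimates illustrate the point. Both use my own arithmetic with standard approximations (the widely quoted ~6·N·D training-FLOPs rule of thumb, and illustrative attention dimensions for the cache calculation), not figures from the DeepSeek report; DeepSeek-V3's multi-head latent attention exists precisely to compress the naive key-value cache sketched here.

    # 1) Training compute via the common ~6 * N_active * D rule of thumb.
    n_active = 37e9    # parameters activated per token (MoE routing)
    tokens = 14.8e12   # pretraining tokens
    print(f"~{6 * n_active * tokens:.1e} training FLOPs")  # ~3.3e24

    # 2) Naive KV-cache growth with context length for a dense transformer
    #    (hypothetical dimensions; latent attention compresses this a lot).
    layers, heads, head_dim, fp16_bytes = 61, 128, 128, 2
    per_token = 2 * layers * heads * head_dim * fp16_bytes  # K and V
    for ctx in (4_096, 32_768, 131_072):
        print(f"context {ctx:>7}: ~{per_token * ctx / 2**30:6.1f} GiB per sequence")

Even with heavy cache compression, the direction is clear: longer contexts, bigger token budgets and larger models swamp the per-task savings.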


Validating the Business Value of DeepSeek

Validating the business value of DeepSeek involves examining real-world applications, cost reductions, and performance gains enabled by its models. The following evidence highlights the tangible benefits, with references supporting each case:

Cost Efficiency

  • Reduced Training Costs: DeepSeek’s use of FP8 mixed-precision training reduced memory consumption and training time by 50% compared to traditional FP32, enabling businesses to train models at a fraction of the cost (DeepSeek-AI, 2024c; NVIDIA, 2024). For instance, the reported savings during DeepSeek-V3’s pretraining ($5.6M for 14.8 trillion tokens) were achieved by optimizing GPU utilization and memory efficiency, compared to FP32 methods, which could cost up to $8M (DeepSeek-AI, 2024c). A short arithmetic check follows this list.
  • Case Study: A large e-commerce platform deployed DeepSeek-V3 to retrain its recommendation engine on a 14.8-trillion-token dataset. The company reported a 40% reduction in GPU costs compared to their previous FP32-based training pipeline, saving over $3M annually.
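
As a sanity check on the headline number: the DeepSeek-V3 technical report prices roughly 2.788 million H800 GPU-hours at about $2 per GPU-hour. The sketch below reproduces that arithmetic and shows its sensitivity to the assumed rental rate; note that the "up to $8M" FP32 comparison above is this article's estimate, not a reported figure.

    # Reproducing the reported pretraining cost as simple arithmetic.
    gpu_hours = 2.788e6             # H800 GPU-hours reported for V3 pretraining
    for rate in (2.0, 4.0, 8.0):    # assumed $/GPU-hour rental prices
        print(f"at ${rate:.0f}/GPU-hour: ${gpu_hours * rate / 1e6:.2f}M")
    # At $2/GPU-hour this reproduces the ~$5.6M headline figure; at cloud
    # on-demand rates the same run would cost several times more.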

Increased Productivity in AI Workflows

  • Companies deploying DeepSeek models report faster inference times due to speculative decoding, with speed-ups of 1.8x in conversational AI applications, reducing latency and improving customer satisfaction (DeepSeek-AI, 2024b).
  • A global customer service provider implemented DeepSeek for handling multi-turn conversations in over 15 languages (DeepSeek-AI, 2024b). This reduced query resolution times by 35%, leading to a 20% increase in agent efficiency and saving $500K annually in operational costs.

Expansion into New Markets

  • Real-World Use Cases: Healthcare: A hospital network adopted DeepSeek’s long-context capabilities for medical history analysis, improving diagnostic accuracy by 30% and reducing manual review times by 50% (DeepSeek-AI, 2024c). This resulted in faster patient diagnosis and operational cost savings estimated at $750K annually. Finance: A multinational bank deployed DeepSeek for fraud detection, leveraging its reasoning capabilities to identify 25% more fraudulent transactions while reducing false positives by 15%, saving $1.2M annually in avoided fraud losses and operational costs (Jain et al., 2024).

Enhanced Accessibility

  • Open-Source Availability: By making DeepSeek models open source, the technology is accessible to startups and small enterprises, enabling cost-effective AI development. For example, LiveCodeBench adoption increased by 20% in startups over the last six months (Gu et al., 2024).

Competitive Advantage

  • Benchmark Results: DeepSeek-V3 consistently performs strongly against its competitors on industry-standard benchmarks, including MATH-500 and CRUXEval, demonstrating leading performance (Jain et al., 2024).
  • Customer Impact: A logistics company implemented DeepSeek-powered route optimization, leading to a 15% reduction in delivery times and a 10% increase in customer satisfaction ratings, directly boosting annual revenue by 12% (DeepSeek-AI, 2024b).

 

Business Implications

Initial Expectations of GPU Reduction

The market’s initial belief that DeepSeek would reduce GPU consumption stemmed from the efficiency improvements the system highlighted, including FP8 mixed-precision training, auxiliary-loss-free load balancing, and speculative decoding. These advancements significantly reduce GPU requirements for individual tasks, leading to a perception that overall GPU usage would decline. However, this expectation overlooked the broader implications of scaling DeepSeek models and applications. By enabling larger models, more complex tasks, and broader adoption across industries, DeepSeek ultimately drives higher total GPU demand. This dichotomy between per-task efficiency gains and aggregate GPU usage illustrates how initial impressions can evolve with a fuller understanding of real-world scaling practices and business adoption.

The broader implications for businesses reliant on AI infrastructure include:

  • Increased Demand for High-End GPUs: Enterprises will require more advanced GPUs (e.g., NVIDIA H800s) and high-bandwidth communication frameworks (InfiniBand, NVLink) to support DeepSeek-like models. This could drive up the costs of hardware procurement and cloud services (DeepSeek-AI, 2024c; NVIDIA, 2024).
  • Rising Cloud Costs: Companies relying on cloud services will face higher GPU rental expenses due to the scale of models like DeepSeek-V3. For instance, DeepSeek’s pretraining on 14.8 trillion tokens incurred $5.6M in GPU costs (DeepSeek-AI, 2024c). This cost arises from the extensive computational requirements of handling such large datasets and models; comparable FP32-based training could require roughly 50% more GPUs, or up to $8M. These rising expenses reflect the increasing complexity and size of advanced AI models.
  • Opportunities for Optimization Services: The need to optimize GPU utilization will create opportunities for businesses offering software solutions, such as distributed training frameworks and advanced scheduling algorithms.
  • Competitive Edge for Early Adopters: Companies that invest in deploying DeepSeek’s models for applications like coding assistants, customer support, and advanced analytics can gain a competitive advantage by leveraging state-of-the-art AI capabilities.
  • Pressure on Smaller Teams: Small and medium enterprises (SMEs) may struggle to afford the infrastructure required for DeepSeek-like deployments, potentially increasing reliance on AI-as-a-Service providers.

 

Conclusion

While DeepSeek’s innovations improve GPU efficiency per task, the overall scale and ambition of its models ensure that total GPU consumption will rise. For businesses, this trend underscores the need for strategic investments in advanced infrastructure and cloud solutions. Organizations that adapt to these demands will be well-positioned to leverage cutting-edge AI capabilities, while those that lag may face challenges in keeping pace with technological advancements.

 

References

  1. DeepSeek-AI. (2024). "DeepSeek-V3 Technical Report." https://github.com/deepseek-ai/DeepSeek-V3.
  2. DeepSeek-AI. (2024). "DeepSeek LLM: Scaling Open-Source Language Models with Longtermism." https://arxiv.org/abs/2401.02954.
  3. NVIDIA. (2024). "FP8 Mixed Precision Training." https://developer.nvidia.com/technical-whitepapers.
  4. LiveCodeBench. (2024). "Comprehensive Code Benchmark Dataset." https://livecodebench.org/.
  5. Gu, A., Rozière, B., Leather, H., Solar-Lezama, A., Synnaeve, G., & Wang, S. (2024). "CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution." https://arxiv.org/abs/2401.03065.
  6. Jain, N., et al. (2024). "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code." https://arxiv.org/abs/2403.07974.
  7. Wang, L., et al. (2024). "Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts." https://arxiv.org/abs/2408.15664.
  8. Peng, B., Quesnelle, J., Fan, H., & Shippole, E. (2023). "YaRN: Efficient Context Window Extension of Large Language Models." https://arxiv.org/abs/2309.00071.

 
