The 2025 OCP Global Summit kicked off yesterday. The keynotes from all the hyperscalers and GPU vendors point to the same reality - scaling AI isn’t just about re-architecting one part of the stack. It’s about rethinking power, cooling, networking, and trust as one coherent, open system. Some interesting takeaways...

✅ Inference workloads are exploding - exceeding 80% CAGR and significantly outpacing training workloads. Token generation has surged ~50× in just two years, fueled by model complexity, longer context windows, and compute intensity (see the quick back-of-the-envelope at the end of this post). Sustaining that momentum means revisiting how data centers deliver power and dissipate heat, and how we architect every layer of the system.

✅ Traditional AC distribution is giving way to high-voltage DC (HVDC) designs such as Google and Microsoft’s Mt. Diablo architecture and Nvidia’s Kyber rack. These systems operate at ±400 V to 800 V DC. Several speakers talked about actively managing power spikes with ML-driven predictive telemetry to smooth load transients and stabilize the grid. In effect, the data center is evolving from a static load into an adaptive, interactive participant in the electrical ecosystem.

✅ Cooling has become equally central. Across the keynotes, the message was clear: liquid cooling is now foundational. Nvidia, Google, Meta, and Microsoft highlighted rack-level liquid-cooling initiatives and standardized coolant-distribution interfaces. Cooling is now a co-engineered subsystem - scaling alongside power and compute lifecycles.

💡 Nvidia pushes deeper into networking. Its announcement of Spectrum-X deployments by Meta, Microsoft, and Oracle suggests Nvidia is positioning itself as a serious contender to Broadcom in scale-out Ethernet, including the Data Center Interconnect (DCI) domain, with its Spectrum-XGS. This is impressive. I always thought it was hard to win the DCI market (Nvidia calls it "scale-across") without deep-buffer switches to absorb congestion. It would be interesting to learn more about their distance-based load balancing for congestion control in scale-across networks.

❓ AMD joins the Ethernet for Scale-Up Networking (ESUN) work stream. This move raises intriguing questions: Is it linked to their recent collaboration agreement with OpenAI? How will they juggle both UALink and ESUN support simultaneously? 🤔

Plenty more to learn in the next few days, I guess - new technologies, new collaborations, and new directions for open infrastructure. Btw, if you’re there, don’t forget to stop by Astera Labs’ booth (B33)!
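A quick back-of-the-envelope on those growth numbers (rough arithmetic on my part, assuming growth compounds smoothly over the period): a ~50× surge in two years works out to roughly a 7× year-over-year factor, which dwarfs the ~3.2× that an 80% CAGR compounds to over the same window.

```python
# Back-of-the-envelope on the keynote growth figures (my arithmetic,
# assuming smooth compounding - not taken from the talks themselves).

def implied_annual_factor(total_factor: float, years: float) -> float:
    """Annualized growth factor implied by a total multiple over a period."""
    return total_factor ** (1.0 / years)

tokens = implied_annual_factor(50.0, 2.0)  # ~50x token growth in 2 years
print(f"Token generation: ~{tokens:.1f}x per year (~{(tokens - 1) * 100:.0f}% CAGR)")
# -> ~7.1x per year, i.e. ~607% CAGR

# For comparison, the cited 80%+ CAGR for inference workloads compounds to:
print(f"80% CAGR over 2 years: ~{1.8 ** 2:.1f}x total")  # -> ~3.2x
```

In other words, tokens are growing far faster than the workload count itself - longer contexts and heavier per-request compute, not just more workloads, are driving the surge.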
Amazing takeaways! Scaling AI clearly isn’t just about adding more compute; it’s about rethinking power, cooling, and networking as one system. Thanks for sharing, Sharada.
"❓ AMD joins the Ethernet for Scale-Up Networking (ESUN) work stream. This move raises intriguing questions: Is it linked to their recent collaboration agreement with OpenAI? How will they juggle both UALink and ESUN support simultaneously? 🤔 " Sharada Yeluri in a joint announcement video yesterday between #openai and #broadcom , broadcom proudly emphasized how their #ai systems were Ethernet-based 🤔 Hey did anyone show a token cost/price decline chart vs demand chart?
Thanks for sharing, Sharada!
Will stop by to meet the legend!