Cerebras Systems

Computer Hardware

Sunnyvale, California 81,473 followers

AI insights, faster! We're a computer systems company dedicated to accelerating deep learning.

About us

Cerebras Systems builds the world's fastest AI inference, powering the future of generative AI. We're a team of pioneering computer architects, deep learning researchers, and engineers building a new class of AI supercomputers from the ground up. Our flagship system, the Cerebras CS-3, is powered by the Wafer Scale Engine 3, the world's largest and fastest AI processor. CS-3s cluster effortlessly into the largest AI supercomputers on Earth while abstracting away the complexity of traditional distributed computing. From sub-second inference speeds to breakthrough training performance, Cerebras makes it easier to build and deploy state-of-the-art AI, from proprietary enterprise models to open-source projects downloaded millions of times.

Here's what makes our platform different:
🔦 Sub-second reasoning – instant intelligence and real-time responsiveness, even at massive scale
⚡ Blazing-fast inference – up to 100x performance gains over traditional AI infrastructure
🧠 Agentic AI in action – models that can plan, act, and adapt autonomously
🌍 Scalable infrastructure – built to move from prototype to global deployment without friction

Cerebras solutions are available in the Cerebras Cloud or on-prem, serving leading enterprises, research labs, and government agencies worldwide.
👉 Learn more: www.cerebras.ai
Join us: https://cerebras.net/careers/

Website
http://www.cerebras.ai
Industry
Computer Hardware
Company size
201-500 employees
Headquarters
Sunnyvale, California
Type
Privately Held
Founded
2016
Specialties
artificial intelligence, deep learning, natural language processing, inference, machine learning, llm, AI, enterprise AI, and fast inference

Updates

  • Hagay Lupesko summed up an important reality of operating at scale: “If something can fail, it will fail. And it will fail at the worst possible time. Your cloud region will go down, your CDN will go down with it, your own software will fail in ways you never imagined, and every hardware device you rely on will eventually fail.” In his keynote at AI_dev Europe 2025 (the Open Source GenAI & ML Summit by The Linux Foundation), Hagay, SVP of Engineering, shared how Cerebras scaled from zero to 50 exaFLOPS of inference compute in under a year, and the reliability, observability, and operational lessons learned along the way. 🔮 Full video here: https://lnkd.in/grjcM_jt
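    The engineering stance behind that quote can be made concrete. Here is a minimal, generic sketch (not code from the keynote or the Cerebras codebase) of the defensive pattern such lessons usually imply: wrap every flaky dependency in retries with capped exponential backoff plus jitter, so individual failures degrade gracefully instead of cascading. All names here are illustrative.

    ```python
    # Generic "everything fails" wrapper: retry a flaky call with capped
    # exponential backoff and jitter. Illustrative sketch only.
    import random
    import time

    def call_with_retries(fn, max_attempts=5, base_delay=0.1, max_delay=5.0):
        """Run fn(); on failure, retry with exponential backoff and jitter."""
        for attempt in range(1, max_attempts + 1):
            try:
                return fn()
            except Exception:
                if attempt == max_attempts:
                    raise  # retries exhausted; surface the failure to the caller
                delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                time.sleep(delay * random.uniform(0.5, 1.5))  # jitter avoids thundering herds

    # usage: call_with_retries(lambda: unreliable_rpc(payload))  # unreliable_rpc is hypothetical
    ```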

  • The Cerebras Mindset: “You can’t be 1–2× better. You have to be 10 or 20× better.”

    Andrew Feldman, Founder and CEO, Cerebras Systems, makers of the world's fastest AI infrastructure:

    I’ve spent my whole career competing with giants. Now it’s NVIDIA. Earlier, it was Cisco. When you do that, you learn something important: you can’t be 1–2x better. You have to be 10 or 20x better. And at Cerebras Systems, we are more than 20x faster than NVIDIA at inference. Every dollar we sell would default to NVIDIA if we didn’t work at it, if we didn’t invent. That pressure is what makes us sharper. And it’s why the work is fun.

  • 👻 Meet REAP: the Reaper for redundant experts.

    Vithu Thangarasa, Principal Research Scientist at Cerebras | Efficient Deep Learning:

    💥 How do you shrink a trillion-parameter MoE model in half without losing its generation capabilities? 💥 Our new research directly tackles this critical deployment challenge. We introduce REAP (Router-weighted Expert Activation Pruning), a one-shot method that precisely removes truly redundant experts while preserving the model’s core generative abilities.

    Our key finding: for generative tasks like code generation, pruning low-impact experts is fundamentally better than merging them. Merging leads to "functional subspace collapse", a breakdown of the dynamic, input-dependent routing that gives MoEs their strength. In contrast, pruning with REAP preserves this crucial property. The secret behind REAP’s effectiveness lies in how it measures saliency: rather than relying on usage frequency, REAP evaluates both the router’s gating decisions and each expert’s actual output impact, enabling highly precise, one-shot pruning.

    The results are striking: on ~500B to 1-trillion-parameter models, REAP removes 50% of experts while retaining over 96% of baseline performance across complex coding, tool-calling, and agentic benchmarks. It can also be applied on top of 4-bit or 8-bit quantized models, compounding memory savings and making state-of-the-art MoEs dramatically more efficient and deployable.

    Huge thanks to my collaborators: Mike Lasby (lead author), Ivan Lazarevich, Nish Sinnadurai, Sean Lie, and Yani Ioannou for making this work possible! Cerebras Systems remains deeply committed to advancing open, reproducible research that helps move large-scale model efficiency forward for the broader community. #Cerebras #FastestInference #MachineLearning #GenerativeAI

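    To make the saliency idea tangible, here is a toy NumPy sketch of router-weighted expert scoring as the post describes it: each expert is scored by its average (gate weight × output magnitude) over the tokens routed to it, and the lowest-scoring half is pruned in one shot. The exact REAP criterion is in the paper; this is an illustrative approximation with made-up data.

    ```python
    # Toy sketch of router-weighted expert pruning (illustrative, not the
    # paper's exact REAP formula): score experts by gate weight x output
    # norm over routed tokens, then one-shot prune the lowest half.
    import numpy as np

    rng = np.random.default_rng(0)
    n_tokens, d_model, n_experts, top_k = 1024, 64, 8, 2

    gates = rng.random((n_tokens, n_experts))                     # router weights per token
    expert_out = rng.normal(size=(n_tokens, n_experts, d_model))  # each expert's output per token

    # standard top-k MoE routing: keep only the top_k gates per token
    topk_idx = np.argsort(gates, axis=1)[:, -top_k:]
    mask = np.zeros_like(gates)
    np.put_along_axis(mask, topk_idx, 1.0, axis=1)
    routed_gates = gates * mask

    # saliency: mean over routed tokens of (gate weight * expert output norm)
    out_norms = np.linalg.norm(expert_out, axis=-1)               # (n_tokens, n_experts)
    saliency = (routed_gates * out_norms).sum(axis=0) / mask.sum(axis=0).clip(min=1)

    # one-shot prune: drop the 50% of experts with the lowest saliency
    keep = sorted(np.argsort(saliency)[n_experts // 2:].tolist())
    print("experts kept:", keep)
    ```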
  • Cerebras is now powering Cognition's latest code retrieval models directly in Windsurf. Context retrieval has been one of the biggest bottlenecks in agentic coding: when you ask an agent to work on a large codebase, it can spend 60% of its time just searching for relevant files. This retrieval process not only keeps you waiting, but pollutes the context window with irrelevant snippets and racks up your inference bill. Cognition trained two specialized, compact models using RL and deployed them on Cerebras for maximum speed. Querying huge ~1M-line codebases like React, Vercel, and PyTorch, swe-grep finds relevant code snippets in a few seconds, compared to minutes on Claude and Cursor. End-to-end retrieval, reasoning, and summarization runs ~5x faster than today’s leading models. Cognition’s Fast Context is available now in Windsurf Cascade. Cerebras is proud to be putting our wafer to work for every Cognition user!
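    The division of labor matters here: a cheap, fast first stage narrows a huge codebase to a handful of files before an expensive reasoning model ever sees them. Cognition's swe-grep models are RL-trained; the sketch below substitutes a crude keyword scorer just to show the shape of that two-stage pipeline, with all names being illustrative.

    ```python
    # Toy two-stage "fast context" shape: a cheap retrieval pass ranks files
    # so the expensive reasoning model only reads a few. Keyword-frequency
    # overlap stands in for Cognition's RL-trained retrieval models.
    from pathlib import Path

    def retrieve_candidates(repo_root: str, query: str, top_k: int = 5) -> list[Path]:
        """Rank source files by crude keyword-frequency overlap with the query."""
        terms = set(query.lower().split())
        scored = []
        for path in Path(repo_root).rglob("*.py"):
            try:
                text = path.read_text(errors="ignore").lower()
            except OSError:
                continue
            score = sum(text.count(term) for term in terms)
            if score > 0:
                scored.append((score, path))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [path for _, path in scored[:top_k]]

    # usage: retrieve_candidates(".", "websocket reconnect backoff")
    ```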

  • 🧮 What does 8×7B actually mean? It is NOT 8 experts with 7B active parameters per token. Turns out it’s actually 13B active parameters. But wait — where does 13B come from? In the world of Mixture of Experts (#MoE), even the simplest questions get surprisingly complex:
    ❓ How much storage do you actually need?
    ❓ How much compute does that translate to?
    ❓ What are the real bottlenecks — memory, compute, or communication?
    ⁉️ And how does Cerebras solve GPU bottlenecks?
    If you’ve ever tried to make sense of MoE math, the next post in our MoE 101 guide by Daria Soboleva (with an interactive calculator) breaks it all down: https://lnkd.in/g7N6dh69
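    For the curious, the "13B" is easy to reproduce by hand. Here is a back-of-envelope using the published Mixtral 8x7B dimensions (hidden size 4096, 32 layers, FFN size 14336, 8 experts, top-2 routing, grouped-query attention with 8 KV heads of dim 128, 32K vocab); layer norms and router weights are omitted as negligible.

    ```python
    # Back-of-envelope parameter count for a Mixtral-8x7B-style MoE.
    # Dimensions are the published Mixtral config; tiny terms (norms,
    # routers) are ignored.
    hidden, layers, ffn, vocab = 4096, 32, 14336, 32000
    n_experts, top_k, kv_dim = 8, 2, 1024  # 8 KV heads x head_dim 128

    attn = layers * (2 * hidden * hidden + 2 * hidden * kv_dim)  # Q,O full; K,V grouped-query
    expert = 3 * hidden * ffn                                    # gate/up/down, one expert, one layer
    embed = 2 * vocab * hidden                                   # input + output embeddings

    total = attn + layers * n_experts * expert + embed
    active = attn + layers * top_k * expert + embed              # only top-2 experts fire per token

    print(f"total  ~ {total / 1e9:.1f}B params")   # ~46.7B, not 8 x 7B = 56B (attention is shared)
    print(f"active ~ {active / 1e9:.1f}B params")  # ~12.9B -- the '13B' active per token
    ```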

  • The energy this week in Dubai has been electric here at GITEX GLOBAL, the largest tech and startup show in the world. We’ve been connecting with innovators, founders, and leaders who are shaping the next wave of AI! From global giants to fast-growing startups, one thing is clear: everyone wants to go faster in AI. Cerebras is here to deliver, with inference that is 20X faster than GPUs. Thank you to everyone who stopped by our stand to meet with Andrew Feldman and our team. Your energy inspires us to keep pushing what’s possible.

  • Our Oklahoma City datacenter was designed around water, not air. To keep our wafer-scale systems running at peak performance, we use a Tier III, 6,000-ton chilled-water plant inside a 100,000 sq ft, F5-rated facility. Unlike traditional air-cooled systems, our chilled-water design moves heat quietly and efficiently, even under the heaviest AI workloads. It keeps hundreds of millions of cores perfectly balanced as they drive real-time inference at massive scale. A closed-loop system recycles and stores water, maintaining stable cooling without drawing on external sources, even when winds hit 318 mph. 🌪️ For AI infrastructure at this level, water-based thermal management isn’t just smart, it’s essential. It’s how we push the boundaries of speed, efficiency, and reliability every single day.

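    For a sense of scale, the plant's rating converts directly to heat-rejection capacity: one refrigeration ton is defined as 12,000 BTU/h, about 3.517 kW of heat removal, so:

    ```python
    # Back-of-envelope: a 6,000-ton chilled-water plant in megawatts.
    # 1 refrigeration ton = 12,000 BTU/h ~= 3.517 kW of heat removal.
    TON_TO_KW = 3.517
    plant_tons = 6_000
    print(f"~{plant_tons * TON_TO_KW / 1e3:.1f} MW of heat rejection")  # ~21.1 MW
    ```

    Roughly 21 MW of continuous heat rejection, which is why the design centers on water rather than air.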
  • This is what the fastest AI inference on Earth looks like. Live from Dubai at GITEX GLOBAL and Expand North Star 2025.
    🚀 Blazing-fast inference takes center stage: Andrew Feldman took the stage to discuss how Cerebras Inference delivers breakthrough speed and scale, powering next-gen applications from code generation to agentic AI.
    🎉 Packed house at the booth: developers, enterprises, and world leaders alike are seeing firsthand what’s possible when inference runs 20x faster than GPUs.
    🔥 Just getting started: it’s only the first few days, and the energy is incredible. GITEX is where the world gathers to shape the future of AI, and Cerebras is here to help build it. Stop by Hall 3, Booth H3-B12.

