Cerebras Systems

Computer Hardware

Sunnyvale, California 81,473 followers

AI insights, faster! We're a computer systems company dedicated to accelerating deep learning.

About us

Cerebras Systems builds the world's fastest AI inference, powering the future of generative AI. We're a team of pioneering computer architects, deep learning researchers, and engineers building a new class of AI supercomputers from the ground up. Our flagship system, the Cerebras CS-3, is powered by the Wafer Scale Engine 3, the world's largest and fastest AI processor. CS-3s cluster effortlessly into the largest AI supercomputers on Earth while abstracting away the complexity of traditional distributed computing. From sub-second inference speeds to breakthrough training performance, Cerebras makes it easier to build and deploy state-of-the-art AI, from proprietary enterprise models to open-source projects downloaded millions of times.

Here's what makes our platform different:
🔦 Sub-second reasoning – instant intelligence and real-time responsiveness, even at massive scale
⚡ Blazing-fast inference – up to 100x performance gains over traditional AI infrastructure
🧠 Agentic AI in action – models that can plan, act, and adapt autonomously
🌍 Scalable infrastructure – built to move from prototype to global deployment without friction

Cerebras solutions are available in the Cerebras Cloud or on-prem, serving leading enterprises, research labs, and government agencies worldwide.
👉 Learn more: www.cerebras.ai
Join us: https://cerebras.net/careers/

Website
http://www.cerebras.ai
Industry
Computer Hardware
Company size
201-500 employees
Headquarters
Sunnyvale, California
Type
Privately Held
Founded
2016
Specialties
artificial intelligence, deep learning, natural language processing, inference, machine learning, llm, AI, enterprise AI, and fast inference

Updates

  • Hagay Lupesko summed up an important reality of operating at scale: “If something can fail, it will fail. And it will fail at the worst possible time. Your cloud region will go down, your CDN will go down with it, your own software will fail in ways you never imagined, and every hardware device you rely on will eventually fail.” In his keynote at AI_dev Europe 2025 (the Open Source GenAI & ML Summit by The Linux Foundation), Hagay, SVP of Engineering, shared how Cerebras scaled from zero to 50 exaFLOPS of inference compute in under a year, and the reliability, observability, and operational lessons learned along the way. 🔮 Full video here: https://lnkd.in/grjcM_jt
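    The engineering stance behind that quote can be made concrete. Here is a minimal, generic sketch (not code from the keynote or the Cerebras codebase) of the defensive pattern such lessons usually imply: wrap every flaky dependency in retries with capped exponential backoff plus jitter, so individual failures degrade gracefully instead of cascading. All names here are illustrative.

    ```python
    # Generic "everything fails" wrapper: retry a flaky call with capped
    # exponential backoff and jitter. Illustrative sketch only.
    import random
    import time

    def call_with_retries(fn, max_attempts=5, base_delay=0.1, max_delay=5.0):
        """Run fn(); on failure, retry with exponential backoff and jitter."""
        for attempt in range(1, max_attempts + 1):
            try:
                return fn()
            except Exception:
                if attempt == max_attempts:
                    raise  # retries exhausted; surface the failure to the caller
                delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                time.sleep(delay * random.uniform(0.5, 1.5))  # jitter avoids thundering herds

    # usage: call_with_retries(lambda: unreliable_rpc(payload))  # unreliable_rpc is hypothetical
    ```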

  • The Cerebras Mindset: “You can’t be 1–2× better. You have to be 10 or 20× better.”

    Andrew Feldman, Founder and CEO, Cerebras Systems, makers of the world's fastest AI infrastructure:

    I’ve spent my whole career competing with giants. Now it’s NVIDIA. Earlier, it was Cisco. When you do that, you learn something important: you can’t be 1–2x better. You have to be 10 or 20x better. And at Cerebras Systems, we are more than 20x faster than NVIDIA at inference. Every dollar we sell would default to NVIDIA if we didn’t work at it, if we didn’t invent. That pressure is what makes us sharper. And it’s why the work is fun.

  • 👻 Meet REAP: the Reaper for redundant experts.

    Vithu Thangarasa, Principal Research Scientist at Cerebras | Efficient Deep Learning:

    💥 How do you shrink a trillion-parameter MoE model in half without losing its generation capabilities? 💥 Our new research directly tackles this critical deployment challenge. We introduce REAP (Router-weighted Expert Activation Pruning), a one-shot method that precisely removes truly redundant experts while preserving the model’s core generative abilities.

    Our key finding: for generative tasks like code generation, pruning low-impact experts is fundamentally better than merging them. Merging leads to "functional subspace collapse", a breakdown of the dynamic, input-dependent routing that gives MoEs their strength. In contrast, pruning with REAP preserves this crucial property. The secret behind REAP’s effectiveness lies in how it measures saliency: rather than relying on usage frequency, REAP evaluates both the router’s gating decisions and each expert’s actual output impact, enabling highly precise, one-shot pruning.

    The results are striking: on ~500B to 1-trillion-parameter models, REAP removes 50% of experts while retaining over 96% of baseline performance across complex coding, tool-calling, and agentic benchmarks. It can also be applied on top of 4-bit or 8-bit quantized models, compounding memory savings and making state-of-the-art MoEs dramatically more efficient and deployable.

    Huge thanks to my collaborators: Mike Lasby (lead author), Ivan Lazarevich, Nish Sinnadurai, Sean Lie, and Yani Ioannou for making this work possible! Cerebras Systems remains deeply committed to advancing open, reproducible research that helps move large-scale model efficiency forward for the broader community. #Cerebras #FastestInference #MachineLearning #GenerativeAI

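    To make the saliency idea tangible, here is a toy NumPy sketch of router-weighted expert scoring as the post describes it: each expert is scored by its average (gate weight × output magnitude) over the tokens routed to it, and the lowest-scoring half is pruned in one shot. The exact REAP criterion is in the paper; this is an illustrative approximation with made-up data.

    ```python
    # Toy sketch of router-weighted expert pruning (illustrative, not the
    # paper's exact REAP formula): score experts by gate weight x output
    # norm over routed tokens, then one-shot prune the lowest half.
    import numpy as np

    rng = np.random.default_rng(0)
    n_tokens, d_model, n_experts, top_k = 1024, 64, 8, 2

    gates = rng.random((n_tokens, n_experts))                     # router weights per token
    expert_out = rng.normal(size=(n_tokens, n_experts, d_model))  # each expert's output per token

    # standard top-k MoE routing: keep only the top_k gates per token
    topk_idx = np.argsort(gates, axis=1)[:, -top_k:]
    mask = np.zeros_like(gates)
    np.put_along_axis(mask, topk_idx, 1.0, axis=1)
    routed_gates = gates * mask

    # saliency: mean over routed tokens of (gate weight * expert output norm)
    out_norms = np.linalg.norm(expert_out, axis=-1)               # (n_tokens, n_experts)
    saliency = (routed_gates * out_norms).sum(axis=0) / mask.sum(axis=0).clip(min=1)

    # one-shot prune: drop the 50% of experts with the lowest saliency
    keep = sorted(np.argsort(saliency)[n_experts // 2:].tolist())
    print("experts kept:", keep)
    ```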
  • Cerebras is now powering Cognition's latest code retrieval models directly in Windsurf. Context retrieval has been one of the biggest bottlenecks in agentic coding: when you ask an agent to work on a large codebase, it can spend 60% of its time just searching for relevant files. This retrieval process not only keeps you waiting, but pollutes the context window with irrelevant snippets and racks up your inference bill. Cognition trained two specialized, compact models using RL and deployed them on Cerebras for maximum speed. Querying huge ~1M-line codebases like React, Vercel, and PyTorch, swe-grep finds relevant code snippets in a few seconds, compared to minutes on Claude and Cursor. End-to-end retrieval, reasoning, and summarization runs ~5x faster than today’s leading models. Cognition’s Fast Context is available now in Windsurf Cascade. Cerebras is proud to be putting our wafer to work for every Cognition user!
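    The division of labor matters here: a cheap, fast first stage narrows a huge codebase to a handful of files before an expensive reasoning model ever sees them. Cognition's swe-grep models are RL-trained; the sketch below substitutes a crude keyword scorer just to show the shape of that two-stage pipeline, with all names being illustrative.

    ```python
    # Toy two-stage "fast context" shape: a cheap retrieval pass ranks files
    # so the expensive reasoning model only reads a few. Keyword-frequency
    # overlap stands in for Cognition's RL-trained retrieval models.
    from pathlib import Path

    def retrieve_candidates(repo_root: str, query: str, top_k: int = 5) -> list[Path]:
        """Rank source files by crude keyword-frequency overlap with the query."""
        terms = set(query.lower().split())
        scored = []
        for path in Path(repo_root).rglob("*.py"):
            try:
                text = path.read_text(errors="ignore").lower()
            except OSError:
                continue
            score = sum(text.count(term) for term in terms)
            if score > 0:
                scored.append((score, path))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [path for _, path in scored[:top_k]]

    # usage: retrieve_candidates(".", "websocket reconnect backoff")
    ```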

  • 🧮 What does 8×7B actually mean? It is NOT 8 experts with 7B active parameters per token. Turns out it’s actually 13B active parameters. But wait — where does 13B come from? In the world of Mixture of Experts (#MoE), even the simplest questions get surprisingly complex:
    ❓ How much storage do you actually need?
    ❓ How much compute does that translate to?
    ❓ What are the real bottlenecks — memory, compute, or communication?
    ⁉️ And how does Cerebras solve GPU bottlenecks?
    If you’ve ever tried to make sense of MoE math, the next post in our MoE 101 guide by Daria Soboleva (with an interactive calculator) breaks it all down: https://lnkd.in/g7N6dh69
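    For the curious, the "13B" is easy to reproduce by hand. Here is a back-of-envelope using the published Mixtral 8x7B dimensions (hidden size 4096, 32 layers, FFN size 14336, 8 experts, top-2 routing, grouped-query attention with 8 KV heads of dim 128, 32K vocab); layer norms and router weights are omitted as negligible.

    ```python
    # Back-of-envelope parameter count for a Mixtral-8x7B-style MoE.
    # Dimensions are the published Mixtral config; tiny terms (norms,
    # routers) are ignored.
    hidden, layers, ffn, vocab = 4096, 32, 14336, 32000
    n_experts, top_k, kv_dim = 8, 2, 1024  # 8 KV heads x head_dim 128

    attn = layers * (2 * hidden * hidden + 2 * hidden * kv_dim)  # Q,O full; K,V grouped-query
    expert = 3 * hidden * ffn                                    # gate/up/down, one expert, one layer
    embed = 2 * vocab * hidden                                   # input + output embeddings

    total = attn + layers * n_experts * expert + embed
    active = attn + layers * top_k * expert + embed              # only top-2 experts fire per token

    print(f"total  ~ {total / 1e9:.1f}B params")   # ~46.7B, not 8 x 7B = 56B (attention is shared)
    print(f"active ~ {active / 1e9:.1f}B params")  # ~12.9B -- the '13B' active per token
    ```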

  • The energy this week in Dubai has been electric here at GITEX GLOBAL, the largest tech and startup show in the world. We’ve been connecting with innovators, founders, and leaders who are shaping the next wave of AI! From global giants to fast-growing startups, one thing is clear: everyone wants to go faster in AI. Cerebras is here to deliver, with inference that is 20X faster than GPUs. Thank you to everyone who stopped by our stand to meet with Andrew Feldman and our team. Your energy inspires us to keep pushing what’s possible.

  • Our Oklahoma City datacenter was designed around water, not air. To keep our wafer-scale systems running at peak performance, we use a Tier III, 6,000-ton chilled-water plant inside a 100,000 sq ft, F5-rated facility. Unlike traditional air-cooled systems, our chilled-water design moves heat quietly and efficiently, even under the heaviest AI workloads. It keeps hundreds of millions of cores perfectly balanced as they drive real-time inference at massive scale. A closed-loop system recycles and stores water, maintaining stable cooling without drawing on external sources, even when winds hit 318 mph. 🌪️ For AI infrastructure at this level, water-based thermal management isn’t just smart, it’s essential. It’s how we push the boundaries of speed, efficiency, and reliability every single day.

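    For a sense of scale, the plant's rating converts directly to heat-rejection capacity: one refrigeration ton is defined as 12,000 BTU/h, about 3.517 kW of heat removal, so:

    ```python
    # Back-of-envelope: a 6,000-ton chilled-water plant in megawatts.
    # 1 refrigeration ton = 12,000 BTU/h ~= 3.517 kW of heat removal.
    TON_TO_KW = 3.517
    plant_tons = 6_000
    print(f"~{plant_tons * TON_TO_KW / 1e3:.1f} MW of heat rejection")  # ~21.1 MW
    ```

    Roughly 21 MW of continuous heat rejection, which is why the design centers on water rather than air.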
  • This is what the fastest AI inference on Earth looks like. Live from Dubai at GITEX GLOBAL and Expand North Star 2025.
    🚀 Blazing-fast inference takes center stage: Andrew Feldman took the stage to discuss how Cerebras Inference delivers breakthrough speed and scale, powering next-gen applications from code generation to agentic AI.
    🎉 Packed house at the booth: developers, enterprises, and world leaders alike are seeing firsthand what’s possible when inference runs 20x faster than GPUs.
    🔥 Just getting started: it’s only the first few days, and the energy is incredible. GITEX is where the world gathers to shape the future of AI, and Cerebras is here to help build it. Stop by Hall 3, Booth H3-B12.

