Stack L2
Cerebras Systems
Wafer-scale inference cloud at ~3,000 tok/s; $1.1B Series G at $8.1B valuation
HQ US
Fit Score
avg of 4 segments
Recent Business Signals
Strategic Position
What they do best
Delivering frontier open models at speeds GPU clouds cannot match — gpt-oss-120B at ~3,000 tok/s (signal 0182c87d) and Llama 4 Maverick at ~2,500 tok/s verified by Artificial Analysis (signal 3d3634f0).
Their bet
That a single wafer-scale chip plus owned US data centers (signals 55dbeca3, 0248cddc) wins the latency-sensitive inference market; 2026 hires for physical design, packaging/SI and ML compilers signal a next-gen WSE tape-out.
Top risk
Inference catalog rests entirely on third-party open weights (Llama 4, Qwen3, gpt-oss); if a hyperscaler ships comparable-speed inference on these same models before Q4 2026, the speed moat erodes while the funded data-center capex is already committed.
Compared to peers
Direct competitor
Groq
Groq's LPU also chases record inference latency, but on small deterministic chips and GroqCloud; Cerebras bets on single wafer-scale silicon plus owned US data centers.
Substitute
AMD
AMD MI300/MI350 is the merchant GPU path buyers default to for both training and inference; Cerebras is inference-speed-only and cannot substitute for ROCm training fleets.
Why someone would join
- 1. Capital is in hand: the $1.1B Series G at an $8.1B valuation closed 2025-09-30 (signal 55dbeca3) and funds the Oklahoma City and Minneapolis data centers (signal 0248cddc), so engineering hires are backed by deployed capex, not promises.
- 2. 2026 reqs (principal packaging/signal-integrity 2026-05-06, senior physical design 2026-04-22, compiler 2026-04-02) indicate a next-gen wafer-scale bring-up; you would own silicon that ships at OpenAI gpt-oss launch scale (~3,000 tok/s).
Recent Hiring (60 days)
- physical design: 1
- place and route: 1
- timing closure: 1
- package design: 1
- signal integrity: 1
Reverse-Hype Watch
- Capacity target 'aggregate Llama inference capacity toward tens of millions of tokens per second' (2025-09-30) is unbacked by diversified customer signals; named demand is limited to the OpenAI launch-day partnership and 'G42-linked international capacity', with revenue historically concentrated in G42.
- Capacity target 'tens of millions of tokens/second' of aggregate Llama inference (2025-09-30) is a funded build-out goal, but the named demand signals are launches on third-party open models (Llama 4, Qwen3, gpt-oss), and historical revenue is said to 'lean heavily on G42'; the scale claim outpaces evidenced customer commitment.
- Capacity claim 'aggregate Llama inference capacity toward tens of millions of tokens per second' (2025-09-30 build-out) is a forward target; the revenue and customer base is 'historically concentrated in G42' and self-serve diversification is 'still early', so the capacity is not yet backed by broad independent customer-demand signals.
- Capacity claim 'aggregate Llama inference capacity targeted in the tens of millions of tokens/second' rests on 'G42-linked international capacity'; the data flags 'revenue base historically concentrated in G42' and notes that 'diversification into self-serve products is still early', so the aggregate-capacity target is not backed by diversified customer signals.