Stack L2

Cerebras Systems

Wafer-scale inference cloud at ~3,000 tok/s; $1.1B Series G at $8.1B valuation

HQ US

Fit Score

77/100 (avg of 4 segments)
Growth: 88
Comp: (no score shown)
Learning: 85
Stability: 55
Culture: 64

Recent Business Signals

No business signals recorded in the last 90 days.

Strategic Position

What they do best

Delivering frontier open models at speeds GPU clouds cannot match — gpt-oss-120B at ~3,000 tok/s (signal 0182c87d) and Llama 4 Maverick at ~2,500 tok/s verified by Artificial Analysis (signal 3d3634f0).

Their bet

That a single wafer-scale chip plus owned US data centers (signals 55dbeca3, 0248cddc) wins the latency-sensitive inference market; 2026 hires for physical design, packaging/SI and ML compilers signal a next-gen WSE tape-out.

Top risk

Inference catalog rests entirely on third-party open weights (Llama 4, Qwen3, gpt-oss); if a hyperscaler ships comparable-speed inference on these same models before Q4 2026, the speed moat erodes while the funded data-center capex is already committed.

Compared to peers

Direct competitor

Groq

Groq's LPU also chases record inference latency, but on small deterministic chips and GroqCloud; Cerebras bets on single wafer-scale silicon plus owned US data centers.

Substitute

AMD

AMD MI300/MI350 is the merchant GPU path buyers default to for both training and inference; Cerebras is inference-speed-only and cannot substitute for ROCm training fleets.

Why someone would join

  1. Capital is in hand: the $1.1B Series G at an $8.1B valuation closed 2025-09-30 (signal 55dbeca3) and funds the Oklahoma City and Minneapolis data centers (signal 0248cddc), so engineering hires are backed by deployed capex, not promises.
  2. 2026 reqs (principal packaging/signal-integrity 2026-05-06, senior physical design 2026-04-22, compiler 2026-04-02) indicate a next-gen wafer-scale bring-up; you would own silicon shipping at OpenAI gpt-oss launch scale (~3,000 tok/s).

Recent Hiring (60 days)

  • physical design: 1
  • place and route: 1
  • timing closure: 1
  • package design: 1
  • signal integrity: 1

Reverse-Hype Watch

  • Capacity target 'aggregate Llama inference capacity toward tens of millions of tokens per second' (2025-09-30) is unbacked by diversified customer signals; named demand is limited to the OpenAI launch-day partnership and 'G42-linked international capacity', with revenue historically concentrated in G42.
  • Capacity target 'tens of millions of tokens/second' aggregate Llama inference (2025-09-30) is a funded build-out goal, but the named demand signals are launches on third-party open models (Llama 4, Qwen3, gpt-oss) and historical revenues 'lean heavily on G42', so the scale claim outpaces evidenced customer commitment.
  • Capacity claim 'aggregate Llama inference capacity toward tens of millions of tokens per second' (2025-09-30 build-out) is a forward target; the revenue/customer base is 'historically concentrated in G42' and self-serve diversification is 'still early', so the capacity is not yet backed by broad independent customer-demand signals.
  • Capacity claim 'aggregate Llama inference capacity targeted in the tens of millions of tokens/second' rests on 'G42-linked international capacity'; the data flags a 'revenue base historically concentrated in G42' and notes that 'diversification into self-serve products is still early', so the aggregate-capacity target is not backed by diversified customer signals.
Last updated: May 16, 2026