AI ASICs — Product Landscape (May 2026)

Updated 5/16/2026

Custom AI silicon competing at hyperscaler scale splits along two technical axes: **where the model's working set lives** (HBM, on-die SRAM/wafer, or a tiered hierarchy) and **how open the programming stack is**. The defining 2025 shift: open-weight frontier models (OpenAI gpt-oss-120B/20B, Aug 2025; Llama 4 Maverick 400B; Qwen3-235B/480B) became strong enough to serve in production, decoupling competitive silicon from owning model IP or CUDA. Every vendor below is positioned against that shift.

Product Categories

  • **HBM-capacity accelerators**: GPU-class parts where per-node HBM is the binding constraint for frontier training/inference — AMD MI350, 288GB HBM3E (c61e3192).
  • **SRAM / wafer-scale deterministic-dataflow inference silicon**: throughput leadership on open/mid models via on-chip memory, with weaker long-context economics than HBM-rich parts — Groq LPU (1c3cd794), Cerebras WSE (9eb54891).
  • **Three-tier reconfigurable-memory dataflow**: tiered memory to fit trillion-parameter / long-context models on fewer nodes — SambaNova SN40L (ed15bcc2).
  • **Open-programmable RISC-V / chiplet challengers**: open stack or sovereign supply as the lever, not raw throughput — Tenstorrent Blackhole (90e1b440), Rebellions REBEL/ATOM (fedc5c91).

Comparison Table

| Product | Company | Headline Hardware Spec | External Validation Signal | Use Case |
|---|---|---|---|---|
| MI350 (CDNA 4) | AMD | 288GB HBM3E, ~4x gen AI-compute uplift | OpenAI + Oracle named commitments (c61e3192) | Memory-bound frontier training/inference per node |
| LPU / GroqCloud | Groq | ~500 tok/s-class, SRAM-centric | 4 named production engagements (1c3cd794) | High-volume low-latency open/mid-model serving |
| Cerebras Inference (WSE) | Cerebras Systems | >2,500 tok/s on Llama 4 Maverick 400B | Artificial Analysis 3rd-party benchmark; OpenAI launch-day (9eb54891, dca01ec0) | Speed-critical open frontier-model inference |
| Blackhole | Tenstorrent | 140 Tensix cores, 32GB GDDR6, on-card RISC-V | GA May 2025 on schedule; no named hyperscaler wins (90e1b440, a9f64a08) | Open-stack scale-out where ISA/stack control matters |
| SN40L | SambaNova Systems | 3-tier memory, trillion-param / long-context | Llama 3.1 405B on SambaNova Cloud; no named wins (ed15bcc2, 2c945945) | Long-context / trillion-param inference on fewer nodes |
| REBEL / ATOM | Rebellions | ATOM ~128 INT8 TOPS; REBEL Samsung 4nm + HBM3E | Arm Total Design / Neoverse CSS backing (fedc5c91, d12eda11) | Sovereign, power-efficient inference |

Differentiation Map

  • **Best for memory-bound frontier training (200B+ params per node)**: AMD MI350 — its 288GB HBM3E exceeds the memory capacity of the contemporaneous NVIDIA flagship and is cited by OpenAI/Oracle (c61e3192).
  • **Best for lowest-latency open-model serving (Llama/Qwen/gpt-oss)**: Cerebras WSE — >2,500 tok/s on a 400B model, independently benchmarked (9eb54891).
  • **Best for token-cheap high-fan-out inference APIs**: Groq LPU — four independent named production partners selected it (1c3cd794).
  • **Best for sovereign / supply-chain-controlled deployment**: Rebellions REBEL — Samsung 4nm + SK hynix HBM3E, Arm-backed chiplet (fedc5c91, d12eda11).
  • **Avoid if you need a drop-in CUDA-replacement training stack today**: Tenstorrent Blackhole and AMD ROCm — software maturity is the explicit unclosed gap (a9f64a08, c930c5a3).
  • **Avoid if procurement mandates named hyperscaler reference customers**: SambaNova SN40L — zero corroborated customer-cited wins in the dataset (ed15bcc2, 8eb63d12).
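
The memory-bound criterion in the first bullet can be made concrete with back-of-envelope arithmetic. The sketch below is illustrative only: it counts weight storage alone (no KV cache, activations, or runtime overhead), treats 1 GB as 10^9 bytes, and uses the nominal parameter counts implied by the model names. The function names `weight_footprint_gb` and `fits` are hypothetical helpers, not part of any vendor tool.

```python
# Back-of-envelope weight footprint vs. HBM capacity.
# Assumptions (not from the dataset): weights only, 1 GB = 1e9 bytes,
# nominal parameter counts taken from the model names.

HBM_PER_DEVICE_GB = 288  # MI350 capacity as listed in the comparison table


def weight_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    """GB needed to hold the weights: (params * bytes/param) / 1e9."""
    return params_billion * bytes_per_param


def fits(params_billion: float, bytes_per_param: float,
         capacity_gb: float, devices: int = 1) -> bool:
    """True if the weights alone fit in the pooled capacity of `devices` parts."""
    return weight_footprint_gb(params_billion, bytes_per_param) <= capacity_gb * devices


if __name__ == "__main__":
    models = [("gpt-oss-120B", 120), ("Qwen3-235B", 235), ("Llama 4 Maverick 400B", 400)]
    precisions = [("16-bit", 2.0), ("8-bit", 1.0)]
    for name, params_b in models:
        for label, bytes_per_param in precisions:
            need = weight_footprint_gb(params_b, bytes_per_param)
            ok = fits(params_b, bytes_per_param, HBM_PER_DEVICE_GB)
            print(f"{name:24s} {label:6s} {need:6.0f} GB  fits on one device: {ok}")
```

Under these assumptions a nominal 235B-parameter model at 8 bits per weight fits on a single 288GB part, while a 400B model needs at least two. Real deployments need additional headroom for the KV cache and activations, so treat the result as a lower bound on memory.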

Tool: Interactive Comparator

[Link to /tools/ai-asics-compare] — spec: filter by memory architecture (HBM / SRAM / tiered / open-RISC-V), pick 2 spec axes (headline throughput, memory capacity, named-engagement count), render side-by-side with data_point provenance. Wired in Phase G.
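
The comparator spec above (filter by memory architecture, pick two spec axes, render side by side with provenance) could be prototyped roughly as follows. This is a minimal sketch, not the tool's implementation: the `Product` class, the `compare` function, and the axis keys are hypothetical names, and the sample records carry only values and data_point ids that appear in the comparison table.

```python
from dataclasses import dataclass, field

# Hypothetical record shape: each spec axis maps to (value, provenance ids).
# Values are kept as strings because the source figures are approximate
# ("~500", ">2,500") rather than exact numbers.


@dataclass
class Product:
    name: str
    company: str
    memory_arch: str  # one of: "HBM", "SRAM", "tiered", "open-RISC-V"
    specs: dict = field(default_factory=dict)


PRODUCTS = [
    Product("MI350 (CDNA 4)", "AMD", "HBM",
            {"memory_capacity": ("288GB HBM3E", ("c61e3192",))}),
    Product("LPU / GroqCloud", "Groq", "SRAM",
            {"headline_throughput": ("~500 tok/s-class", ("1c3cd794",)),
             "named_engagements": ("4", ("1c3cd794",))}),
    Product("Cerebras Inference (WSE)", "Cerebras Systems", "SRAM",
            {"headline_throughput": (">2,500 tok/s", ("9eb54891",))}),
    Product("Blackhole", "Tenstorrent", "open-RISC-V",
            {"memory_capacity": ("32GB GDDR6", ("90e1b440",))}),
]


def compare(products, memory_arch, axes):
    """Filter by memory architecture and return one row per product.

    Each row holds the product name plus the two requested axes; an axis
    the dataset has no figure for is returned as None rather than guessed.
    """
    if len(axes) != 2:
        raise ValueError("pick exactly 2 spec axes")
    rows = []
    for product in products:
        if product.memory_arch != memory_arch:
            continue
        row = {"product": product.name}
        for axis in axes:
            row[axis] = product.specs.get(axis)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in compare(PRODUCTS, "SRAM", ("headline_throughput", "named_engagements")):
        print(row)
```

Returning `None` for a missing axis keeps gaps in the dataset visible in the side-by-side view instead of filling them with an invented figure.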

Reverse-Hype Warnings

Overpromised: AMD's MI350 silicon is best-on-paper (288GB HBM3E, ~4x uplift, c61e3192) but ROCm is the explicit watch-point and the reason it is not yet a clean CUDA substitute (c930c5a3) — the hardware headline outruns the deployable stack. Cerebras and Groq speed records are real but earned on open/mid models (gpt-oss, Llama, Qwen), not proprietary frontier IP; Groq's SRAM-centric LPU has unproven long-context and frontier-scale economics versus HBM-rich parts (c5c90b31), and Cerebras' catalog is third-party open models, so the moat is speed, not model IP (c94581d5). SambaNova's 3-tier memory design is architecturally credible but has zero named hyperscaler engagements corroborating it in the supplied data (ed15bcc2, 8eb63d12) — treat the trillion-parameter framing as a capability claim, not a deployment record.

Underrated: Tenstorrent's on-card RISC-V open programmability is a genuine architectural lever against closed TPU/Trainium/Maia that its sub-60 quality anchor understates (90e1b440); Rebellions' Arm Total Design / Neoverse CSS backing is real platform validation, even though it is ecosystem endorsement rather than shipped customer proof (d12eda11).
