Inference-as-a-Service — Market

Updated 6/25/2026

Verified claims and product-axis read for Inference-as-a-Service. Every fact below is sourced; every product judgment traces back to underlying signals.


Verified facts

  • Baseten, an inference-serving platform, reached a ~$13B valuation after a ~$1.5B raise. (financial)
  • Fireworks AI, a fast open-model inference platform, was reportedly raising at a ~$15B valuation. (financial)
  • Together AI reached ~$1B in annual recurring revenue serving open models via API. (financial)
  • Nebius's Token Factory offers managed inference; Nebius's Q1 2026 revenue grew ~684% year-over-year. (financial)
  • Cerebras, a wafer-scale inference chipmaker, completed its IPO in 2026 (~$66B day-one market cap). (financial)
  • Groq builds custom LPU inference silicon; NVIDIA struck a ~$20B non-exclusive LPU license and hired Groq's founder (Dec 2025). (historical_event)
  • Inference now accounts for ~2/3 of AI accelerator demand in 2026, up from ~1/2 in 2025 and ~1/3 in 2023 (Deloitte). (other)
  • Inference-as-a-service decouples model serving from raw GPU rental — buyers pay per token, not per GPU-hour. (other)
  • Open-weight models (GLM, Qwen, DeepSeek, Llama), priced at roughly 1/6 the cost of frontier models, drive inference-service economics. (other)
  • The inference-service cohort — Baseten, Fireworks, Together, Nebius — is among the best-funded categories in AI infrastructure. (other)
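
The per-token versus per-GPU-hour distinction above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and every price in it are hypothetical placeholders, not figures from any provider listed here, and it ignores whether a rented GPU could actually serve the break-even volume in the reserved time.

```python
def per_token_cost(tokens: int, price_per_million_tokens: float) -> float:
    """Per-token billing: the buyer pays only for tokens processed."""
    return tokens / 1_000_000 * price_per_million_tokens


def per_gpu_hour_cost(hours_reserved: float, price_per_gpu_hour: float, gpus: int = 1) -> float:
    """GPU rental: the buyer pays for reserved time, used or idle."""
    return hours_reserved * price_per_gpu_hour * gpus


def breakeven_tokens(hours_reserved: float, price_per_gpu_hour: float,
                     price_per_million_tokens: float, gpus: int = 1) -> float:
    """Token volume at which both billing models cost the same."""
    rental = per_gpu_hour_cost(hours_reserved, price_per_gpu_hour, gpus)
    return rental / price_per_million_tokens * 1_000_000


if __name__ == "__main__":
    # Hypothetical prices chosen for round numbers, not market data.
    token_price = 0.50   # USD per million tokens (assumed)
    gpu_price = 2.00     # USD per GPU-hour (assumed)

    print(per_token_cost(10_000_000, token_price))      # cost of 10M tokens
    print(per_gpu_hour_cost(24, gpu_price))             # cost of one GPU for a day
    print(breakeven_tokens(24, gpu_price, token_price)) # tokens/day where costs match
```

Under these assumed prices, a buyer whose daily volume sits below the break-even point pays less per token than by renting a GPU that idles part of the day, which is the decoupling the list describes.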

See the Products and Strategy modules for the full product list and forward-looking judgment.
