ASICS-ML-SYS-ENG-01

ML systems engineer evaluating merchant AI silicon (AMD / Groq / Cerebras / etc.) vs NVIDIA defaults.

Audience

  • Experience: 5–12 years
  • Current: ML Systems Engineer / Distributed Training Engineer
  • Pain: ROCm vs CUDA software-stack parity gap (real status, not vendor claims)
  • Pain: Inference-cost-per-token actual numbers, not theoretical TOPS

Fit Score weights — adjust to your priorities

30%
20%
35%
10%
5%
Top 5 for this segment
  1. Groq (65/100)
  2. Cerebras Systems (65/100)
  3. AMD (65/100)
  4. Tenstorrent (61/100)
  5. Rebellions (61/100)
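
The Fit Score weights listed above (30%, 20%, 35%, 10%, 5%) sum to 100%, which suggests a plain weighted average. A minimal sketch of how such a score could be computed, assuming five sub-scores on a 0-100 scale. The weights are not labeled in this brief, so the dimension names `dim_1` to `dim_5` are hypothetical placeholders, and the actual scoring method may differ.

```python
# Weights in the order listed in this brief. The dimension names are
# hypothetical placeholders: the brief does not label the weights.
WEIGHTS = {
    "dim_1": 0.30,
    "dim_2": 0.20,
    "dim_3": 0.35,
    "dim_4": 0.10,
    "dim_5": 0.05,
}


def fit_score(sub_scores, weights=WEIGHTS):
    """Return the weighted average of 0-100 sub-scores, rounded to 1 decimal."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = set(weights) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name in weights:
        if not 0 <= sub_scores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return round(sum(weights[n] * sub_scores[n] for n in weights), 1)


if __name__ == "__main__":
    # Illustrative sub-scores only, not taken from the brief.
    example = {"dim_1": 70, "dim_2": 60, "dim_3": 65, "dim_4": 50, "dim_5": 80}
    print(fit_score(example))
```

Re-weighting toward the dimension you care about most (for example, toolchain maturity) is then a one-line change to `WEIGHTS`, as long as the values still sum to 1.0.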

Full Persona Brief

ML systems engineer evaluating merchant AI silicon (AMD / Groq / Cerebras / etc.) vs NVIDIA defaults.

Audience Profile

  • **Age / Experience:** 5–12 years; mid-to-senior IC.
  • **Current role:** ML Systems Engineer / Distributed Training Engineer (AI lab / hyperscaler / GPU-rich startup).
  • **Top pain points:**
      ◦ ROCm vs CUDA software-stack parity gap (real status, not vendor claims)
      ◦ Inference-cost-per-token actual numbers, not theoretical TOPS
      ◦ Switching cost between accelerator families
  • **Top decision blockers:**
      ◦ Existing CUDA codebase migration effort
      ◦ Vendor support response time on hard bugs
      ◦ Toolchain maturity (debuggers / profilers / multi-node)

What This Segment Needs

  • **Information:** Independent ROCm-vs-CUDA parity status, measured cost-per-token, real CUDA→target port case studies — not TOPS decks.
  • **Tools:** Mature multi-node debuggers/profilers and an MLIR/LLVM compiler with quantization-aware optimization.
  • **Services:** Vendor support with a published bug-response SLA and hands-on CUDA migration assistance.
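
"Measured cost-per-token" reduces to simple arithmetic once you have a sustained throughput measurement and an all-in hourly cost. A minimal sketch, assuming you supply both numbers yourself; the function name, the `utilization` parameter, and the example figures are hypothetical and are not vendor data.

```python
def cost_per_million_tokens(hourly_cost_usd, tokens_per_second, utilization=1.0):
    """USD per 1M generated tokens from measured throughput.

    hourly_cost_usd:   all-in cost of the system per hour (assumed input)
    tokens_per_second: sustained aggregate throughput you measured
    utilization:       fraction of each hour spent serving traffic, in (0, 1]
    """
    if hourly_cost_usd < 0:
        raise ValueError("hourly_cost_usd must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: $9.00/hour and 2,500 tok/s sustained.
    print(f"${cost_per_million_tokens(9.0, 2500):.2f} per 1M tokens")
    # Same hardware at 50% utilization costs twice as much per token.
    print(f"${cost_per_million_tokens(9.0, 2500, utilization=0.5):.2f} per 1M tokens")
```

Comparing accelerators on this number, with the same model, batch size, and latency target, is what separates a measured figure from a theoretical TOPS claim.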

Top 5 Companies for You (Fit Score)

| Rank | Company | Score | Why |
|------|---------|-------|-----|
| 1 | Groq | 81/100 | $750M at $6.9B post-money (2025-09-17, ~2.5x step-up); OpenAI gpt-oss day-one partner (2025-08-05); HF Inference Provider (5M+ devs). Deterministic software-scheduled LPU; inference-only narrows training-stack breadth. |
| 2 | Cerebras Systems | 81/100 | $1.1B Series G at $8.1B (2025-09-30); Llama 4 Maverick ~2,500 tok/s, Qwen3-235B ~1,500 tok/s. Wafer-scale architecture; revenue leans heavily on G42 + CFIUS exposure. |
| 3 | AMD | 81/100 | Record Q3 FY2025 ~$9.25B rev +36% YoY, data center $4.3B; OpenAI 6 GW Instinct, Oracle 50,000 MI450. Profitable (EPS $1.20), but 5/5 reqs silicon, zero ROCm/ML-systems despite ROCm being the watch-point. |
| 4 | Tenstorrent | 76/100 | Blackhole GA p150a 140 Tensix cores (2025-05-12) → Galaxy multi-chip (2025-09-18); Samsung SF2 Quasar (2025-07-22). Open-source TT-Metalium/TT-Forge; no disclosed revenue or customer wins. |
| 5 | Rebellions | 76/100 | Series C ~$1.4B post-money (2025-06-12); REBEL on Samsung 4nm + SK hynix HBM3E (2025-08-26); Arm partner (2025-09-30). Staff Compiler (MLIR/LLVM) hiring; zero disclosed customer wins. |

Deal-Breakers (Your Hard Preferences)

No hard preferences declared for this segment.

How to Evaluate Any Company in this Niche (Checklist)

  • [ ] **Check growth signals:** Require ≥1 named foundation-model/hyperscaler design win with *deployed* MW or GPU counts in the last 180d — not "target" capacity (e.g. Groq Bell 7 MW live vs 500 MW target).
  • [ ] **Check comp data:** None of the 5 disclose comp — pull levels.fyi "ML Systems"/"Compiler Engineer" bands and benchmark offers against NVIDIA L5/L6 before negotiating.
  • [ ] **Check learning signals:** Count public MLIR/LLVM + ROCm/TT-Metalium commit and issue-close rates; demand a live multi-node profiler/debugger demo.
  • [ ] **Check stability signals:** Identify single-customer concentration (G42, HUMAIN/PIF ~$1.5B, OpenAI/Oracle warrant) and export-control/CFIUS exposure.
  • [ ] **Check switching cost:** Request a CUDA→target port case study with engineer-days and measured perf delta.
  • [ ] **Check culture signals:** Ask the compiler/ML-systems-to-pure-silicon req ratio and the vendor support SLA on hard kernel bugs.
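
Two of the checks above come down to arithmetic: deployed versus target capacity, and the measured performance delta from a port. A minimal sketch, assuming you have the raw numbers in hand; the helper names are hypothetical, and the only figures taken from this brief are Groq Bell's ~7 MW live against a 500 MW target.

```python
def deployed_fraction(live, target):
    """Share of announced target capacity that is actually live."""
    if target <= 0:
        raise ValueError("target must be positive")
    if live < 0:
        raise ValueError("live must be non-negative")
    return live / target


def perf_delta_pct(baseline_throughput, ported_throughput):
    """Percent change in throughput after a CUDA-to-target port.

    Negative means the port is slower than the CUDA baseline.
    """
    if baseline_throughput <= 0:
        raise ValueError("baseline_throughput must be positive")
    return (ported_throughput - baseline_throughput) / baseline_throughput * 100


if __name__ == "__main__":
    # Figures from this brief: ~7 MW live vs 500 MW target.
    print(f"{deployed_fraction(7, 500):.1%} of target capacity is live")
    # Hypothetical port result: 1,000 tok/s on CUDA, 900 tok/s after porting.
    print(f"{perf_delta_pct(1000, 900):+.1f}% throughput vs CUDA baseline")
```

Pairing the performance delta with the engineer-days a vendor reports for the port gives a rough, comparable switching-cost figure across accelerator families.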

Reverse-Hype Watch

  • **Targets sold as capacity:** Groq Bell 500 MW is a target (only ~7 MW live); Cerebras "tens of millions tok/s" is a funded build-out goal, not deployed.
  • **Single-customer revenue concentration:** Groq HUMAIN/PIF ~$1.5B; Cerebras G42; AMD OpenAI/Oracle 160M-share warrant.
  • **Scale/positioning unbacked by named customers:** Tenstorrent Galaxy scale-out and Rebellions up-market LLM REBEL show zero customer-win signals.
  • **Aspirational TAM as trajectory:** AMD FAD "35%+ growth" / "$1T TAM by 2030" is a management target, not booked revenue.

What's under-reported for this segment: every reasoning block says "no comp data," and not one quantifies ROCm/compiler parity, real inference cost-per-token, or vendor bug-response SLA. Hiring is ~100% RTL/physical-design; AMD explicitly lists zero compiler/ML-systems reqs *despite ROCm being the named watch-point*. The toolchain you'd actually live in is exactly the dimension with the least public evidence, so assume the software stack lags the silicon until proven otherwise.