Topic

AI ASICs

ASICS-ML-SYS-ENG-01

ML systems engineer evaluating merchant AI silicon (AMD / Groq / Cerebras / etc.) vs NVIDIA defaults.

Audience

Experience: 5–12 years
Current: ML Systems Engineer / Distributed Training Engineer
Pain: ROCm vs CUDA software-stack parity gap (real status, not vendor claims)
Pain: Inference-cost-per-token actual numbers, not theoretical TOPS

Fit Score weights (adjust to your priorities)

Growth: 30%
Comp: 20%
Learning: 35%
Stability: 10%
Culture: 5%
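As a rough illustration of how weights like these could combine into a single score, here is a minimal sketch. It assumes the Fit Score is a plain weighted average of 0-100 sub-scores; the page does not state its formula, and the sub-scores in `example` are hypothetical placeholders, not values for any listed company.

```python
# Weights as listed on the page, in percent.
WEIGHTS = {"growth": 30, "comp": 20, "learning": 35, "stability": 10, "culture": 5}


def fit_score(sub_scores, weights=WEIGHTS):
    """Weighted average of 0-100 sub-scores (assumed formula, not the page's)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    missing = set(weights) - set(sub_scores)
    if missing:
        raise KeyError(f"missing sub-scores: {sorted(missing)}")
    # Dividing by the weight total keeps the result on a 0-100 scale
    # even if the reader changes weights so they no longer sum to 100.
    return sum(weights[k] * sub_scores[k] for k in weights) / total


# Hypothetical sub-scores for illustration only.
example = {"growth": 80, "comp": 50, "learning": 75, "stability": 40, "culture": 60}
print(f"{fit_score(example):.2f}")
```

Because the function normalises by the weight total, raising one weight without lowering another still yields a comparable score.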

Top 5 for this segment

1. Groq: 65/100
2. Cerebras Systems: 65/100
3. AMD: 65/100
4. Tenstorrent: 61/100
5. Rebellions: 61/100

Full Persona Brief

Audience Profile

**Age / Experience:** 5–12 years; mid-to-senior IC.
**Current role:** ML Systems Engineer / Distributed Training Engineer (AI lab / hyperscaler / GPU-rich startup).
**Top pain points:**
- ROCm vs CUDA software-stack parity gap (real status, not vendor claims)
- Inference-cost-per-token actual numbers, not theoretical TOPS
- Switching cost between accelerator families
**Top decision blockers:**
- Existing CUDA codebase migration effort
- Vendor support response time on hard bugs
- Toolchain maturity (debuggers / profilers / multi-node)

What This Segment Needs

**Information:** Independent ROCm-vs-CUDA parity status, measured cost-per-token, real CUDA→target port case studies — not TOPS decks.
**Tools:** Mature multi-node debuggers/profilers and an MLIR/LLVM compiler with quantization-aware optimization.
**Services:** Vendor support with a published bug-response SLA and hands-on CUDA migration assistance.
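A measured cost-per-token figure can be derived from two numbers you observe yourself: what you pay per hour and the sustained throughput you actually get. The sketch below is one way to do that arithmetic. The function name, the `utilization` parameter, and the sample inputs are all hypothetical; none of them are vendor figures.

```python
def cost_per_million_tokens(hourly_cost_usd, tokens_per_second, utilization=1.0):
    """USD per one million generated tokens from measured inputs.

    hourly_cost_usd   : what you pay per hour for the system (hypothetical input)
    tokens_per_second : sustained throughput measured on your own workload
    utilization       : fraction of each hour spent serving traffic, in (0, 1]
    """
    if hourly_cost_usd < 0:
        raise ValueError("hourly_cost_usd must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Illustrative inputs only: 36 USD/hour at 1,000 tok/s measured.
full = cost_per_million_tokens(36.0, 1000.0)
half = cost_per_million_tokens(36.0, 1000.0, utilization=0.5)
print(f"fully utilized: {full:.2f} USD/M tokens, half utilized: {half:.2f} USD/M tokens")
```

The utilization term is where headline throughput and real cost tend to diverge: halving utilization doubles the effective cost per token.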

Top 5 Companies for You (Fit Score)

| Rank | Company | Score | Why |
|------|---------|-------|-----|
| 1 | Groq | 81/100 | $750M at $6.9B post-money (2025-09-17, ~2.5x step-up); OpenAI gpt-oss day-one partner (2025-08-05); HF Inference Provider (5M+ devs). Deterministic software-scheduled LPU; inference-only narrows training-stack breadth. |
| 2 | Cerebras Systems | 81/100 | $1.1B Series G at $8.1B (2025-09-30); Llama 4 Maverick ~2,500 tok/s, Qwen3-235B ~1,500 tok/s. Wafer-scale architecture; revenue leans heavily on G42 + CFIUS exposure. |
| 3 | AMD | 81/100 | Record Q3 FY2025 ~$9.25B rev +36% YoY, data center $4.3B; OpenAI 6 GW Instinct, Oracle 50,000 MI450. Profitable (EPS $1.20), but 5/5 reqs silicon, zero ROCm/ML-systems despite ROCm being the watch-point. |
| 4 | Tenstorrent | 76/100 | Blackhole GA p150a 140 Tensix cores (2025-05-12) → Galaxy multi-chip (2025-09-18); Samsung SF2 Quasar (2025-07-22). Open-source TT-Metalium/TT-Forge; no disclosed revenue or customer wins. |
| 5 | Rebellions | 76/100 | Series C ~$1.4B post-money (2025-06-12); REBEL on Samsung 4nm + SK hynix HBM3E (2025-08-26); Arm partner (2025-09-30). Staff Compiler (MLIR/LLVM) hiring; zero disclosed customer wins. |

Deal-Breakers (Your Hard Preferences)

No hard preferences declared for this segment.

How to Evaluate Any Company in this Niche (Checklist)

[ ] **Check growth signals:** Require ≥1 named foundation-model/hyperscaler design win with *deployed* MW or GPU counts in the last 180 days, not "target" capacity (e.g. Groq Bell: 7 MW live vs a 500 MW target).
[ ] **Check comp data:** None of the 5 disclose comp — pull levels.fyi "ML Systems"/"Compiler Engineer" bands and benchmark offers against NVIDIA L5/L6 before negotiating.
[ ] **Check learning signals:** Count public MLIR/LLVM + ROCm/TT-Metalium commit and issue-close rates; demand a live multi-node profiler/debugger demo.
[ ] **Check stability signals:** Identify single-customer concentration (G42, HUMAIN/PIF ~$1.5B, OpenAI/Oracle warrant) and export-control/CFIUS exposure.
[ ] **Check switching cost:** Request a CUDA→target port case study with engineer-days and measured perf delta.
[ ] **Check culture signals:** Ask the compiler/ML-systems-to-pure-silicon req ratio and the vendor support SLA on hard kernel bugs.
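The switching-cost item in the checklist above asks for engineer-days and a measured perf delta; those two numbers are enough for a rough payback estimate. The sketch below is a minimal model under stated assumptions: a one-off port cost, a constant monthly inference spend, and a constant cost ratio after the port. All names and inputs are hypothetical.

```python
def port_break_even_months(engineer_days, day_rate_usd, monthly_spend_usd, cost_ratio):
    """Months until a CUDA-to-target port pays for itself, or None if it never does.

    cost_ratio is (target cost per token) / (current cost per token) on the same
    workload, taken from a measured case study rather than a vendor deck.
    """
    if engineer_days < 0 or day_rate_usd < 0 or monthly_spend_usd < 0:
        raise ValueError("inputs must be non-negative")
    port_cost = engineer_days * day_rate_usd
    monthly_saving = monthly_spend_usd * (1.0 - cost_ratio)
    if monthly_saving <= 0:
        # The target is no cheaper on this workload, so the port never pays back.
        return None
    return port_cost / monthly_saving


# Illustrative inputs only: 120 engineer-days at 1,500 USD/day,
# 100,000 USD/month current spend, target measured at 0.7x the cost.
months = port_break_even_months(120, 1500.0, 100_000.0, 0.7)
print(f"break-even after about {months:.1f} months")
```

The model ignores ongoing maintenance of a second backend and any retraining or revalidation work, so treat the result as a lower bound on payback time.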

Reverse-Hype Watch

**Targets sold as capacity:** Groq Bell 500 MW is a target (only ~7 MW live); Cerebras "tens of millions tok/s" is a funded build-out goal, not deployed.
**Single-customer revenue concentration:** Groq HUMAIN/PIF ~$1.5B; Cerebras G42; AMD OpenAI/Oracle 160M-share warrant.
**Scale/positioning unbacked by named customers:** Tenstorrent Galaxy scale-out and Rebellions up-market LLM REBEL show zero customer-win signals.
**Aspirational TAM as trajectory:** AMD FAD "35%+ growth" / "$1T TAM by 2030" is a management target, not booked revenue.

What's under-reported for this segment: every reasoning block says "no comp data," and not one quantifies ROCm/compiler parity, real inference cost-per-token, or vendor bug-response SLA. Hiring is ~100% RTL/physical-design; AMD in particular lists zero compiler/ML-systems reqs *despite ROCm being the named watch-point*. The toolchain you'd actually live in is exactly the dimension with the least public evidence, so assume the software stack lags the silicon until proven otherwise.