Research scientist benchmarking merchant AI accelerators for publication or internal evaluation.
Audience Profile
- Age / Experience: 4–12 years of experience
- Current role: Research Scientist / Senior ML Researcher / Eval Engineer
- Top pain points:
  - Benchmark suites diverge across accelerator vendors
  - MLPerf results lag production reality (timeliness gap)
  - Access to non-NVIDIA hardware for reproduction
- Top decision blockers:
  - Cluster-access lottery and limited research budget
  - Publication norms around vendor-supplied numbers
What This Segment Needs
- Information: Reproducible methodology disclosure (batch size, seq length, precision, warm-up) behind every vendor tok/s claim; independent third-party MLPerf submissions, not vendor-run numbers.
- Tools: Open compiler/runtime stacks (ROCm, TT-Metalium, MLIR) and a public reference harness that re-runs vendor configs end-to-end.
- Services: Cloud or loaner access to non-NVIDIA bare metal without a cluster-allocation lottery, plus permission to publish numbers that contradict vendor submissions.
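To show why each disclosed field matters, here is a minimal Python sketch of how a tok/s number is derived from one benchmark run. It assumes throughput counts generated (output) tokens only, which is one common definition but not the only one, and that per-iteration wall-clock latencies were already collected by some harness. `BenchConfig`, `tokens_per_second`, and `report` are hypothetical names, not part of ROCm, TT-Metalium, or any vendor tool.

```python
import json
import statistics
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BenchConfig:
    """Hypothetical record of the settings a tok/s claim should disclose."""
    model: str
    batch_size: int      # sequences per step
    seq_len: int         # prompt length in tokens (disclosed, not used in the formula)
    output_tokens: int   # tokens generated per sequence per timed iteration
    precision: str       # e.g. "bf16" or "fp8"
    warmup_iters: int    # leading iterations excluded from timing


def tokens_per_second(cfg: BenchConfig, iter_latencies_s: list[float]) -> float:
    """Output-token throughput over post-warm-up iterations only (an assumed definition)."""
    timed = iter_latencies_s[cfg.warmup_iters:]
    if not timed:
        raise ValueError("no timed iterations left after warm-up")
    total_tokens = cfg.batch_size * cfg.output_tokens * len(timed)
    return total_tokens / sum(timed)


def report(cfg: BenchConfig, iter_latencies_s: list[float]) -> str:
    """Emit the throughput number together with the configuration that produced it."""
    timed = iter_latencies_s[cfg.warmup_iters:]
    return json.dumps(
        {
            "config": asdict(cfg),
            "tok_per_s": round(tokens_per_second(cfg, iter_latencies_s), 1),
            "timed_iters": len(timed),
            "median_iter_s": statistics.median(timed),
        },
        sort_keys=True,
    )


cfg = BenchConfig("example-model", batch_size=8, seq_len=1024,
                  output_tokens=128, precision="bf16", warmup_iters=2)
latencies = [3.0, 1.5, 1.0, 1.0, 1.0, 1.0]  # made-up timings for illustration
print(report(cfg, latencies))
```

With `warmup_iters=2` these made-up latencies give 1024 tok/s; folding the two warm-up iterations back in would report about 723 tok/s for the same run, which is why warm-up has to be stated next to the number.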
Top 5 Companies for You (Fit Score)
| Rank | Company | Score | Why |
|------|---------|-------|-----|
| 1 | Cerebras Systems | 82/100 | $1.1B Series G at $8.1B valuation (2025-09-30); 4 products shipped May–Aug 2025, incl. Llama 4 Maverick at 2,500 tok/s and gpt-oss-120B. Unique wafer-scale reproduction target, but capacity is a forward goal and revenue is G42-concentrated. |
| 2 | AMD | 81/100 | Q3 FY2025 revenue ~$9.25B, +36% YoY; data center $4.3B. OpenAI 6 GW Instinct deal (2025-10-06), Oracle 50k MI450 (2025-10-14). Profitable; ROCm is a real non-NVIDIA reproduction path. Risks: export-control charge and a ramp concentrated in two customers. |
| 3 | Groq | 81/100 | $750M at $6.9B post-money (2025-09-17); day-one OpenAI gpt-oss support (2025-08-05); Hugging Face Inference Provider listing exposes Groq to 5M+ developers (2025-06-16). Deterministic LPU eases reproducible benchmarking; inference-only, no revenue disclosed. |
| 4 | Tenstorrent | 76/100 | Blackhole p150a GA (140 Tensix cores, 32 GB, 2025-05-12); Galaxy and TT-QuietBox (2025-09-18); Samsung SF2 2 nm (2025-07-22). Open-source TT-Metalium aids reproduction; zero customer or revenue signals. |
| 5 | Rebellions | 76/100 | Series C at ~$1.4B post-money (2025-06-12); REBEL on Samsung 4 nm with SK hynix HBM3E (2025-08-26); Arm partnership (2025-09-30). Full MLIR/LLVM stack; no customer wins, and HBM/4 nm allocation queues behind NVIDIA. |
Deal-Breakers (Your Hard Preferences)
No hard preferences declared for this segment.
How to Evaluate Any Company in this Niche (Checklist)
- [ ] Check growth signals: require ≥3 independent datapoints in 12 months (shipped SKU + named customer + dated raise), not roadmap-only capacity targets.
- [ ] Check compensation data: none of these vendors publish compensation, so pull levels.fyi / Blind bands for hardware design-verification (DV) and ML-systems IC roles before negotiating.
- [ ] Check learning signals: confirm a public, reproducible stack (ROCm, TT-Metalium, MLIR) you can re-run, not vendor-only numbers.
- [ ] Check stability signals: identify customer concentration (G42, HUMAIN, OpenAI/Oracle) and estimate runway from the size and date of the last raise.
- [ ] Check reproduction access: verify non-NVIDIA silicon is reachable via cloud/loaner without a cluster lottery before committing an eval.
- [ ] Check culture signals: ask in interview whether you may publish independent numbers that contradict the vendor's MLPerf submission.
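The growth and stability checks above can be made mechanical. This is a minimal Python sketch under stated assumptions: "independent" is approximated as "reported by distinct sources", the 12-month window is taken as 365 days, and runway is a crude raise-divided-by-burn estimate in which `monthly_burn_usd` is a hypothetical input you must estimate yourself, not a disclosed figure. `Signal`, `passes_growth_check`, and `runway_left_months` are illustrative names.

```python
from dataclasses import dataclass
from datetime import date

REQUIRED_KINDS = {"shipped_sku", "named_customer", "dated_raise"}


@dataclass(frozen=True)
class Signal:
    kind: str    # one of REQUIRED_KINDS
    when: date   # date the datapoint became public
    source: str  # where it was reported; distinct sources stand in for "independent"


def passes_growth_check(signals: list[Signal], as_of: date, window_days: int = 365) -> bool:
    """At least 3 datapoints from distinct sources in the window, covering all three kinds."""
    recent = [s for s in signals if 0 <= (as_of - s.when).days <= window_days]
    kinds = {s.kind for s in recent}
    sources = {s.source for s in recent}
    return len(recent) >= 3 and REQUIRED_KINDS <= kinds and len(sources) >= 3


def runway_left_months(raised_usd: float, monthly_burn_usd: float,
                       raised_on: date, as_of: date) -> float:
    """Crude estimate: months the raise covers minus months elapsed since it closed.

    Ignores revenue, cash held before the raise, and burn growth; the burn rate
    is an assumption supplied by the caller.
    """
    elapsed_months = (as_of - raised_on).days / 30.44  # average month length in days
    return raised_usd / monthly_burn_usd - elapsed_months
```

A vendor whose only recent signals are product launches fails `passes_growth_check` no matter how many launches it has, because the named-customer and dated-raise kinds are missing; that mirrors the "not roadmap-only" rule in the checklist.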
Reverse-Hype Watch
- Cerebras: "tens of millions of tokens/s" is a forward build-out target; revenue is G42-concentrated and self-serve access is still early.
- AMD: the Financial Analyst Day (FAD) target of "35%+ growth over 3–5 years" against a vendor-cited ~$1T TAM is management aspiration; all 5 sampled job requisitions are silicon roles, with zero ROCm/ML-systems hires.
- Groq: Bell AI Fabric targets 500 MW, but only ~7 MW is live (Kamloops, 2025-05-28), a ~70x gap.
- Tenstorrent: "alternative to hyperscaler custom silicon" with no customer win, unit-shipment, or revenue signal.
- Rebellions: ~$1.4B valuation is investor-driven, not earnings-backed; runway depends on continued raises.
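The ~70x figure in the Groq entry is the ratio of the announced Bell AI Fabric target to the capacity reported live at Kamloops:

```latex
\frac{500\ \text{MW (target)}}{7\ \text{MW (live)}} \approx 71.4 \approx 70\times
```

Because the live figure is itself approximate, the ratio is only good to roughly one significant figure: anywhere from 6.5 to 7.5 MW live puts it between about 67x and 77x.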
For this segment the under-reported dimension is methodological reproducibility, not money. Coverage fixates on funding and headline tok/s, while the data a benchmark scientist actually needs — fixed batch size, sequence length, precision, warm-up, and whether independent third-party MLPerf submissions exist alongside vendor-supplied ones — is almost never public. Equally absent: whether non-NVIDIA bare metal is reachable for reproduction without a cluster-allocation lottery.
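The missing-disclosure problem can be audited claim by claim. A minimal Python sketch, assuming vendor claims and MLPerf-style results have been hand-transcribed into dictionaries; the field names (`batch_size`, `seq_len`, `precision`, `warmup_iters`, `hardware_vendor`, `submitter`) are illustrative and not taken from any official MLCommons schema.

```python
REQUIRED_FIELDS = ("batch_size", "seq_len", "precision", "warmup_iters")


def missing_disclosures(claim: dict) -> list[str]:
    """Required methodology fields that a tok/s claim leaves unstated (absent, None, or empty)."""
    return [f for f in REQUIRED_FIELDS if claim.get(f) in (None, "")]


def has_third_party_submission(results: list[dict], vendor: str) -> bool:
    """True if any result on the vendor's hardware was submitted by someone other than the vendor."""
    return any(r["hardware_vendor"] == vendor and r["submitter"] != vendor
               for r in results)


claim = {"tok_per_s": 1000, "batch_size": 1, "precision": "bf16"}  # hypothetical claim
print(missing_disclosures(claim))  # ['seq_len', 'warmup_iters']
```

Treating an explicit `warmup_iters=0` as disclosed is deliberate: a stated zero is still a methodology choice a reader can reproduce, unlike a field that is simply absent.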