GPU Cloud Pricing in 2026
From the $9/hr H100 of early 2023 to $1.92/hr today, and what comes next.
In Q2 2023 you could not get an H100 for less than $7.50/hr, and you usually paid $9–11. As of this month the median neocloud price is $1.92/hr, about 80% below the $9.40 median of Q1 2023. That curve can't be extrapolated in a straight line: it bends. Here's the shape we expect.
The 2023–2026 curve
| Quarter | H100 on-demand median | Driver |
|---|---|---|
| Q1 2023 | $9.40 | Supply shock, very few providers (CoreWeave + a handful) |
| Q3 2023 | $6.20 | Lambda + RunPod ramp |
| Q1 2024 | $4.10 | Spot product launches |
| Q3 2024 | $2.75 | Hyperscalers cut to defend share |
| Q1 2025 | $2.18 | H200 launches; H100 price floor visible |
| Q4 2025 | $1.99 | Post-IPO CoreWeave dumps short-tenor capacity |
| Q2 2026 | $1.92 | Asymptote forming around fully-burdened cost |
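One way to see the bend is to convert each step in the table into an annualised rate of decline. The sketch below does that. The function names and the (year, quarter) encoding are illustrative choices, and it assumes each figure applies at the same point within its quarter.

```python
# Median H100 on-demand prices from the table above, keyed by (year, quarter).
PRICES = [
    ((2023, 1), 9.40),
    ((2023, 3), 6.20),
    ((2024, 1), 4.10),
    ((2024, 3), 2.75),
    ((2025, 1), 2.18),
    ((2025, 4), 1.99),
    ((2026, 2), 1.92),
]


def years_between(a, b):
    """Elapsed time in years between two (year, quarter) points."""
    return (b[0] - a[0]) + (b[1] - a[1]) / 4


def annualised_decline(p0, p1, years):
    """Constant yearly fractional drop that takes p0 to p1 over `years`."""
    return 1 - (p1 / p0) ** (1 / years)


def decline_by_interval(prices):
    """Annualised decline for each consecutive pair of table rows."""
    rows = []
    for (q0, p0), (q1, p1) in zip(prices, prices[1:]):
        rows.append((q0, q1, annualised_decline(p0, p1, years_between(q0, q1))))
    return rows


# Cumulative drop from the first row to the last.
total_drop = 1 - PRICES[-1][1] / PRICES[0][1]

if __name__ == "__main__":
    for q0, q1, rate in decline_by_interval(PRICES):
        print(f"{q0} -> {q1}: {rate:.1%} per year")
    print(f"cumulative: {total_drop:.1%}")
```

On the table's figures the annualised decline stays above 50% through Q3 2024 and is under 10% in the most recent interval, which is the flattening the forecast relies on.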
Why it stops falling
An H100 SXM5 module costs NVIDIA about $3,200 to manufacture and sells through to neoclouds for ~$30,000. The all-in deployed cost per H100 (DC space, power, networking, financing, support staff) runs $42,000–$48,000 over its life. Spread over the roughly 33,300 billable hours that 95% utilisation yields across a 4-year depreciation schedule, that's a break-even of about $1.26–$1.44/hr. Below that, the marginal neocloud loses money on every hour.
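The break-even arithmetic can be sketched in a few lines. The function names are illustrative, and the sketch assumes the lifetime cost is recovered evenly over billable hours with no residual value, which the text does not specify.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; leap days ignored


def break_even_per_hour(lifetime_cost_usd, utilisation, years):
    """Hourly price at which rental revenue just covers lifetime cost."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    billable_hours = HOURS_PER_YEAR * years * utilisation
    return lifetime_cost_usd / billable_hours


def utilisation_for_price(lifetime_cost_usd, price_per_hour, years):
    """Utilisation needed to break even at a given hourly price."""
    return lifetime_cost_usd / (price_per_hour * HOURS_PER_YEAR * years)


if __name__ == "__main__":
    # Cost band and schedule taken from the paragraph above.
    for cost in (42_000, 48_000):
        for util in (0.95, 0.80):
            price = break_even_per_hour(cost, util, years=4)
            print(f"${cost:,} at {util:.0%}: ${price:.2f}/hr")
```

Because the result scales inversely with utilisation, it is worth rerunning with a more conservative figure than 95% to see how quickly the floor rises.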
Some neoclouds will go below break-even temporarily to defend share, but they can't stay there. So expect H100 on-demand to settle in the $1.40–$1.80 band through end of life, with reserved 1-yr at $0.85–$1.05.
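For a buyer, the gap between those two bands translates into a utilisation threshold: a reservation is billed for every hour, on-demand only for the hours used. A minimal sketch, with a hypothetical function name and ignoring prepayment terms or resale of idle reserved hours:

```python
def reserved_break_even_utilisation(reserved_per_hour, on_demand_per_hour):
    """Fraction of hours that must be used for a reservation to be cheaper."""
    if on_demand_per_hour <= 0:
        raise ValueError("on-demand price must be positive")
    return reserved_per_hour / on_demand_per_hour


# Band edges quoted in the paragraph above.
# Cheapest reserved against priciest on-demand: reservation pays off earliest.
low_threshold = reserved_break_even_utilisation(0.85, 1.80)
# Priciest reserved against cheapest on-demand: reservation pays off latest.
high_threshold = reserved_break_even_utilisation(1.05, 1.40)

if __name__ == "__main__":
    print(f"reserve if you expect to use more than "
          f"{low_threshold:.0%} to {high_threshold:.0%} of hours")
```

With the bands quoted above, the threshold falls between roughly 47% and 75% of hours used.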
Where H200 lands
H200 launched in late 2024 at $5.80/hr and has been falling on a faster curve than H100 did, because the supply ramp was much steeper and the neocloud category was already mature. Current median is $3.20. We expect a $2.30–$2.60 asymptote by Q1 2027, then a long flat tail.
Where B200 / GB200 lands
B200 launched in volume in mid-2025 at $9.40/hr. GB200 NVL72 racks rent for the equivalent of $11/GPU-hr. Both are following an H100-shaped curve compressed into ~24 months. Best-guess bottoms: B200 around $4.10, GB200 effective around $5.20.
What this means strategically
- Training compute per dollar is improving ~30% per year before any model-architecture gains. Plan accordingly.
- Reserved capacity is the right product if you can predict 6–12 months out. Spot is the right product if you have checkpointing.
- The marginal cost of training a 70B-parameter model is now well under $1M. The marginal cost of a 7B fine-tune is under $30K.
- Inference cost per million tokens is falling faster than training cost per GPU-hr, because batching + quantisation gains compound on top.
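The training-cost bullet can be sanity-checked with the common 6 × parameters × tokens approximation for training FLOPs. Everything beyond the $1.92/hr price is an assumption made for illustration, not a figure from this article: roughly 20 tokens per parameter, an H100 dense BF16 peak of about 989 TFLOP/s, and 40% model FLOPs utilisation.

```python
def training_cost_usd(params, tokens, price_per_gpu_hour,
                      peak_flops=989e12, mfu=0.40):
    """Rough compute-only cost of one training run.

    Assumptions (illustrative, not from the article):
      - total FLOPs ~= 6 * params * tokens
      - peak_flops: per-GPU dense BF16 peak in FLOP/s
      - mfu: fraction of that peak actually sustained
    """
    total_flops = 6 * params * tokens
    gpu_seconds = total_flops / (peak_flops * mfu)
    gpu_hours = gpu_seconds / 3600
    return gpu_hours * price_per_gpu_hour


# 70B parameters at an assumed ~20 tokens per parameter, at today's median price.
cost_70b = training_cost_usd(params=70e9, tokens=20 * 70e9,
                             price_per_gpu_hour=1.92)

if __name__ == "__main__":
    print(f"70B run: ${cost_70b:,.0f}")
```

Under these assumptions a 70B run comes out a little under $0.8M. The estimate is linear in token count and inversely proportional to utilisation, so a run trained on several times more data, or at lower efficiency, costs proportionally more.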
The two things that could break the trend
First: a true AGI demand shock that soaks up all spare capacity. Possible, but not the base case. Second: an energy constraint. If grid interconnect queues stretch beyond 4 years in the markets where neoclouds want to build, prices stop falling because supply stops growing. That's the realistic limit on this decline.