- Demand Explosion and Agentic AI
- Supply Bottleneck
- Hardware: GPUs, Memory, and the Entire Supply Chain
- Energy: A Slower but Structural Constraint
- The Desperation Signal: Space Data Centers
- The Economics: The Edgeworth Model
- The Setup
- The Process: Price Cycles, Not Equilibrium
- The Key Insight: No Equilibrium, but a Floor
- The Decision Tree: Three Scenarios for Your $20 Plan
- So, Can You Still Keep the $20 Plan?
In Part 1 [1], I argued that the $20 AI coding plan is more sustainable than most people think. Inference margins are healthy, coding agents are getting more efficient, and the market is competitive enough that no single provider can raise prices without losing users to alternatives. A contestable oligopoly. I still think that framework is right, but it's incomplete.
Since Part 1, prices have started moving. Anthropic pulled Claude Code from the $20 Pro tier. GitHub announced it's moving Copilot to usage-based billing [2] starting June 1, charging per token instead of offering flat-rate premium requests. Microsoft reportedly cancelled most of its internal Claude Code licenses [3] after the tool became so popular among employees that costs spiraled. Uber's CTO admitted the company had burned through its entire 2026 AI coding budget in four months. [4]

The Part 1 analysis focused on the demand side and competitive structure. What it didn't account for sufficiently is the supply side. Right now, supply constraints are changing the game theory in ways that matter directly for your subscription price.
To work through this properly, I want to answer three questions in sequence:
Is compute actually scarce, and if so, why?
What does economic theory say happens to pricing when supply is constrained?
What scenarios does this leave us with?
Demand Explosion and Agentic AI
The demand side of this equation is staggering. Goldman Sachs recently published a report called "Decoding the Agentic Economy" projecting that agentic AI will drive a 24-fold increase in global token consumption by 2030, from current levels to roughly 120 quadrillion tokens processed per month.
The logic is straightforward: agents don't just answer a single prompt. They continuously monitor, read context, call tools, and verify results. Each task consumes orders of magnitude more tokens than a simple chatbot query.
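The arithmetic behind the Goldman projection can be made explicit. The sketch below only uses the two figures quoted above (a 24-fold increase, ending at roughly 120 quadrillion tokens per month). The number of years over which that growth happens is not stated here, so `years` is an assumption you can vary, and the function names are purely illustrative.

```python
QUADRILLION = 10**15

def implied_baseline(target_tokens: float, multiple: float) -> float:
    """Current monthly token volume implied by a target and a growth multiple."""
    return target_tokens / multiple

def required_annual_growth(multiple: float, years: float) -> float:
    """Constant yearly growth rate that compounds to `multiple` over `years`."""
    return multiple ** (1 / years) - 1

target = 120 * QUADRILLION   # tokens per month by 2030 (figure quoted above)
multiple = 24                # 24-fold increase (figure quoted above)

baseline = implied_baseline(target, multiple)
print(f"Implied current volume: {baseline / QUADRILLION:.0f} quadrillion tokens/month")

# ASSUMPTION: the horizon is 4 or 5 years; the text above does not pin this down.
for years in (4, 5):
    rate = required_annual_growth(multiple, years)
    print(f"Over {years} years: about {rate:.0%} growth per year")
```

Under either assumed horizon, token consumption would have to roughly double every year, which is the pace the supply side has to match.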
Goldman's analyst Jim Schneider also flagged that chipmakers will likely face a shortage for the next 12 to 18 months as semiconductor manufacturers build new plants to catch up. After that, he thinks supply might start catching up. [5]
Supply Bottleneck
The supply bottleneck isn't one thing; it's at least three constraints stacked on top of each other.
Hardware: GPUs, Memory, and the Entire Supply Chain
The Economist recently reported that GPU prices have risen roughly 30% since November 2025 [6], that virtually all high-bandwidth memory (HBM) is sold out for 2026, and that $156 billion of data center investment is blocked in the U.S. alone due to local opposition. The piece also makes a broader point: hardware suppliers are underinvesting relative to the hyperscalers, with supplier capex running at only about a third of what the big cloud companies are spending.
Dylan Patel from SemiAnalysis points in the same direction. In a recent talk (around the 27-minute mark), he walks through how demand is rising even for older GPU generations like the A100, and how margins are increasing across the entire supply chain, from NVIDIA to TSMC to ASML. Even raw materials are getting scarce. [7]
Energy: A Slower but Structural Constraint
The International Energy Agency estimates that global data center electricity consumption was around 415 TWh in 2024, about 1.5% of total global demand, and projects it will more than double to 945 TWh by 2030. That would represent just under 3% of total global electricity consumption. [8]
To put that number in context: 945 TWh is roughly two-thirds of the additional electricity that electric vehicles are projected to need globally. The constraint here is less immediate than the hardware shortage. It won't bite in the next 12 months the way the hardware shortage does. But it operates in long cycles and might become more problematic in the long term.
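For a feel of how fast that is, here is a small back-of-the-envelope sketch using only the IEA figures quoted above (415 TWh in 2024, 945 TWh in 2030, shares of about 1.5% and just under 3%). The constant-growth assumption and the helper name are mine, not the IEA's.

```python
def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1

dc_2024_twh = 415    # data center consumption in 2024 (quoted above)
dc_2030_twh = 945    # projected consumption in 2030 (quoted above)

# ASSUMPTION: growth is smooth between the two data points.
growth = compound_annual_growth(dc_2024_twh, dc_2030_twh, 2030 - 2024)
print(f"Implied data center growth: about {growth:.1%} per year")

# Total global demand implied by the quoted shares. "Just under 3%" is
# treated as 3% here, so the 2030 figure is a lower bound.
world_2024_twh = dc_2024_twh / 0.015
world_2030_twh = dc_2030_twh / 0.03
print(f"Implied global demand 2024: about {world_2024_twh:,.0f} TWh")
print(f"Implied global demand 2030: at least {world_2030_twh:,.0f} TWh")
```

The point of the comparison is that data center demand grows far faster than total electricity demand, which is why its share roughly doubles.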
The Desperation Signal: Space Data Centers
You know supply is genuinely tight when people start talking about putting data centers in orbit. SpaceX, in the run-up to its potential $1.75 trillion IPO, has been actively pitching orbital data centers as a solution to terrestrial constraints. Not sure whether this is a real thing, but investors seem to like it. [9]
A Word of Caution
To be fair, supply isn't just sitting still. Inference efficiency is improving fast: distillation, quantization, and algorithmic improvements all reduce the FLOPs needed per token. And a lot of chip providers other than NVIDIA are ramping up. The question isn't whether supply catches up eventually. It's whether it catches up before demand outpaces it again.
The Economics: The Edgeworth Model
In Part 1, I leaned on the concept of a contestable oligopoly, a market where a few players coexist, but the constant threat of competition keeps prices close to cost. That analysis implicitly assumed something important: that any firm could, in principle, serve as much demand as it wanted. If Anthropic raised prices too much, users could switch to OpenAI or an open-source alternative, and those alternatives could absorb the demand.
But what if they can't? What if everyone is capacity-constrained?
This is exactly the scenario that Francis Edgeworth worked through in 1897, and it changes the competitive dynamics in a way that matters directly for AI pricing.
The Setup
The classical Edgeworth duopoly model looks at two firms, call them A and B, that produce the same good and compete on price. The key difference from a standard Bertrand price competition: each firm has a maximum production capacity. They can't serve the entire market alone. Each faces its own demand curve, and there's a ceiling on how many units they can produce. Polynomics has a good explanation. [10]
A Word of Caution
One obvious objection: Edgeworth assumes homogeneous goods, and AI models aren't identical. Claude, GPT, Gemini, and so on have different strengths and ecosystems. But for the bulk of agentic coding tasks, the models are close enough that users will switch on price and availability. The more commoditized the task, the more closely the Edgeworth assumptions hold.
The Process: Price Cycles, Not Equilibrium
Here's what happens. Both firms start by charging a Cournot-like price (call it p): reasonably high, splitting the market. Then firm A realizes it can steal market share by undercutting B slightly. B retaliates. Prices ratchet down as each firm tries to capture more of the market.
But, and this is the crucial part, prices don't fall all the way to marginal cost the way they would in a textbook Bertrand competition. Because both firms are capacity-constrained, there comes a point (price p') where both are producing at their maximum. At that point, neither can profitably undercut further. They're maxed out.
So what do they do? They start raising prices again. Because at p', the firms realize they're leaving money on the table. If you're already at capacity, you might as well charge more for what you can produce. Prices drift back up toward p. And then the undercutting cycle starts again.
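The undercut-then-reset process described above can be reproduced in a toy simulation. This is a minimal sketch, not a calibrated model: the linear demand curve, the capacity of 40 units per firm, zero marginal cost, the integer price grid, and the rule that firms take turns picking their most profitable price are all assumptions of mine, as are the function names.

```python
def units_sold(p_own, p_rival, capacity, intercept):
    """Units one firm sells under linear market demand D(p) = intercept - p."""
    demand_at_own = max(0.0, intercept - p_own)
    if p_own < p_rival:
        # The cheaper firm is served first, up to its capacity.
        return min(capacity, demand_at_own)
    if p_own == p_rival:
        # Equal prices: the market is split evenly.
        return min(capacity, demand_at_own / 2)
    # The pricier firm only gets what the capacity-limited rival leaves over.
    rival_sold = min(capacity, max(0.0, intercept - p_rival))
    return min(capacity, max(0.0, demand_at_own - rival_sold))


def best_response(p_rival, capacity, intercept):
    """Most profitable price on an integer grid (marginal cost assumed zero)."""
    grid = range(0, intercept + 1)
    return max(grid, key=lambda p: p * units_sold(p, p_rival, capacity, intercept))


def simulate(periods=24, start_price=30, capacity=40, intercept=100):
    """Firms alternate: each period one firm re-prices against the other."""
    prices = [start_price, start_price]
    history = []
    for t in range(periods):
        mover = t % 2
        prices[mover] = best_response(prices[1 - mover], capacity, intercept)
        history.append(prices[mover])
    return history


history = simulate()
print(history)
print("lowest:", min(history), "highest:", max(history))
```

With these made-up parameters, prices step down from 29 to 23 and then jump back to 30, over and over. The lowest price stays well above the marginal cost of zero, and the highest stays below 50, which is what a single unconstrained seller would charge on this demand curve. In this particular setup the reset happens slightly above the price at which both firms are sold out (20), because jumping back up becomes more profitable than the last undercut a little earlier.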
The Key Insight: No Equilibrium, but a Floor
The Edgeworth model's central result is that there is no stable equilibrium price. Prices oscillate, sometimes higher, sometimes lower, in what economists call Edgeworth cycles. But they oscillate within a band. They never drop as low as they would in a perfectly competitive market (marginal cost), and they never stay as high as a monopoly price.

For AI model providers, this has a directly relevant implication: supply constraints are actually protective. In a world where every major provider is capacity-constrained, no one needs to fear a competitor stealing the entire market by undercutting on price, because that competitor couldn't serve all the demand even if they tried. Each firm can calculate the other's supply and price accordingly.
This is quite different from the contestable oligopoly story in Part 1. In a contestable market, the threat of competition is enough to keep prices near cost. In a capacity-constrained market, that threat is blunted, so prices settle into a higher band: they won't become monopoly-high, but they will sit meaningfully above marginal cost.
The Decision Tree: Three Scenarios for Your $20 Plan
So where does this leave us? Rather than a single prediction, I think the honest answer is a branching set of scenarios that depend on how the supply picture evolves.

Question A: Are there supply constraints on compute and energy for AI inference?
The evidence says yes for now. GPU prices up 30%, HBM sold out for 2026, $156 billion in blocked U.S. data center investment, Goldman forecasting 12–18 months of chip shortage. But supply constraints don't last forever. So the answer determines which follow-up matters.
If no → Question B: Is the inference market contestable?
Without supply constraints, we're back in Part 1 territory. The question is whether open source, low switching costs, and multiple providers keep the market competitive.
Yes (most likely) → Bertrand Oligopoly. Competition on price, margins stay reasonable, your $20 plan holds with tighter limits and more model tiering. This was Part 1's main scenario.
No → Monopoly. Someone builds a model so dominant that users won't switch. Possible in theory, but the track record so far is convergence, not separation.
If yes → Question C: Can the hardware supply be locked up?
With supply constraints, Edgeworth dynamics kick in. But the outcome depends on whether the scarce inputs (GPUs, HBM, fab capacity) can be cornered by a single player.
No → Edgeworth-Bertrand Oligopoly. Everyone is constrained, but no one is locked out. Prices cycle rather than settling. Your $20 plan survives but becomes unreliable: features appear and disappear with the supply cycle.
Yes → Monopoly. Not through a better model, but through supply chain dominance. You can build all the data centers you want. But you can't fill them with chips that don't exist.
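The tree above is small enough to write down as a function, which makes it easy to check that every branch lands somewhere. This is only a restatement of the three questions; the function name, parameter names, and default values are illustrative choices of mine, with the defaults set to the answers the text calls most likely.

```python
def pricing_scenario(supply_constrained: bool,
                     market_contestable: bool = True,
                     hardware_can_be_cornered: bool = False) -> str:
    """Map the answers to Questions A, B, and C onto a market outcome."""
    if not supply_constrained:
        # Question B: only relevant when capacity is not the bottleneck.
        if market_contestable:
            return "Bertrand oligopoly"
        return "Monopoly (model dominance)"
    # Question C: only relevant when capacity is the bottleneck.
    if hardware_can_be_cornered:
        return "Monopoly (supply chain dominance)"
    return "Edgeworth-Bertrand oligopoly"


# Today's picture according to the evidence above: constrained, not cornered.
print(pricing_scenario(supply_constrained=True))
# Part 1's main scenario: supply has caught up and the market is contestable.
print(pricing_scenario(supply_constrained=False))
```

Note that Question B is ignored on the constrained branch and Question C on the unconstrained one, which mirrors how the tree is drawn.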
So, Can You Still Keep the $20 Plan?
I'm less confident than I was in Part 1, but the answer isn't a simple no.
The short-term picture is clear: usage-based pricing, tighter limits on cheap tiers, model tiering as the default. The era of "pay $20, use a frontier model with no limits" is ending, not because the economics don't work, but because demand is growing faster than anyone can build capacity.
The medium term depends on which branch of the decision tree we're on. If supply catches up in two years, Part 1's contestable oligopoly keeps prices in check again. If supply stays tight, Edgeworth dynamics keep prices elevated but cycling, which means unpredictability more than outright price hikes.
The variable that determines which economic model applies isn't how aggressive OpenAI or Anthropic want to be on pricing. It's whether the physical infrastructure can keep up with a strong increase in demand.
In Part 1, I focused on who gets to set the price. The Part 2 update: it might not matter who wants to set the price, if there isn't enough supply to serve the market at any price.
I'd be happy for any input, perspectives, or challenges. The Edgeworth framing is a simplification, and reality will be messier. But I think it captures something the pure contestability argument missed.
AI Disclosure: The research, structuring, and note-taking were done by me. Claude wrote a draft that I improved iteratively. No links or sources were added by Claude. Image made by Gemini.
