
Bruce Hart


The AI Race Is Starting to Look Like Logistics


The AI race is starting to look less like benchmark combat and more like logistics.

I still think the latest Opus has a slight edge over GPT-5.4 on raw feel.

Sometimes it feels a little sharper. A little more tasteful. A little more willing to stay with the hard parts of a problem.

But if you made me pick one ecosystem to live in every day right now, I would still pick OpenAI.

Not because the model always wins the prompt-level beauty contest. Because the real contest is shifting underneath us.

It is becoming less about who can win an isolated battle on intelligence, and more about who can keep the whole machine running at scale.

Anthropic's recent limit changes are a tell

Anthropic's March 2026 usage promotion was framed as a temporary perk: double your five-hour usage outside peak weekday hours from March 13 through March 28.

That now looks like the soft version of the same idea.

As Theo Browne pointed out in his video on the change, Anthropic has now gone further: during weekday peak hours, users move through their five-hour session limits faster than before. In other words, this is not just an off-peak bonus anymore. It is an explicit peak-time cutback.

That is a useful clarification because it makes the underlying issue much more obvious. When a lab starts nudging usage away from certain hours, that is traffic engineering. When it starts making your limit burn faster during those hours, that is capacity rationing.
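The difference between the two is easy to see in code. Here is a toy sketch of peak-time capacity rationing: a session budget that drains faster during weekday peak hours. The peak window, the multiplier, and the numbers are all invented for illustration; they are not Anthropic's actual values.

```python
from datetime import datetime

# Hypothetical peak window and burn multiplier, for illustration only.
PEAK_HOURS = range(9, 18)   # weekday 09:00-17:59, an assumed window
PEAK_MULTIPLIER = 2.0       # assumed faster burn rate during peak

def debit(balance: float, cost: float, when: datetime) -> float:
    """Return the remaining session budget after one request.

    The same request costs more against your limit when it lands
    in the weekday peak window: that is rationing, not a bonus.
    """
    is_peak = when.weekday() < 5 and when.hour in PEAK_HOURS
    rate = PEAK_MULTIPLIER if is_peak else 1.0
    return balance - cost * rate

budget = 100.0
budget = debit(budget, 10.0, datetime(2026, 3, 16, 11, 0))  # Monday, peak
budget = debit(budget, 10.0, datetime(2026, 3, 21, 23, 0))  # Saturday, off-peak
print(budget)  # 70.0 -- the peak request cost double against the limit
```

An off-peak bonus would multiply the balance up during quiet hours; burning the limit faster at peak is the same lever pulled in the opposite direction.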

Layer that on top of Anthropic's public status page, which has shown a steady flow of recent incidents, and the picture looks pretty clear to me: demand is running hard into capacity constraints.

This is not me dunking on Anthropic. If anything, it is a compliment. People want the product. A lot. But demand is only half the story. The other half is whether you can serve it reliably when everyone shows up at once.

Theo's framing is basically right: this is a GPU allocation problem

The most convincing part of Theo Browne's analysis is that this does not look like simple greed. It looks like a company that subsidized usage very aggressively, then ran into the harder reality that peak-hour GPUs have competing claimants.

Research wants compute. Product teams want compute. Enterprise customers paying real money for daytime reliability want compute. Power users on heavily subsidized subscriptions want compute too.

When all of those groups want the same GPUs in the same weekday window, somebody gets squeezed.
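A minimal sketch of that squeeze, assuming a simple strict-priority policy (the groups, priorities, and capacity numbers are invented; real schedulers are far more sophisticated):

```python
# Toy model of peak-hour GPU allocation among competing internal
# claimants. Grants go out in priority order, so whatever shortfall
# exists lands entirely on the lowest-priority tier.

def allocate(capacity: int, requests: list[tuple[str, int, int]]) -> dict[str, int]:
    """Grant GPUs in priority order (lower number = served first)."""
    grants = {}
    remaining = capacity
    for name, priority, wanted in sorted(requests, key=lambda r: r[1]):
        granted = min(wanted, remaining)
        grants[name] = granted
        remaining -= granted
    return grants

peak_requests = [
    ("research",       0, 400),
    ("enterprise-api", 1, 300),
    ("product",        2, 200),
    ("subscriptions",  3, 300),  # subsidized power users, lowest priority
]
print(allocate(1000, peak_requests))
# {'research': 400, 'enterprise-api': 300, 'product': 200, 'subscriptions': 100}
```

With 1,200 GPUs of demand against 1,000 of capacity, the subsidized subscription tier eats the entire 200-GPU shortfall. Faster peak-hour limit burn is one way to make that squeeze explicit in the product.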

Anthropic's own framing, as Theo summarized it, is that weekly limits are staying the same while the distribution changes across the week. That is a very specific kind of concession. It suggests the problem is not total long-run usage so much as peak-hour congestion.

That is why this matters beyond one company. The industry is getting pushed out of the clean world of headline model quality and into the messier world of queueing, priority, and resource allocation.

A model that is slightly better but less available is not actually winning

This is the part people still underrate.

I do think Opus is slightly better than GPT-5.4 in the narrow, artisanal sense. If I am judging pure output quality on a great day, I often give Opus the nod.

But users do not experience models as abstract benchmark entries. They experience them as systems.

Latency matters. Rate limits matter. Whether the service is up during the middle of the workday matters. Whether the product feels generous or claustrophobic matters.

Wars are not won by heroic battles alone. They are won by logistics.

AI is starting to feel the same way. A model that wins a few quality skirmishes but loses the weekday traffic jam is not obviously ahead. Availability is part of quality. So is cost. So is how much useful work you can get done before the system tells you to come back later.

Efficiency is becoming the real moat

From the outside, OpenAI looks better at the ugly but decisive stuff.

They seem more proactive about getting compute. They seem more willing to make hard product calls around where that compute goes. And they seem very focused on extracting more useful work out of the same underlying models.

A lot of that progress does not show up as a dramatic new capability name. It shows up as systems work.

Better memory handling. Better context management. Better infrastructure decisions. Better product packaging around the model you already have.

That matters more than people think.

The magic feeling of almost infinite context is usually not literally infinite anything. It is what happens when a company gets good at compaction, retrieval, routing, and all the surrounding engineering that makes a model feel bigger than its raw spec sheet.

Not model magic alone, but systems magic around the model.
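One concrete form of that systems magic is compaction. The sketch below is a deliberately crude illustration, not any lab's actual pipeline: when a transcript outgrows the window, older turns collapse into a summary so the model feels like it holds more context than its raw spec allows. The budget, the message format, and the stand-in summarizer are all assumptions.

```python
# Toy sketch of context compaction: fit a transcript into a fixed
# character budget by summarizing everything except the recent turns.

def compact(messages: list[str], budget: int, keep_recent: int = 3) -> list[str]:
    """Return a transcript that fits in `budget` characters."""
    if sum(len(m) for m in messages) <= budget:
        return messages  # already fits, nothing to do
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Stand-in summarizer: a real system would call a model here,
    # and likely mix in retrieval over the summarized history.
    summary = "[summary of %d earlier turns]" % len(old)
    return [summary] + recent

history = ["turn %d: %s" % (i, "x" * 40) for i in range(10)]
compacted = compact(history, budget=250)
print(len(compacted))  # 4: one summary line plus the 3 most recent turns
```

The user sees a conversation that never seems to run out of room; what actually happened is engineering around a hard limit.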

Focus is a weapon too

OpenAI has already said Sora 1 will no longer be available in the US starting March 13, 2026, and that the broader Sora web and app experience will be discontinued on April 26, 2026, with the API following on September 24, 2026.

OpenAI's public explanation is about reducing complexity and improving the unified Sora 2 experience, not about some secret project absorbing GPUs. That distinction matters, and I do not want to pretend otherwise.

Still, the broader point stands.

A lab that is willing to shut down older surfaces, consolidate infrastructure, and concentrate attention has an advantage. Focus is a resource allocation strategy.

I do not know how real the rumored Spud timeline is. Maybe it lands soon. Maybe it slips. Maybe the name changes. Maybe half the chatter is wrong.

But if there is a major new OpenAI model coming, then cutting peripheral complexity and freeing up talent and compute is exactly the kind of disciplined move I would want to see.

The funny part is that GPT-5.4 is already enough

The part that keeps making me smile is that I am excited for what comes next, while also feeling weirdly content with what we already have.

If progress paused and I just had GPT-5.4 for a long time, I would be fine. More than fine, honestly. The current generation is already good enough to change how I code, write, and think.

That is why the next jump feels so interesting.

Not because today's models are disappointing. The opposite. They are already absurdly useful. So if the rumored next step really does feel like an o1-to-GPT-5.4 kind of jump, then spring and summer are going to be a lot of fun.

But even if that part takes longer than people hope, I think the lesson is already here.

The AI race will not be won only by the lab with the prettiest demo or the single smartest output on a lucky prompt. It will be won by the lab that can secure the chips, run the systems efficiently, keep the service up, and make ruthless enough product decisions to stay focused.

That is starting to look a lot like logistics.

If you think I am underrating raw model quality here, I would genuinely love to hear the argument. My guess is that 2026 is going to teach us just how much of frontier AI competition is really operations and GPU utilization.