Rethinking Frontier Models: The Rise of Local AI and Apple's Unified RAM Advantage

I was excited to play with Opus 4.7 this week and more than a little bit relieved that it did not in any way wow. It’s a perfectly fine model, if a bit too wordy, but it is clearly only a small step forward over its predecessor. The software engineers of the world can breathe a sigh of relief that we aren’t replaceable for another few months, at least.

I think this reinforces what I’ve been reading everywhere: compute for AI is getting more and more scarce, which means that training new frontier models is becoming more difficult. The scaling laws of AI suggest that each new generation of frontier models takes exponentially more compute, and that compute either doesn’t exist yet or has to compete with inference. The shuttering of Sora fits this theory: if it wasn’t showing ROI and its compute use was high, then it was worth less than the other products customers were demanding, and it wasn’t worth the resources it took to run.

So where does that lead? More data centers, obviously. But building data centers is an extremely slow process because they require so much electricity and grid buildouts take years. So where will model training go? Gemma 4 gives us a hint, I think. It’s a powerful model that I can run on my own hardware (a 3090 Ti with 24 GB of VRAM), released by Google, which has been falling behind in the frontier model quality race. I think that we’re going to see local models improve dramatically this year and they’re going to be sized for consumer hardware. Qwen3.6-35B-A3B was released two days ago and fits this thesis as well. Local just isn’t quite there yet; prefill on my hardware is too slow for my impatience, but I expect more and more model drops sized for local as the year goes on.
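For a rough sense of what "sized for consumer hardware" means, here is a back-of-the-envelope sketch. It assumes weight memory is simply parameter count times bits per weight, and it lumps the KV cache, activations, and runtime overhead into a single guessed headroom figure. The function names and the 4 GB headroom default are my own illustrative choices, not anything from a particular runtime.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, headroom_gb: float = 4.0) -> bool:
    """True if weights plus a guessed headroom fit in memory_gb.

    headroom_gb is an assumption standing in for KV cache, activations,
    and runtime overhead; real usage depends on context length and runtime.
    """
    needed = weight_footprint_gb(params_billion, bits_per_weight) + headroom_gb
    return needed <= memory_gb


if __name__ == "__main__":
    # A 35B-parameter model at a few common precisions, against a 24 GB card.
    for bits in (16, 8, 4):
        size = weight_footprint_gb(35, bits)
        print(f"35B @ {bits}-bit: ~{size:.1f} GB weights, "
              f"fits in 24 GB: {fits(35, bits, 24)}")
```

By this estimate a 35B model needs roughly 17.5 GB for weights at 4-bit quantization, which squeezes onto a 24 GB card, while the same model at 16-bit needs about 70 GB and only fits on something like a 128 GB machine.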

If this is true, you know who wins? Apple. They’ve been building a unified RAM architecture since the M1 release in 2020 and refining it since. Every year gets a bit better. The 128 GB models are extremely appealing here. No one else is doing this and it will take them several years to catch up. There’s no hint that Intel is trying. I hope that AMD enters that arena in force but their Strix Halo is a big disappointment. My conspiracy theory is that Nvidia doesn’t want to release consumer hardware with enough VRAM to be useful for local models because it could eat into their data center profits, but Apple doesn’t care about that.

The problem, of course, is chip supply. Memory chip supply is projected to be scarce for a year or more yet. I’m hoping the squeeze resolves in 2027.

One more thing: you know who’s been killing it in local model training? China. Chew on that a little.