The end of isotropy and the rise of metadynamic AI
I’ve spent the better part of this weekend putting OpenAI’s latest offerings through their paces – both the newly released open-weight models and GPT-5 itself. Armed with a selection of coding challenges, mathematical problems, and the sort of esoteric research queries that usually separate the wheat from the chaff, I’ve been conducting what amounts to a weekend-long torture test of these systems.
The results are fascinating, frustrating, and thoroughly illuminating in ways that the marketing materials certainly didn’t prepare me for.
GPT-5 solved a complex epidemiological modelling problem I threw at it with remarkable sophistication, generating code that was not only functional but elegantly structured.1 Twenty minutes later, it stumbled over a basic combinatorics question that my undergraduate students would handle without breaking stride. As AI scientists, we’re used to these inconsistencies (starting with Moravec’s paradox), but this was beyond the usual: brilliant flashes of insight punctuated by inexplicable failures on seemingly simple tasks.
1 Even by my standards. I am not a stellar programmer, but I like my code clean.
This isn’t your typical post-launch grumbling from disappointed users (though there’s been plenty of that too). We’re witnessing something far more significant: the end of what I call “isotropic growth” in artificial intelligence, and the beginning of something infinitely more complex.
The new shape of progress
Most of AI development until now has been what I’d call isotropic: each new model generation generally improved across virtually every metric simultaneously – better reasoning, better coding, better writing, better safety, all at once. GPT-4 was superior to GPT-3.5 in nearly every conceivable way. Claude Sonnet and Opus improved upon their predecessors across the board. Progress was predictable, linear and comfortable.
If the public outcry and the demands to be reunited with 4o reflect anything, it’s that GPT-5 represents the end of this era. For the first time, we have a major model that exhibits “anisotropic progress” – dramatic improvements in some areas coupled with stagnation or even regression in others. Some data points:
- On coding benchmarks like SWE-bench, it scores 74.9%, barely edging out Anthropic’s Claude Opus 4.1 at 74.5%.
- On τ-bench’s Airline task involving website navigation, it actually underperformed OpenAI’s own o4-mini – more concerningly, it used tons more tokens than the competition.2
- Users reported struggles with basic tasks like counting letters, with GPT-5 initially saying “blueberry” contained three instances of the letter “b”. AI and fruit just don’t seem to mix.
2 Interesting rabbit hole: according to OpenAI’s own prompting guide, using the Responses API vs. Chat Completions might actually improve performance.
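If you want to chase that rabbit hole yourself, the two API surfaces are easy to compare side by side. A minimal sketch, assuming the openai Python SDK and taking the model name from the launch materials at face value – illustrative, not canonical:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Chat Completions: the older surface, built around role-tagged messages.
chat = client.chat.completions.create(
    model="gpt-5",  # model name as given in the launch materials
    messages=[{"role": "user", "content": "How many b's are in 'blueberry'?"}],
)
print(chat.choices[0].message.content)

# Responses: the newer surface the prompting guide points to, taking a
# single input and handling conversation state and tools server-side.
resp = client.responses.create(
    model="gpt-5",
    input="How many b's are in 'blueberry'?",
)
print(resp.output_text)
```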
Within hours of GPT-5’s release, Reddit was flooded with criticism. A thread titled “GPT-5 is horrible” garnered nearly 6,000 upvotes and over 2,100 comments. Users complained of “short replies that are insufficient, more obnoxious AI-stylised talking, less ‘personality’ and way less prompts allowed.” One particularly cutting comment captured the mood: “Combine that with more restrictive usage, and it feels like a downgrade branded as the new hotness.” Far be it from me to take Reddit too seriously, but this is definitely a signal.
And yet others have rightly lauded it as a very good model. It does great at coding for the price. It’s fast. This divergence is evidence that we’ve reached an inflection point where different users, with different needs and expectations, are experiencing genuinely different value propositions from the same system. This is precisely what you’d expect when uniform progress ends and we enter an era of specialised, uneven advancement.
What we’re democratising
What makes this moment particularly delicious is the exquisite irony embedded within OpenAI’s recent proclamations about democratisation. Just days before GPT-5’s release, OpenAI published their first open-weight models since GPT-2, accompanied by grand rhetoric about “putting AI in the hands of as many people as possible” and building “democratic AI rails.” The fact that this was also an enormous power play – on the micro (corporate) scale through hardcoding Harmony into gpt-oss, and on the geopolitical macro scale through tying it to America’s mission to take the lead in the global AI race just when Project Stargate desperately needed some good news – doesn’t detract from the magnitude of the achievement and the potential it unleashes.
The open-weight models were the appetiser. GPT-5 is the main course, and it’s a dish that requires everyone to become a chef. With GPT-5’s mixed performance, its multiple variants (gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-pro), each optimised for different use cases, and a “real-time router” that decides which model to deploy, OpenAI has forced every user into the same orchestration challenges that once plagued only the most sophisticated AI teams. We’re all architects now.
When users complained about inconsistent performance and demanded access to previous models, they were essentially asking for the right to choose their own orchestration strategy. They wanted to go back to the days when they could simply pick the best available model rather than trust an opaque routing system to make that choice for them. What OpenAI has democratised is the requirement to become a systems architect. Every developer, every startup, every end user now faces the fundamental challenge that once confronted only the kind of sophisticated AI teams my colleagues and I run, mainly serving large enterprises. If you want optimal performance, you can no longer rely on a single monolithic ‘best’/SOTA model. You must learn to coordinate multiple AI agents, route queries appropriately, and choreograph systems of models. Suddenly, OpenAI brought agentic AI into America’s living rooms.
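To see what that means in practice, here is the shape of the problem every user now inherits – a deliberately naive router over the GPT-5 tiers. The tier names come from the launch materials; the thresholds and keywords are invented purely for illustration:

```python
from openai import OpenAI

client = OpenAI()

# Tier names are real model identifiers from the launch; everything
# else here is an invented heuristic, not a recommendation.
TIERS = {"cheap": "gpt-5-nano", "fast": "gpt-5-mini", "strong": "gpt-5"}

def route(prompt: str) -> str:
    """Crude heuristic router. A production router would use a trained
    classifier, cost budgets and latency targets instead of keywords."""
    if any(kw in prompt.lower() for kw in ("prove", "debug", "refactor")):
        return TIERS["strong"]   # reasoning-heavy work
    if len(prompt) < 200:
        return TIERS["cheap"]    # short, simple query
    return TIERS["fast"]         # sensible default

def ask(prompt: str) -> str:
    resp = client.responses.create(model=route(prompt), input=prompt)
    return resp.output_text

print(ask("How many b's are in 'blueberry'?"))  # routes to the cheap tier
```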
This is what democratisation looks like in the post-isotropic era: not just giving everyone access to the same powerful tool, but also to the same complex problems. Welcome to our headaches.
Accelerate, accelerate, accelerate
But perhaps there’s something far more calculated happening here than mere technological evolution. Consider this possibility: isotropic growth is not sustainable. The line can’t always go up. What if GPT-5 – its strengths and weaknesses – is a finely tuned response to this?
Look at where GPT-5 truly excels. OpenAI explicitly positions it as “our best model yet for coding and agentic tasks,” with companies like Cursor praising its “half the tool calling error rate over other frontier models” and its ability to “reliably chain together dozens of tool calls—both in sequence and in parallel—without losing its way.” The model’s real strength isn’t in being uniformly better at everything – it’s in being an exceptional agent driver.
I don’t think this is accidental.3 GPT-5’s mixed performance in individual domains is an epiphenomenon. Its strength as a cheap, fast and efficient agent driver – a kind of universal agentic backbone – is, I think, the real target.
3 By way of disclaimer: I am privy to an awful lot of inside baseball on the AI industry, but I never concern myself with this kind of corporate strategy. I do science, not politics – to the point that I’m somewhat notorious for excusing myself as soon as this kind of talk starts. This is entirely conjectural.
And if that’s the case, it’s evidence of strategic brilliance. Isotropic growth was always doomed – and sometimes the way to survive an impending sea-change is to bring it about yourself. That way, you at least have some control over the situation. GPT-5’s excellence at agent coordination creates a perfect market dynamic: it compels users toward agentic architectures while positioning OpenAI as the essential purveyor of those services. OpenAI might just have pulled off the biggest pivot in AI history: from a provider of solutions competing with other models to the indispensable coordination layer that none can compete with. They’ve understood that isotropic growth is becoming obsolete, and adopted a strategy of going full-on accelerationist while also selling the solution for the post-isotropic world.
The open-weight models echo this too, “designed to be used within agentic workflows with exceptional instruction following, tool use like web search or Python code execution, and reasoning capabilities”. Together, they paint a new image of OpenAI’s focus as makers of agent-drivers and orchestrators. It’s not the prospectors who got rich during the gold rush. It’s the guy who sold them the picks. He, after all, didn’t have to get lucky.
Speciating AI
This transition from monolithic models to model ecosystems mirrors evolutionary biology in fascinating ways. We’re witnessing the emergence of AI “species” – specialised variants optimised for specific niches rather than generalist organisms trying to do everything adequately. We’ve always had some semblance of these in LoRAs, finetunes and adapters, but those are like someone taking a postgraduate degree – these are like being bred for a job.
My weekend experiments made this abundantly clear. The gpt-oss-20b model, despite being significantly smaller than GPT-5, actually outperformed it on certain mathematical reasoning tasks. Meanwhile, GPT-5 excelled at generating front-end code with minimal prompting but struggled with the sort of systematic debugging that its predecessors handled gracefully. We’re not dealing with a linear progression anymore – we’re dealing with adaptive radiation à la Galápagos finches.
The implications extend far beyond individual model performance. We’re transitioning from an era where progress meant building bigger, more capable individual models to one where progress means building better systems for coordinating multiple specialised models. And who will do that is still up in the air. Just when everybody thought GPT-5 would “kill all the ChatGPT wrappers”, it’s given their purveyors a new task – selling not a better prompt with a frontend but a compositional architecture of multiple agents. There’s plenty of room to excel (or fail!) there.
We’re all architects now
This shift has profound implications that extend far beyond the AI research community. For the past few years, those of us working as AI systems architects and computational scientists have been grappling with precisely these orchestration challenges for our clients. When should we route a query to a reasoning model versus a fast-response model? How do we coordinate multiple AI agents to tackle complex, multi-step problems? When does it make sense to ensemble different models for higher reliability? These were specialist concerns, the domain of AI consultants and research teams building bespoke solutions for enterprises with deep pockets and sophisticated technical requirements. I have spent most of the last decade running teams like that, for a demanding clientele who needed that complexity tackled to wring out the last bit of performance. Most users, though, could simply ask for “the best model” and get a straightforward answer: use GPT-4, or Claude, or whatever sat atop the latest benchmarks that week.
That era is over.
What GPT-5’s mixed performance signals is that there will no longer be a single “strongest” or “best” model that users can rely on. And if there’s no single best model, then we will have to create superior configurations of models, where the strength of one compensates for the weaknesses of another. This complexity won’t be limited to high-end technical users – it will be a problem that everybody, down to the absolute end user, will have to deal with.
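What might such a configuration look like? A minimal sketch, pairing a fast drafter with a stronger verifier so that each covers for the other’s weaknesses – the model names and the division of labour are assumptions for illustration, not a recipe:

```python
from openai import OpenAI

client = OpenAI()

def draft_and_verify(task: str) -> str:
    """One model's speed compensates for its sloppiness; the other's
    rigour compensates for its cost. Model roles are assumptions."""
    draft = client.responses.create(
        model="gpt-5-mini",  # assumed role: fast, cheap drafter
        input=task,
    ).output_text

    checked = client.responses.create(
        model="gpt-5",       # assumed role: slower, stronger verifier
        input=(
            f"Task: {task}\n\nProposed answer: {draft}\n\n"
            "Check the proposed answer. If it is correct, repeat it "
            "verbatim; if not, give the corrected answer."
        ),
    )
    return checked.output_text
```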
OpenAI hasn’t just democratised the power of AI – they’ve democratised all the headaches that come with having to orchestrate it.
The metadynamic future
Whether intentional or not, the implications remain the same. The age of AI soloists is ending, and the future belongs not to those who can prompt the best individual model, but to those who can choreograph the most elegant dance between many. We’re witnessing the birth of what I call “metadynamic AI” – intelligence that emerges not from any single model but from the sophisticated, adaptive orchestration of multiple AI agents working in concert in a larger ecosystem. This goes beyond standard agentic AI wisdom: it is about creating the wider ambit of the system, complete with tools, tasks and contexts. Most agents are still essentially souped-up chatbots with tool access, fundamentally monolithic: one model, one reasoning process, one set of capabilities enhanced by external tools.
Metadynamic AI is qualitatively, perspectivally different. Instead of one agent using tools, we have multiple AI agents with different specialisations collaborating, competing and complementing each other – often on the basis of vastly divergent understandings of the universe and its probability distributions, rooted in their underlying models, contexts and environs. Instead of a single reasoning chain, we have multiple reasoning processes that can be dynamically routed, combined and orchestrated based on the task at hand. And with the ability to differentiate and speciate to task, we have emergent intelligence that arises from the interaction between multiple AI systems.
Simple agentic AI asks ‘how can I help this model use tools better?’ Metadynamic AI asks ‘how can I coordinate multiple models to achieve what none of them could accomplish alone?’ Agents are tool-users. Metadynamic AI is not about designing better tool-users but ecosystems that involve tools, but also marketplaces, villages, hills, gradients, slopes, seas and moats. It’s world-building writ large for agents to populate.
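In code, the smallest possible caricature of this looks something like the sketch below – a toy “village” of specialists and a coordinator that reconciles their divergent views. The roles, prompts and model assignments are all hypothetical; in a real system each niche might be a different vendor, a finetune or a local open-weight model entirely:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical roles and model assignments, purely for illustration.
SPECIALISTS = {
    "mathematician": ("gpt-5", "Reason step by step and show your working."),
    "engineer": ("gpt-5", "Answer with runnable code and a short rationale."),
    "sceptic": ("gpt-5-mini", "Flag every claim you cannot verify."),
}

def consult(task: str) -> str:
    """Gather divergent specialist views, then reconcile them."""
    views = []
    for role, (model, style) in SPECIALISTS.items():
        out = client.responses.create(
            model=model,
            instructions=f"You are the {role}. {style}",
            input=task,
        ).output_text
        views.append(f"[{role}]\n{out}")

    return client.responses.create(
        model="gpt-5",
        input="Reconcile these specialist views into a single answer:\n\n"
              + "\n\n".join(views),
    ).output_text
```

Even at this toy scale, the interesting behaviour lives between the models rather than inside any one of them – which is rather the point.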
The world of metadynamic AI is simultaneously more powerful and more fragile, more capable and more complex. It demands new skills, new ways of thinking and new approaches to problem-solving. But it also opens up possibilities that were simply impossible in either the monolithic or the simple agentic era. In case you haven’t had enough paradoxes for the day: metadynamic AI is both more accessible and vastly more complex. Yes, you can now download and run powerful models on your laptop. But to extract maximum value from these systems, you must now master the dark arts of agent orchestration, model routing and multi-system coordination – skills that until recently were limited to specialist teams like mine, and a clientele that could afford them. Now, it’s going to be for everyone. We’ve traded the inefficient simplicity of a monolith for the exaptive, complex worlds we’ll have to build for our agents to live in.
We’re entering an era where there won’t be a single “best” AI model, just as there isn’t a single “best” species in nature. Instead, we’ll have AI ecosystems where different models excel in different domains, and the art will be in knowing how to orchestrate them effectively. This represents a fundamental shift in how we think about AI capabilities – and about ourselves. The question is no longer “which model is best?” but “how do I coordinate these models to achieve what no single model can accomplish alone?” The winners won’t be those who build the most powerful individual models, but those who master the art of coordination. The need this world has for competent conductors of agentic orchestras will, I think, far eclipse the jobs “lost to AI”.4
4 I am generally sceptical. I know enough about AI to know that if whatever progress the last few years have wrought can eclipse what your employer thinks you’re bringing to the table, at least one of you is woefully incompetent.
For those still thinking in terms of monolithic intelligence, this transition will feel disorienting, even disappointing. GPT-5’s mixed reception reflects this confusion: users of ‘the old dispensation’ expecting uniform improvements across all dimensions were discomforted by a system that’s an awful lot like my Golden Retriever, Oliver – brilliant in some ways and oh so frustrating in others. But for those who adapt to the new paradigm, the possibilities are far richer than anything we could achieve with even the most powerful individual model. What F. E. Smith said about the world continuing to offer glittering prizes to those with sharp swords and stout hearts will continue to hold true in the metadynamic arena for those willing to embrace what’s coming.
Those who master it will reap those prizes, and then some. Those who don’t, or not soon enough, will be hopelessly outmanoeuvred before they even have a chance to understand just how much the rules of the game have changed these last few days.
Citation
@online{csefalvay2025,
author = {{Chris von Csefalvay}},
title = {The End of Isotropy and the Rise of Metadynamic {AI}},
date = {2025-08-10},
url = {https://chrisvoncsefalvay.com/posts/metadynamic-ai/},
langid = {en-GB}
}