
TL;DR

A new Google whitepaper argues that in AI-driven software development the model accounts for only about 10% of system behavior. The rest comes from the harness, configuration, and context engineering, and that is where teams should focus.

A new whitepaper by Google researchers asserts that the AI model constitutes only about 10% of system behavior in AI-powered software development. The key to success lies in how teams configure, verify, and manage AI agents, shifting the focus from model improvements to system design and context engineering.

The paper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, argues that the most significant factor in AI system performance is the harness: the prompts, tools, rules, and observability layers surrounding the model. Experiments cited in the paper show that changing only the harness, while keeping the same model, can dramatically improve outcomes, such as moving a coding agent from outside the top 30 to the top 5 on Terminal Bench 2.0.

Furthermore, the authors argue that the common practice of blaming the model for failures is misguided. Instead, most issues stem from configuration errors, missing tools, vague rules, or noise in context windows. The paper advocates for a shift toward systematic context engineering, which involves carefully structuring instructions, knowledge, memory, examples, and guardrails to optimize AI behavior.
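
As a rough illustration of what structuring context could look like in practice, the Python sketch below packs the five ingredients named above (instructions, knowledge, memory, examples, guardrails) into a fixed size budget. The section priorities, the character budget, and all names such as ContextSection and build_context are assumptions made for this example; the whitepaper does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class ContextSection:
    name: str      # e.g. "instructions", "guardrails"
    text: str
    priority: int  # lower number = kept first when space is tight (assumed policy)


def render(section: ContextSection) -> str:
    return f"## {section.name}\n{section.text}\n"


def build_context(sections: list[ContextSection], budget_chars: int) -> str:
    """Admit sections by priority until the budget is spent, then emit in original order."""
    kept: set[str] = set()
    used = 0
    for section in sorted(sections, key=lambda s: s.priority):
        size = len(render(section))
        if used + size <= budget_chars:
            kept.add(section.name)
            used += size
    return "".join(render(s) for s in sections if s.name in kept)


# Hypothetical content for a coding agent.
sections = [
    ContextSection("instructions", "Fix the failing test; do not change public APIs.", 0),
    ContextSection("guardrails", "Never edit files outside src/.", 1),
    ContextSection("knowledge", "The project uses pytest and Python 3.12.", 2),
    ContextSection("examples", "Example patch: ...", 3),
    ContextSection("memory", "Earlier attempt failed on import order. " * 20, 4),
]
prompt = build_context(sections, budget_chars=300)
print(prompt)
```

With this budget the long, low-priority memory section is dropped, so the context window carries the instructions and guardrails rather than noise. A real system would more likely count tokens than characters and summarize memory instead of discarding it.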

At a glance
When: published March 2026
The development: A new Google whitepaper reveals that the core of effective AI systems lies in configuration and verification, not the AI model itself.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified
Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.
The idea worth building your strategy around
Agent = Model + Harness
Model: ~10%
Harness (prompts · tools · context · hooks · sandboxes · observability): ~90%, and it is your surface area, not the provider’s
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness, with the same model.
“Most agent failures, examined honestly, are configuration failures”: a missing tool, a vague rule, a noisy context.
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding (low CapEx · high OpEx): looks free, hides debt in token burn (fix-it loops), maintenance tax (AI spaghetti), and security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering (high CapEx · low OpEx): pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing, with cheap models for the easy work.
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.
thorstenmeyerai.com

Implications for AI Development and Management

This perspective fundamentally changes how organizations should invest in AI. Instead of chasing the latest model, companies should focus on building robust harnesses and context management systems. This approach offers a more cost-effective and reliable path to deploying AI at scale, reducing token waste, maintenance costs, and security vulnerabilities. It also shifts the strategic advantage toward those who excel at system configuration, rather than solely model access.
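
One of the cost levers the paper names is intelligent model routing: sending easy work to cheap models. The Python sketch below shows one way such a router might look. The tier names, the prices, and the "easy task" rule are all invented for illustration and are not figures from the paper or any vendor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical label, not a real product
    cost_per_1k_tokens: float  # illustrative price, not a quoted rate


CHEAP = ModelTier("small-model", 0.10)
STRONG = ModelTier("large-model", 2.00)


def route(task: dict) -> ModelTier:
    """Send small, test-covered changes to the cheap tier; escalate everything else."""
    easy = task["files_touched"] <= 1 and task["has_tests"]
    return CHEAP if easy else STRONG


def estimated_cost(tasks: list[dict], router: Callable[[dict], ModelTier]) -> float:
    """Total cost of running every task on the tier the router picks."""
    return sum(router(t).cost_per_1k_tokens * t["tokens"] / 1000 for t in tasks)


# Made-up workload used only to compare the two policies.
tasks = [
    {"name": "rename variable", "files_touched": 1, "has_tests": True, "tokens": 2000},
    {"name": "fix typo in docs", "files_touched": 1, "has_tests": True, "tokens": 1000},
    {"name": "refactor auth flow", "files_touched": 6, "has_tests": False, "tokens": 8000},
]

routed = estimated_cost(tasks, route)
always_strong = estimated_cost(tasks, lambda t: STRONG)
print(f"routed: {routed:.2f}, always strong: {always_strong:.2f}")
```

The routing rule is itself part of the harness: it is configuration the team owns and can tune, independent of which provider's models sit behind each tier.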

Background on the Shift in AI System Design

Earlier in 2026, the AI community widely celebrated advances in large language models. The new Google whitepaper challenges that narrative by arguing that model improvements account for only a fraction of overall system performance. It builds on emerging practices such as agentic engineering, in which AI systems are designed with layered verification, testing, and context management to ensure correctness and safety. This marks a shift away from vibe coding, which relies on casual prompts and rapid iteration, toward disciplined, structured development.

“The behavior you experience is dominated by scaffolding you can build, own, and improve — not the frontier model itself.”

— Addy Osmani

Remaining Questions About Practical Implementation

While the paper presents compelling evidence that harness and configuration are key, it remains unclear how quickly organizations can adopt these practices at scale. The specifics of best practices for context engineering, the cost of transitioning existing systems, and the impact on security protocols are still being explored. Additionally, the long-term effects of this shift on model development cycles are not yet fully understood.

Next Steps for AI System Optimization

Organizations will likely begin prioritizing the development of robust harnesses, configuration management, and testing frameworks. Future research may focus on standardizing best practices for context engineering and verifying system reliability at scale. Additionally, AI vendors might offer more tools and services aimed at improving system configuration rather than just providing larger models.

Key Questions

Why is the model only 10% of the system’s behavior?

The whitepaper argues that the surrounding layers—prompts, rules, tools, and observability—are responsible for most of how an AI system behaves. The model provides the core capabilities, but the harness and context management determine how those capabilities are applied.
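
To make "Agent = Model + Harness" concrete, here is a toy Python sketch in which the same stand-in model gives a different answer depending only on the harness around it. The stub_model function, the text protocol (TOOL, CALL, RESULT, ANSWER), and the Harness class are all invented for this example and do not represent any real agent framework.

```python
from typing import Callable

Model = Callable[[str], str]
Tool = Callable[[str], str]


def stub_model(prompt: str) -> str:
    """Stand-in for an LLM call: requests a tool only if the prompt advertises one."""
    if "TOOL read_file" in prompt and "RESULT:" not in prompt:
        return "CALL read_file config.toml"
    if "RESULT:" in prompt:
        return "ANSWER " + prompt.split("RESULT:", 1)[1].strip()
    return "ANSWER unknown"


class Harness:
    """Everything around the model: system prompt, tools, and a trace for observability."""

    def __init__(self, model: Model, system_prompt: str, tools: dict[str, Tool]):
        self.model = model
        self.system_prompt = system_prompt
        self.tools = tools
        self.trace: list[str] = []

    def run(self, task: str, max_steps: int = 3) -> str:
        tool_lines = "".join(f"TOOL {name}\n" for name in self.tools)
        prompt = f"{self.system_prompt}\n{tool_lines}TASK: {task}\n"
        for _ in range(max_steps):
            reply = self.model(prompt)
            self.trace.append(reply)  # record every model turn
            if reply.startswith("CALL "):
                _, name, arg = reply.split(" ", 2)
                prompt += f"RESULT: {self.tools[name](arg)}\n"
            else:
                return reply.removeprefix("ANSWER ")
        return "gave up"


files = {"config.toml": "port = 8080"}
bare = Harness(stub_model, "You are a coding agent.", tools={})
equipped = Harness(stub_model, "You are a coding agent.", tools={"read_file": files.__getitem__})

bare_answer = bare.run("What port does the service use?")
equipped_answer = equipped.run("What port does the service use?")
print(bare_answer, "|", equipped_answer)
```

The bare harness fails for the reason the paper describes as a configuration failure: the tool was missing, not the model's capability.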

Does this mean model improvements are no longer important?

Model improvements remain valuable, but the whitepaper emphasizes that system configuration, verification, and harness design are more impactful for performance and reliability at scale.

How does this shift affect AI development costs?

Focusing on harness and context engineering can reduce long-term costs by lowering token waste, minimizing maintenance, and improving security, despite higher upfront investment in system design.

What practical steps should companies take now?

Companies should invest in building and refining their harnesses, develop best practices for context engineering, and implement rigorous testing and verification processes to improve AI system robustness.
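
The infographic's rule that "tests verify the deterministic; evals verify the rest" can be sketched as a single merge gate. In the Python example below, the slugify helper, the stub summarizer, the rubric of required terms, and the 0.6 threshold are all assumptions chosen for illustration; real evals are usually richer than keyword checks.

```python
from typing import Callable


def slugify(title: str) -> str:
    """Deterministic helper (imagine an agent wrote it): its output can be checked exactly."""
    return "-".join(title.lower().split())


def check_slugify() -> bool:
    # A test: one input, one exact expected output.
    return slugify("The Model Is Only 10%") == "the-model-is-only-10%"


def run_eval(generate: Callable[[str], str], cases: list[tuple[str, list[str]]]) -> float:
    """An eval: score free-form output against a rubric and return the pass rate."""
    passed = 0
    for prompt, required_terms in cases:
        output = generate(prompt).lower()
        if all(term in output for term in required_terms):
            passed += 1
    return passed / len(cases)


def stub_summarizer(prompt: str) -> str:
    # Stand-in for a model call; a real model's output would vary between runs.
    return "The harness, not the model, drives most agent behavior."


EVAL_CASES = [
    ("Summarize the whitepaper's main claim.", ["harness", "model"]),
    ("What should teams invest in?", ["harness"]),
    ("Name the cost lever.", ["routing"]),
]
EVAL_THRESHOLD = 0.6  # assumed pass bar


def ci_gate() -> bool:
    """Allow a merge only if the test passes and the eval clears its threshold."""
    return check_slugify() and run_eval(stub_summarizer, EVAL_CASES) >= EVAL_THRESHOLD


score = run_eval(stub_summarizer, EVAL_CASES)
gate_open = ci_gate()
print(f"eval pass rate: {score:.2f}, gate open: {gate_open}")
```

The test demands an exact match, while the eval tolerates some failures as long as the pass rate stays above the bar. Dropping either half of the gate puts the workflow back toward the vibe coding end of the spectrum.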

Source: ThorstenMeyerAI.com
