
TL;DR

A recent Google whitepaper argues that an AI system’s behavior is determined mostly by the harness and context around the model, not by the model itself. That shifts attention from chasing larger models to optimizing configuration and verification, and it changes how organizations should adopt AI.

A Google whitepaper published in May 2026 claims that the model itself accounts for only about 10% of an AI system’s behavior. The harness and context engineering around the model determine the rest, which shifts the focus from model size to configuration and verification. That has significant implications for how organizations develop and deploy AI systems.

The whitepaper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, argues that the dominant factor in AI system performance is the harness: the prompts, tools, rules, and observability layers that surround the model. The experiments it cites show that changing only the harness, without altering the model, can dramatically improve performance: one team moved from outside the top 30 to the top 5 on Terminal Bench 2.0.

The paper also introduces the concept of agentic engineering, where AI is integrated into structured workflows with verification, testing, and guardrails, rather than relying on vibe coding or minimal oversight. It highlights that costs are driven more by configuration and context management than by the model itself, with disciplined approaches offering better long-term economics despite higher initial investment.
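
The cost argument can be made concrete with a small break-even calculation. The sketch below assumes a simple linear cost model (an upfront cost plus a fixed cost per feature). The function names and every number in it are illustrative assumptions, not figures from the whitepaper.

```python
from typing import Optional


def total_cost(capex: float, opex_per_feature: float, features: int) -> float:
    """Upfront investment plus running cost for a given number of features."""
    return capex + opex_per_feature * features


def break_even_features(
    capex_low: float,
    opex_high: float,
    capex_high: float,
    opex_low: float,
) -> Optional[float]:
    """Feature count at which the high-CapEx approach catches up.

    Returns None when the high-CapEx approach is not cheaper per feature,
    because then it never pays back its upfront cost.
    """
    if opex_high <= opex_low:
        return None
    return (capex_high - capex_low) / (opex_high - opex_low)


# Hypothetical cost units: the low-investment approach costs nothing upfront
# but 5 units per feature; the disciplined approach costs 40 units upfront
# (specs, evals, context) and 1 unit per feature.
crossover = break_even_features(capex_low=0.0, opex_high=5.0, capex_high=40.0, opex_low=1.0)
print("break-even after", crossover, "features")
print("cost at 20 features:", total_cost(0.0, 5.0, 20), "vs", total_cost(40.0, 1.0, 20))
```

Under these assumed numbers the disciplined approach is more expensive for the first few features and cheaper after the crossover point. Real projects will not follow a straight line, so treat this as a way to frame the trade-off rather than a forecast.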

At a glance
When: published May 2026
The development: Google’s new whitepaper argues that the most significant factor in AI system effectiveness is not model size but the harness and context engineering surrounding it.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified

Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.

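
One way to picture the "tests plus evals" rule is a gate that only lets a change ship when both checks pass. This is a minimal sketch, not the paper's implementation: `run_gate`, the 0.8 threshold, and the stand-in functions `slugify`, `fake_generate`, and `grade_has_prefix` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GateResult:
    tests_passed: bool
    eval_score: float
    ship: bool


def run_gate(
    tests: Sequence[Callable[[], bool]],
    eval_prompts: Sequence[str],
    generate: Callable[[str], str],
    grade: Callable[[str, str], float],
    threshold: float = 0.8,  # assumed pass mark, tune per project
) -> GateResult:
    """Ship only if every deterministic test passes AND the mean eval score
    over the sampled prompts reaches the threshold."""
    tests_passed = all(test() for test in tests)
    scores = [grade(prompt, generate(prompt)) for prompt in eval_prompts]
    eval_score = sum(scores) / len(scores) if scores else 0.0
    return GateResult(tests_passed, eval_score, tests_passed and eval_score >= threshold)


# Deterministic code: an exact assertion is enough.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


# Stand-in for a model call, so the sketch runs without any provider.
def fake_generate(prompt: str) -> str:
    return f"Summary: {prompt}"


# Stand-in grader: scores 1.0 when the output follows the expected format.
def grade_has_prefix(prompt: str, output: str) -> float:
    return 1.0 if output.startswith("Summary:") else 0.0


result = run_gate(
    tests=[lambda: slugify("Hello World") == "hello-world"],
    eval_prompts=["release notes", "bug report"],
    generate=fake_generate,
    grade=grade_has_prefix,
)
print(result)
```

The design point is that the two checks answer different questions: the test asserts an exact result, while the eval scores non-deterministic output against a threshold. Dropping either one weakens the gate.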
The idea worth building your strategy around

Agent = Model + Harness

Model: ~10%
Harness: ~90% (prompts · tools · context · hooks · sandboxes · observability). This is your surface area, not the provider’s.

Outside the top 30 to top 5 on Terminal Bench 2.0 by changing only the harness, with the same model.

“Most agent failures, examined honestly, are configuration failures”: a missing tool, a vague rule, a noisy context.

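
The "Agent = Model + Harness" idea, and the "missing tool" failure mode, can be sketched with a toy example. Everything here is an assumption made for illustration: the `Harness` class, the `CALL name a b` reply convention, and `stub_model`, which stands in for a real LLM and simply asks for a tool when the prompt advertises one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Model = Callable[[str], str]


@dataclass
class Harness:
    """The parts you own: rules, tool registry, and prompt assembly."""
    rules: str
    tools: Dict[str, Callable[[int, int], int]] = field(default_factory=dict)

    def build_prompt(self, task: str) -> str:
        tool_list = ", ".join(sorted(self.tools)) or "none"
        return f"{self.rules}\nTools: {tool_list}\nTask: {task}"

    def run(self, model: Model, task: str) -> str:
        reply = model(self.build_prompt(task))
        parts = reply.split()
        # Hypothetical convention: "CALL <tool> <int> <int>" requests a tool call.
        if len(parts) == 4 and parts[0] == "CALL" and parts[1] in self.tools:
            return str(self.tools[parts[1]](int(parts[2]), int(parts[3])))
        return reply


def stub_model(prompt: str) -> str:
    """Stand-in for an LLM: uses the 'add' tool only if the harness lists it."""
    tools_line = next(
        (line for line in prompt.splitlines() if line.startswith("Tools:")), ""
    )
    if "add" in tools_line:
        return "CALL add 2 3"
    return "I cannot compute that."


bare = Harness(rules="Answer briefly.")
equipped = Harness(rules="Answer briefly.", tools={"add": lambda a, b: a + b})

print(bare.run(stub_model, "What is 2 + 3?"))
print(equipped.run(stub_model, "What is 2 + 3?"))
```

The model function is identical in both runs. Only the harness differs, and only the equipped harness produces the answer, which is the configuration-failure pattern the quote describes.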
The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding: Low CapEx · High OpEx
Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.

Agentic Engineering: High CapEx · Low OpEx
Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success, plus intelligent model routing that sends the easy work to cheap models.

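
Model routing can be as simple as a rule that sends easy work to a cheaper tier. The sketch below is one hedged way to express that idea: the tier names, prices, keyword list, and token limit are all hypothetical, and a production router would likely use richer signals than keywords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, not a real provider rate


CHEAP = ModelTier("small-model", 0.25)
STRONG = ModelTier("large-model", 2.0)

# Assumed signals that a task needs the stronger tier.
HARD_HINTS = ("refactor", "architecture", "security", "migrate")


def route(task: str, context_tokens: int, token_limit: int = 4000) -> ModelTier:
    """Pick the strong tier for large contexts or tasks that look hard,
    and the cheap tier for everything else."""
    text = task.lower()
    if context_tokens > token_limit or any(hint in text for hint in HARD_HINTS):
        return STRONG
    return CHEAP


def estimated_cost(tier: ModelTier, tokens: int) -> float:
    return tier.cost_per_1k_tokens * tokens / 1000


for task, tokens in [
    ("Rename a variable", 500),
    ("Refactor the auth module", 500),
    ("Fix a typo in the docs", 9000),
]:
    tier = route(task, tokens)
    print(f"{task!r} -> {tier.name}, est. cost {estimated_cost(tier, tokens)}")
```

Routing decisions like this live in the harness, so they can be tuned and measured without changing providers.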
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

Why Focus on Harness and Context Matters for AI Adoption

This shift in understanding affects how organizations should allocate resources and strategize AI development. Instead of chasing larger models, companies should invest in building robust harnesses and effective context management. This approach can lead to lower operational costs, improved reliability, and better security, making AI more sustainable and trustworthy in production environments.

Evolution of AI System Design and the Role of Model Size

Historically, AI development emphasized larger models, on the assumption that bigger was better. Recent trends, however, show a move toward optimizing how models are integrated and controlled. The whitepaper builds on this, citing experiments in which configuration and context engineering had more impact than model size alone. This perspective aligns with early 2026 data cited in the paper: 85% of developers use AI coding agents (51% daily) and 41% of new code is AI-generated, which raises the stakes for effective system design.

“The behavior you experience is dominated by scaffolding you can build and own, not the frontier model itself.”

— Addy Osmani

Unclear Aspects of Implementation and Industry Adoption

It is not yet clear how quickly organizations will shift their focus from model size to harness and context engineering. The long-term impact on AI architecture standards and the pace of adoption are still open questions, and some companies continue to prioritize larger models because of existing infrastructure and expertise.

Next Steps for Organizations and AI Developers

Organizations should evaluate their current AI workflows to identify opportunities for improving harness and context management. Developing best practices, tools, and standards for configuration and verification will be crucial. Industry-wide, a shift toward disciplined, structured AI engineering is expected to accelerate, with further research and case studies clarifying best approaches.

Key Questions

Why is model size less important than harness and context?

The whitepaper demonstrates through experiments that the behavior of AI systems is primarily influenced by how the model is integrated, configured, and controlled, rather than the size of the model itself.

How can companies improve their AI systems based on this insight?

By investing in better harnesses—such as prompts, tools, guardrails, and observability—and developing strong verification processes, companies can optimize AI performance and reduce costs.

Does this mean larger models are obsolete?

Not necessarily; larger models still have value, but the whitepaper suggests that their impact is limited compared to the configuration and management layers surrounding them.

What are the risks of focusing less on model size?

The main risk is underestimating the importance of proper system design, which could lead to unreliable or insecure AI deployments if harness and context are neglected.

When will this shift in focus become industry standard?

The whitepaper indicates this is an emerging perspective, with early adopters already experimenting. Widespread industry adoption will depend on further validation and development of best practices.

Source: ThorstenMeyerAI.com
