TL;DR

Thorsten Meyer AI published a field note arguing that open AI models are not free to run, even when their weights cost nothing to download. The piece says self-hosting can beat paid APIs for steady, high-volume workloads, while APIs remain stronger for low or uneven use.

Thorsten Meyer AI has published a follow-up field note arguing that companies should judge open-weight AI models by total cost of ownership, not by the fact that the model files can be downloaded at no charge. The distinction matters as buyers weigh self-hosted systems against paid AI APIs.

The piece frames the central issue as the “free-download question”: why pay a vendor to run models on-premises if open models such as Qwen can be downloaded for free? Its answer is that only the weights are free. Hardware, electricity, operations time, model updates, queue management, system reliability, and depreciation remain real costs.

According to the field note, the economic case depends on usage pattern. For low or uneven demand, hosted APIs can remain the cheaper and simpler choice. For steady, high-volume workloads, owned inference hardware may become cheaper because usage is no longer billed token by token after the infrastructure is in place.

The author describes a crossover point rather than a universal answer. In one illustrative model cited in the source material, break-even falls at roughly 80 million tokens per month under selected assumptions. The source presents that as an example of cost shape, not as a vendor quote or fixed market price.
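The crossover idea can be sketched as simple arithmetic: owned hardware carries a fixed monthly cost plus a small per-token cost, an API carries only a per-token price, and break-even is the volume at which the two totals meet. The sketch below is not the source's model. Every name and number in it is a hypothetical placeholder, chosen only so that the result lands at 80 million tokens for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical monthly cost inputs; none of these values come from the source."""

    fixed_monthly_usd: float       # depreciation, operations time, baseline power
    self_host_usd_per_mtok: float  # marginal cost per million tokens on owned hardware
    api_usd_per_mtok: float        # API price per million tokens

    def self_host_cost(self, mtok: float) -> float:
        """Total monthly cost of self-hosting at a volume of `mtok` million tokens."""
        return self.fixed_monthly_usd + self.self_host_usd_per_mtok * mtok

    def api_cost(self, mtok: float) -> float:
        """Total monthly API bill at a volume of `mtok` million tokens."""
        return self.api_usd_per_mtok * mtok

    def break_even_mtok(self) -> float:
        """Monthly volume (million tokens) at which both options cost the same."""
        saving_per_mtok = self.api_usd_per_mtok - self.self_host_usd_per_mtok
        if saving_per_mtok <= 0:
            raise ValueError("API is not more expensive per token; no break-even exists")
        return self.fixed_monthly_usd / saving_per_mtok


# Placeholder inputs (assumptions, not source data).
model = CostModel(fixed_monthly_usd=900.0, self_host_usd_per_mtok=0.75, api_usd_per_mtok=12.0)

if __name__ == "__main__":
    print(f"break-even: {model.break_even_mtok():.1f} million tokens/month")
    for volume in (10, 80, 200):
        print(volume, model.api_cost(volume), model.self_host_cost(volume))
```

Below the break-even volume the API total is lower. Above it, the fixed cost is spread over enough tokens that owned hardware wins. Changing any input, such as staff time folded into the fixed cost, moves the line.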

Why It Matters

The analysis matters because many AI buyers are now comparing three costs at once: paid frontier APIs, hosted open models, and self-run open models. A model that is free to download may still be expensive if it is operated poorly, while a paid API may become expensive once usage is large and predictable.

The piece also speaks to the European AI sovereignty debate. If data never leaves a company’s own machines, sovereignty is built into the deployment rather than promised through a contract. But the source says that benefit comes with operational responsibility: the company owns outages, tuning, scaling, and maintenance.

The finding is most relevant for publishers, software teams, research groups, and enterprises with repeated inference workloads. It is less clear for organizations that need frontier quality on difficult tasks, do not have operations staff, or face demand spikes that make owned hardware sit idle between bursts.

Background

The field note follows an earlier Thorsten Meyer AI piece about Mistral and the European sovereignty pitch. The new article addresses a challenge left open there: whether companies should pay for managed or on-prem AI services when open models from Chinese labs and others are available to download.

The source says the capability gap between closed frontier models and open-weight systems has narrowed, though it still describes closed systems as ahead on the hardest long-horizon agentic tasks. It also says open systems often trail the frontier by six to 12 months before catching up on prior benchmark challenges.

The article cites Apple Silicon memory capacity, mixture-of-experts models, and small local fleets as factors changing the economics of self-hosting. It says a 192GB Mac Studio can hold a 70B model in memory, while some mixture-of-experts designs require fewer active parameters during inference.
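The memory claim can be checked with rough arithmetic: weight storage is approximately parameter count times bits per parameter. The sketch below covers weights only and ignores the KV cache, activations, and operating-system overhead. The function names and the 25% headroom figure are assumptions for illustration, not values from the source.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return params_billion * bits_per_param / 8


def fits(params_billion: float, bits_per_param: int, memory_gb: float,
         headroom_fraction: float = 0.25) -> bool:
    """True if the weights fit after reserving a share of memory for everything else.

    headroom_fraction is an assumed reserve for KV cache, runtime, and the OS.
    """
    usable_gb = memory_gb * (1 - headroom_fraction)
    return weight_memory_gb(params_billion, bits_per_param) <= usable_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_memory_gb(70, bits)
        print(f"70B at {bits}-bit: {size:.0f} GB, fits in 192 GB: {fits(70, bits, 192)}")
```

For mixture-of-experts models the same arithmetic generally applies to the total parameter count, since all expert weights usually have to be resident, while per-token compute scales with the smaller active-parameter count.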

“The weights are free to download. Running them well is not.”

— Thorsten Meyer AI field note

“The honest comparison is total cost of ownership vs. per-token API.”

— Thorsten Meyer AI field note

“The crossover zone is real — and growing.”

— Thorsten Meyer AI field note

What Remains Unclear

Several details remain workload-dependent. The source does not establish a universal break-even point, and its cited threshold depends on model choice, hardware cost, power prices, staff time, throughput, latency needs, and quality requirements. It is also unclear how long current open-model price and capability gaps will hold, since model releases and API pricing can change quickly.

What’s Next

The next test is buyer behavior: whether teams with large, repeatable AI workloads move more inference onto owned hardware, or continue paying for managed APIs to avoid maintenance and quality risk. Future model releases, hardware prices, and API price cuts will move the break-even line.

Key Questions

Does free download mean free AI inference?

No. The source says the model weights may cost nothing to download, but running them requires hardware, electricity, maintenance, software plumbing, and staff time.

When can running your own model beat paying for an API?

According to the field note, self-hosting is more likely to win when usage is steady, high-volume, and predictable. APIs are more likely to win when demand is low, uneven, or dependent on the best frontier models.

Is the 80 million tokens per month break-even point fixed?

No. The source presents that figure as an illustrative scenario, not a general market price. The threshold can move based on hardware, power, staffing, model quality, and workload mix.

Why does data sovereignty matter here?

If inference runs on a company’s own machines, sensitive data does not need to be sent to an outside API provider. The source describes that as a structural privacy and control benefit, while also saying the operator takes on more technical responsibility.

Are open models now equal to closed frontier models?

The source says open models have narrowed the gap and may match closed systems on some tasks, but it still describes closed frontier models as ahead on the hardest long-horizon agentic work.

Source: Thorsten Meyer AI
