We shipped our entire docs site with frontier models for $0.99
case study · context efficiency

Full Docusaurus site.
1/3 context. $0.99.

We built docs.l6e.ai — a complete Docusaurus documentation site — in a single agent run under l6e's MCP budget gate. This is the full account of what happened, the verified billing data, and what we think explains it.

verified billing data

date               model                               tokens   cost
Mar 13, 02:48 PM   claude-4.6-sonnet-medium-thinking   1.8M     $0.99

Source: Cursor On-Demand billing dashboard · single charge · Mar 13, 2026

the setup

Agent: claude-4.6-sonnet-medium-thinking running in Cursor, with l6e's MCP server active. The budget was set before the task began. The run opened with a full planning phase — the agent articulated the site structure, page hierarchy, and content outline before writing a single file. The l6e_authorize_call tool gated each major phase transition.
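To make the gating concrete, here is a minimal sketch of what a budget gate on phase transitions could look like. This is an illustration only — the class, method names, and thresholds are our assumptions, not l6e's actual API; the real l6e_authorize_call is an MCP tool, not this Python object.

```python
# Hypothetical sketch of a phase-transition budget gate.
# All names and numbers here are assumptions for illustration,
# not l6e's real implementation.

class BudgetGate:
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def authorize_call(self, phase: str, estimated_cost_usd: float) -> bool:
        """Approve a phase transition only if the estimate fits the budget."""
        return self.spent_usd + estimated_cost_usd <= self.budget_usd

    def record(self, actual_cost_usd: float) -> None:
        """Record what the approved phase actually cost."""
        self.spent_usd += actual_cost_usd


gate = BudgetGate(budget_usd=1.50)   # hypothetical cap for the run
if gate.authorize_call("planning", estimated_cost_usd=0.30):
    gate.record(0.28)                # planning came in under estimate
```

The point of the sketch: authorization happens before the spend, so an over-budget phase is refused rather than discovered after the fact.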

what was built

A complete Docusaurus 3 documentation site: full sidebar structure, multiple doc pages, custom theme config, navbar, footer, and deployment config. The site is live at docs.l6e.ai.

the numbers

Total tokens processed       1.8M
Peak context window usage    ~1/3
Context summarizations       0
Context resets / new chats   0
Total cost                   $0.99

the mechanism

The budget was active during the planning phase itself — not just implementation. This is the key detail. The model couldn't over-read to build its plan. It had to commit to a structure based on targeted reads, not exhaustive ones. By the time it started writing files, it had already practiced deliberateness.

Without cost visibility during planning, an agent reads speculatively — pulling in files to resolve uncertainty that hasn't formed yet, re-reading things it already has, building a mental model through volume rather than precision. With the budget gate active from the start, each read had to justify itself before it happened. That discipline carried through the entire run.
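The "each read had to justify itself" discipline can be sketched as a gate on individual reads. Again, this is a hypothetical illustration — the function, its parameters, and the token cap are our assumptions, not how l6e actually meters reads.

```python
# Hypothetical illustration (not l6e's API): a read executes only if it
# carries an explicit justification and the projected token spend stays
# under the cap. Returns (approved, new_spent_tokens).

def gated_read(path: str, justification: str, est_tokens: int,
               spent_tokens: int, cap_tokens: int) -> tuple[bool, int]:
    if not justification:
        return False, spent_tokens           # no speculative reads
    if spent_tokens + est_tokens > cap_tokens:
        return False, spent_tokens           # over budget: denied
    return True, spent_tokens + est_tokens   # approved, spend recorded


ok, spent = gated_read("docs/intro.md", "needed for sidebar plan",
                       est_tokens=2_000, spent_tokens=0, cap_tokens=600_000)
```

A denied read costs nothing; the agent either states why it needs the file or moves on, which is exactly the deliberateness described above.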

The 1/3 context figure isn't a compression trick. It's what a deliberate agent looks like when the budget constraint fires before the first token of planning, not after. The 1.8M token count is real — the model did significant work. It just didn't repeat itself.

honest caveat

Docusaurus has a natural, predictable structure. The planning phase works best when the task has a clear shape upfront. A more ambiguous build — unclear requirements, evolving architecture mid-task — would likely show different dynamics. We're being deliberate about only publishing findings we can describe with a specific context, a specific outcome, and a plausible mechanism. This is one data point, not a general claim.