Cutting Claude Code Token Costs: What caveman, codesight, and codeburn Actually Deliver

Token Cost Has a Structural Answer — Dissecting caveman, codesight, and codeburn

📅 June 2, 2026 · Developer Tools Analysis

Once Claude Code billing starts to sting, the instinct is to reach for a tool that promises to slash token usage. The structural reality, though, is that roughly 75% of session cost comes from input and cache tokens — output is only ~25%. With that in mind, the three tools generating buzz — caveman, codesight, and codeburn — each target an entirely different layer of the cost stack. This post breaks down where each one actually lands, including the gap between advertised and independently measured savings.

Why Token Cost Has Become a Real Problem

AI coding tools bill in proportion to the number of tokens that pass through the context window per inference: input tokens, cache tokens, and output tokens are each multiplied by their respective per-token rates. Per Claude Code's official documentation, average enterprise usage runs ~$13 per developer per active day, or $150–250/month. Teams running without optimization have reported seeing $2,400+ monthly invoices for a six-person group — a real forcing function.

▶ The key asymmetry to keep in mind: independent benchmarks place output tokens at roughly 25% of total session cost. The remaining 75% belongs to input and cache. This ratio is the critical lens for evaluating every cost-cutting claim below — tools that only compress output face a structural ceiling on how much they can save.

Session Cost Breakdown (independent benchmarks)

Input + Cache Tokens ~75% — the primary cost lever
Output Tokens ~25% — compression has a low ceiling here

High-Impact Cost Reduction Starts with Habits, Not Tools

Before examining any third-party tool, one point deserves emphasis: the highest-leverage reductions come from input/cache-side practices. Of Anthropic's nine officially recommended cost levers, the three with the largest reported savings are:

Reported savings by lever

Prompt Caching (cache_control) up to 90%
Semantic Prompting (structured instructions) up to 74%
Model Tiering (Opus / Sonnet / Haiku) median 68%

The full set of levers, mapped to where they cut in the cost stack:

Lever Mechanism Effect
Active Context Management Run /clear or /compact at task boundaries to compress stale context Eliminates reprocessing of accumulated tokens
Slim CLAUDE.md Keep under 200 lines; move the rest to on-demand Skills Reduces fixed per-session input overhead
Hooks Pre-processing Use PreToolUse hooks to pre-filter logs and search results before they hit the context Hundreds of thousands of tokens → hundreds
Subagent Delegation Delegate verbose tasks to subagents; return only a summary to the main context Preserves the main context window
Semantic Caching (Redis) Reuse stored responses for semantically similar queries Up to 73% for high-repetition workloads

Key takeaway: The highest-impact levers are input/cache-side habits — caching, model routing, and slimming CLAUDE.md. Output compression is lower priority. Audit these practices before reaching for any third-party tool.

Where Each Tool Targets the Cost Stack

caveman, codesight, and codeburn are not competing alternatives. They are complements that address different layers of the cost stack. Each tool is examined below by purpose, mechanism, actual benefit, and installation.

① Caveman — A "Caveman Grammar" Skill That Trims Output Tokens

Purpose & mechanism — A Claude Code skill by Julius Brussee that forces the AI to drop articles, politeness phrases, and background explanations, answering in terse "caveman grammar." The result is fewer output tokens spent on prose wrapping code. The code itself is not modified — only the natural-language text surrounding it is compressed.

Normal: "I've updated the authentication middleware to validate the JWT token before processing the request…"
Caveman: "Fix auth. JWT validate before request. Works."

Claimed vs. measured savings — the gap is the story

Claimed (GitHub repo)
65%
Independent (output only)
30–50%
Independent (full session)
4–10%

The sharper finding from independent benchmarks: adding a single "be brief" line to a prompt produced nearly equivalent token counts (401–449 vs. 419) and quality scores (0.970–0.976 vs. 0.985) to caveman. Thinking tokens are unaffected, and the skill itself loads as additional input tokens, partially offsetting the savings it generates. The practical conclusion: caveman is a secondary assist for repetitive tasks where verbose explanations are the specific bottleneck — not a tool that halves your bill.

Installation (Node.js ≥18)

# macOS / Linux / WSL
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash

# activate / commands
/caveman        # default compression
/caveman ultra  # maximum compression
normal mode    # restore normal mode
/caveman-stats  # view session savings

v1.8.2 (2026-05-12), MIT license. Star counts and version numbers are from the repository's own documentation — verify directly before installing.

② Codesight — Eliminating Project Exploration Overhead with Pre-Built Context

Purpose & mechanism — The opposite end of the spectrum from caveman. Instead of trimming output, codesight targets the thousands — sometimes tens of thousands — of input tokens the AI burns at the start of every session just mapping your project structure. It scans the codebase once, builds a structured context map, and injects it at session start. The scanner includes 15 detectors and 11 AST extractors covering TypeScript, Python, Go, Rust, PHP, C#, Swift, Kotlin, and more.

Token reduction ratios on real projects (vendor benchmarks)

SaaS C
131.8×
SaaS A
83.7×
SaaS B
59.4×

Example: SaaS C reduced manual exploration from 47,450 tokens to 4,162 after Codesight. "Manual exploration" here means the AI reading files one by one to reconstruct project structure — the default behavior without a pre-built map.

The gains are largest on large, unfamiliar codebases; the benefit shrinks on small projects you already know well. Critically, this tool attacks the input token side — the 75% bucket — which puts it in a different tier from caveman on raw savings potential.

Installation (MCP integration)

npx codesight        # generate context map without installing
npx codesight --init # auto-generate CLAUDE.md and related files

# settings.json (register MCP server)
"mcpServers": { "codesight":
  { "command": "npx", "args": ["codesight", "--mcp"] } }

After registration, 13 tools become callable directly: codesight_scan, codesight_get_routes, and others. v1.9.3, zero dependencies, MIT license.

③ Codeburn — Visibility Into Where Your Tokens Are Actually Going

🧠 Purpose & mechanism — This is a measurement tool, not a reduction tool. The premise: you can't cut what you can't see. Codeburn reads Claude Code, Codex, Cursor, and OpenCode session transcripts locally — no server, no proxy, no API key — and renders an interactive TUI dashboard. It classifies spend across 13 task types and lets you track cost by model, tool, project, and date.

No direct token reduction. Instead, it surfaces where cost is concentrating: which task types account for the bulk of spend, what your cache hit rate looks like, how often one-shot completions succeed. codeburn optimize scans for 11 waste patterns and pairs each finding with a fix suggestion and estimated savings. In other words, codeburn tells you which of the nine levers described above to pull first — making it the natural starting point before any other investment.

Installation

npm install -g codeburn  # or: brew install codeburn
codeburn            # last 7 days
codeburn today     # today only
codeburn optimize  # scan waste patterns + recommendations

TUI shortcuts: arrow keys (period navigation), c (model comparison), o (optimization view), q (quit). v0.9.11 (2026-05-27), MIT license.

Side-by-Side: Where Each Tool Fits

Dimension Caveman Codesight Codeburn
Token Target Output (~25%) Input exploration (large share) None (diagnostic)
Approach Compresses response prose Pre-injects structured context map Surfaces cost visibility
Measured Reduction 4–10% per session 83–131× on first exploration Indirect (diagnostic)
Best Fit Repetitive coding where explanations are unnecessary Large, unfamiliar codebases Diagnosing where to start optimizing

codesight hits the largest cost bucket (input/cache), codeburn tells you where to aim, and caveman trims the smallest bucket (output) as a secondary assist.

The Right Order to Adopt These Tools

You don't need all three. The rational sequence follows the cost structure: diagnose first → address the largest bucket → layer in secondary assists.


flowchart TD
  A([Start Reducing Token Cost]) --> B[codeburn
Diagnose current spend] B --> C{Large or
unfamiliar codebase?} C -->|YES| D[codesight
Eliminate input overhead] C -->|NO| E[caveman
Optional output trim] D --> E style A fill:#3498db,stroke:#2980b9,color:#ffffff style B fill:#f4ecf7,stroke:#8e44ad style C fill:#fef9e7,stroke:#f39c12 style D fill:#eafaf1,stroke:#27ae60,color:#1e8449 style E fill:#fef9e7,stroke:#e67e22,color:#c0392b

🔁 Reading the diagram: Start with codeburn to diagnose where tokens are going. If you're working on a large, unfamiliar codebase, codesight gives you the biggest return on the input side. Only then add caveman for repetitive tasks where verbose output explanations are the actual bottleneck.

💡 Honest caveat: numbers worth verifying — Most star counts, version numbers, and benchmark multipliers in this post come from each tool's own repository or from a single source. Caveman's savings rate is an explicit conflict between sources. Codesight's 83–131× figure is vendor-provided benchmark data.

Before making adoption decisions, run a before/after measurement with codeburn on your own real workload. Vendor benchmarks on unfamiliar codebases may not reflect your actual usage profile.

Structural Thinking Beats Any Single Tool

① The largest savings come from habits, not tools. Prompt caching (up to 90%), model tiering (median 68%), slimming CLAUDE.md, and using /clear are where the real leverage lives — and they cost nothing to set up.

② Calibrate expectations for caveman. The repository claims 65%, but independent benchmarks put full-session savings at 4–10%. A single "be brief" instruction produced nearly equivalent results in at least one controlled test. Try the one-liner before installing the skill.

③ Measure first → input/cache first → output tools as secondary assists. This sequence is the structural answer to token cost. It follows where the money actually is.

The real answer to "token costs are getting expensive" isn't which tool to install. It's measuring where your session tokens are leaking first. Without that baseline, you'll end up tuning the 25% output bucket while the 75% input/cache cost goes untouched.

References

This post synthesizes publicly available repository documentation, official sources, and independent benchmarks. Star counts, version numbers, and reported savings multipliers reflect source information at time of writing and may change. Measure against your own workload before making adoption decisions.

S
SW Develope
Software Development Notes

Developer-curated content — collected, organized, and reviewed before publishing.

Written based on publicly available data and sources. Last updated: June 8, 2026

댓글

이 블로그의 인기 게시물

Cutting Claude Code Token Usage by 75%: What the Caveman Technique Actually Delivers

Claude Code ultracode — What It Is, How to Enable It, and Who Can Use It

Does Open-Source Headroom Cut LLM Costs by 90%? A Fact Check