Open a fresh Claude Code session. Type /context. Stare at the number.
That’s how many tokens Claude burned before you typed a single character — reading your CLAUDE.md, loading every enabled skill, ingesting MCP server descriptions, and parsing whatever was left in the context from the last conversation. Now run git status and watch it tick up another 2,000.
You don’t need to spend more to fix this. You need to send less noise.
This guide covers 10 tools that address the problem at different layers — terminal output, LLM responses, codebase navigation, documentation structure, and MCP overhead. Each section shows you exactly how to install and use it.
Quick Reference
| Tool | What it targets | Claimed reduction | Install complexity |
|---|---|---|---|
| Caveman | Output verbosity | 65–75% | Instant (skill) |
| RTK | Terminal output | 60–90% | Low (binary) |
| Code Review Graph | Codebase reads | 6–49× | Medium (pip) |
| Context Mode | MCP & logs | 98% | Low (plugin) |
| Claude Token Optimizer | Docs structure | 90% | Low (curl) |
| Token Optimizer | Ghost tokens | varies | Low (plugin) |
| Token Optimizer MCP | MCP calls | 95%+ | Medium (npm) |
| Claude Context | Codebase search | 40% | Medium (MCP) |
| Claude Token Efficient | Claude responses | ~63% | Instant (file drop) |
| Token Savior | Code navigation | 97% | Medium (pip/uvx) |
1. Caveman
Repo: JuliusBrussee/caveman — 39.7k stars
The simplest tool on this list. Caveman is a Claude Code skill that makes Claude drop articles, filler words, and pleasantries while keeping all the technical substance. Instead of a 69-token explanation, you get a 19-token one that says exactly the same thing.
Normal Claude:
“The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle. When you pass an inline object as a prop, React’s shallow comparison sees it as a different object every time, which triggers a re-render. I’d recommend using `useMemo` to memoize the object.”
Caveman Claude:
“New object ref each render. Inline object prop = new ref = re-render. Wrap in `useMemo`.”
Same answer. 19 tokens instead of 69.
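The mechanics are easy to sketch. Assuming a naive word filter (the real skill works via prompt instructions, not post-processing), dropping articles and filler looks roughly like this:

```python
# Naive sketch of caveman-style compression: strip articles and common
# filler words, keep the technical nouns and verbs. Illustration of the
# idea only, not the skill's actual implementation.
FILLER = {
    "the", "a", "an", "is", "are", "that", "which", "likely",
    "i'd", "recommend", "you're", "because", "reason", "your",
}

def cavemanize(text: str) -> str:
    words = text.replace(",", "").split()
    kept = [w for w in words if w.lower().strip(".") not in FILLER]
    return " ".join(kept)

verbose = "The reason your component is re-rendering is likely because you're creating a new object reference."
print(cavemanize(verbose))
# → component re-rendering creating new object reference.
```

A real stop-word list would be longer and context-aware, but the ratio is the point: most tokens in a typical answer are grammatical scaffolding, not information.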
Install
claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman
Or use the standalone hook installer:
bash <(curl -s https://raw.githubusercontent.com/JuliusBrussee/caveman/main/hooks/install.sh)
Usage
Activate it in any session:
/caveman
Or just say: "talk like caveman", "caveman mode", or "less tokens please".
Disable with: "stop caveman" or "normal mode".
Intensity levels
| Mode | Description |
|---|---|
| lite | Drop filler, keep grammar |
| full | Default — fragments, no articles |
| ultra | Maximum compression, telegraphic |
| 文言文 | Classical Chinese compression |
There’s also a caveman-compress companion skill that reduces input tokens in your CLAUDE.md files by ~46%.
💡 Pro tip: Pair caveman mode with long debugging sessions where you just need the fix, not the explanation. Switch back to normal mode when you want reasoning documented in commit messages or PR descriptions.
2. RTK (Rust Token Killer)
Repo: rtk-ai/rtk
RTK is a Rust binary that sits between your shell and Claude Code. When Claude runs git status, it doesn’t get the raw output — it gets a compressed summary. Same with cargo test, npm test, docker ps, and 30+ other commands. A 30-minute session that would consume ~118,000 tokens drops to ~24,000.
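Conceptually, an output-compressing shim collapses structured command output into a one-line summary before the model sees it. A minimal sketch of the idea (RTK itself is a Rust binary with its own formats; this is not its code):

```python
# Collapse raw `git status --porcelain` output into a one-line summary:
# the model learns "2 modified, 1 untracked, 1 added" instead of reading
# every path. Illustrative sketch only.
from collections import Counter

def summarize_git_status(porcelain: str) -> str:
    # The first two characters of each porcelain line are the XY status code.
    codes = Counter(line[:2].strip() for line in porcelain.splitlines() if line)
    parts = [f"{n} {code}" for code, n in sorted(codes.items())]
    return "git status: " + ", ".join(parts) if parts else "git status: clean"

raw = " M src/main.rs\n M src/lib.rs\n?? notes.txt\nA  new_file.rs\n"
print(summarize_git_status(raw))
# → git status: 1 ??, 1 A, 2 M
```

The full paths are still on disk if Claude needs them; the shim just stops them from entering context by default.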
⚠️ Warning: There’s an unrelated package also called “rtk” on crates.io. Always use the `--git` flag when installing via Cargo, or use Homebrew.
This blog already has a deep-dive on RTK: RTK 101: Cut Your Claude Token Usage by 80%. Here’s the quick version:
Install
# Homebrew (recommended)
brew install rtk
# Linux / macOS without Homebrew
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh
# Cargo (from source)
cargo install --git https://github.com/rtk-ai/rtk
Wire it to Claude Code
rtk init --global
Restart Claude Code. Done. Every Bash command Claude runs now passes through RTK automatically.
Check your savings
rtk gain # Summary by command
rtk gain --graph # ASCII graph (last 30 days)
rtk discover # Commands still passing through uncompressed
3. Code Review Graph
Repo: tirth8205/code-review-graph
On a large monorepo, Claude reads every relevant file to understand a change. Code Review Graph builds a persistent Tree-sitter AST of your codebase stored in SQLite. When something changes, it computes the blast radius — which functions, classes, and files are actually affected — and hands Claude only those. You go from reading the whole codebase to reading a surgical slice.
- Code reviews: 6.8× fewer tokens
- Daily coding tasks: up to 49× fewer tokens
- Initial indexing: ~10 seconds for a 500-file project
- Incremental updates: <2 seconds
Supports 23 languages plus Jupyter notebooks.
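The “blast radius” idea reduces to a reverse-dependency traversal: start from the changed symbols and walk the callers until the frontier is empty. A minimal sketch with hypothetical function names (the real tool persists a Tree-sitter AST in SQLite):

```python
# Blast radius as BFS over a reverse call graph: given a changed
# function, find everything that could be affected. Function names
# here are hypothetical.
from collections import deque

# Maps a function to the functions that call it.
callers = {
    "parse_config": ["load_app", "run_tests"],
    "load_app": ["main"],
    "run_tests": [],
    "main": [],
}

def blast_radius(changed: str) -> set[str]:
    """Everything reachable upward from `changed` in the call graph."""
    seen, queue = {changed}, deque([changed])
    while queue:
        fn = queue.popleft()
        for caller in callers.get(fn, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

print(sorted(blast_radius("parse_config")))
# → ['load_app', 'main', 'parse_config', 'run_tests']
```

Claude then reads only the files containing those four symbols instead of the whole repository.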
Install
pip install code-review-graph
code-review-graph install # auto-detects Claude Code, Cursor, Windsurf, etc.
code-review-graph build # initial parse of your codebase
Key commands
code-review-graph update # incremental re-parse of changed files
code-review-graph detect-changes # risk-scored impact analysis
code-review-graph visualize # interactive HTML dependency graph
code-review-graph watch # continuous auto-updates
Once installed, you can ask Claude: “Build the code review graph for this project” — it’ll use the index automatically during reviews.
💡 Pro tip: Run `code-review-graph watch` in a background terminal during active development. The index stays current and Claude gets the minimal context for every question.
4. Context Mode
Repo: mksglu/context-mode
Context Mode attacks a different problem: raw tool output getting dumped into the context window. When Claude fetches a GitHub issue, runs a long log command, or scrapes a webpage, that data lands in your context and stays there. Context Mode sandboxes that output into SQLite instead.
- 315 KB of tool output becomes 5.4 KB in context
- Session continuity across compaction events (tasks, files, and decisions persist)
- Local-only, no telemetry
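The sandboxing pattern itself is simple to sketch: persist the full output to SQLite, hand back only a short preview plus an id the model can use for later lookups. Illustrative only; Context Mode’s actual schema and FTS5 indexing are more sophisticated.

```python
# Store full tool output out-of-band in SQLite; return only a small
# preview plus a row id. Sketch of the sandboxing pattern, not
# Context Mode's implementation.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE outputs (id INTEGER PRIMARY KEY, body TEXT)")

def sandbox(output: str, preview_chars: int = 80) -> dict:
    cur = db.execute("INSERT INTO outputs (body) VALUES (?)", (output,))
    return {"id": cur.lastrowid, "preview": output[:preview_chars],
            "full_bytes": len(output.encode())}

big_log = "commit abc123 fix auth\n" * 500   # ~11 KB of raw git log
ref = sandbox(big_log)
print(ref["full_bytes"], "bytes stored,", len(ref["preview"]), "chars in context")
# → 11500 bytes stored, 80 chars in context
```

The model can always fetch the full body by id, but only when it decides it actually needs it.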
Install for Claude Code
/plugin marketplace add mksglu/context-mode
Key commands
ctx stats # Show context savings and call counts
ctx doctor # Diagnose runtimes and FTS5 compatibility
ctx upgrade # Update and reconfigure
ctx insight # Analytics dashboard (local web UI)
ctx purge # Delete indexed content
Core MCP tools available in your session
| Tool | What it does |
|---|---|
| ctx_execute | Run code in 11 languages, return stdout only |
| ctx_batch_execute | Multiple commands in one call |
| ctx_fetch_and_index | Fetch URLs, cache 24 hours |
| ctx_index | Chunk markdown into FTS5 with BM25 ranking |
| ctx_search | Query indexed content |
Example: Analyzing 500 commits becomes one tool call returning 5.6 KB instead of 315 KB of raw git log.
Research GitHub repo commits, extract top contributors and frequency
# → ctx_execute handles it, 1 call, 5.6 KB context
5. Claude Token Optimizer
Repo: nadimtuhin/claude-token-optimizer
Most projects load all their documentation at session start. This tool restructures that. It separates “always-load” documentation (startup, ~800 tokens) from “load-on-demand” documentation (the rest, 0 tokens until referenced).
Before: 8,000 tokens at startup, 11,000 total. After: 800 tokens at startup, 1,300 total.
Install
Run this in your project root:
curl -fsSL https://raw.githubusercontent.com/nadimtuhin/claude-token-optimizer/main/init.sh | bash
The script asks about your project type (Express, Next.js, Vue, Django, Rails, etc.) and takes ~2 minutes to scaffold the structure.
What it creates
CLAUDE.md # Primary entry point (~50 tokens)
COMMON_MISTAKES.md # Top 5 critical bugs (~350 tokens)
QUICK_START.md # Frequent commands (~100 tokens)
ARCHITECTURE_MAP.md # Code organization (~150 tokens)
.claude/ # Extended docs — loaded only when referenced
docs/ # Deep dives — loaded only when referenced
Maintenance habit
Add any bug that takes you more than an hour to track down to COMMON_MISTAKES.md. Next time, Claude reads it before guessing.
⚠️ Warning: The savings depend entirely on your current documentation being verbose. If you already have a lean `CLAUDE.md`, this tool has less impact.
6. Token Optimizer
Repo: alexgreensh/token-optimizer
This one goes hunting for what the project calls “ghost tokens” — token waste that’s invisible in normal usage: bloated CLAUDE.md files with stale content, unused skills still registered and loading, duplicate system prompts, MCP server descriptions for tools you’ve removed, and MEMORY.md entries beyond line 200 that Claude can’t actually access.
It provides a real-time quality score and automated compression recommendations.
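A rough version of that audit is easy to sketch: flag duplicated CLAUDE.md lines and MEMORY.md entries past the line-200 cutoff. Illustrative only; the tool’s real scoring uses more signals than this.

```python
# Minimal "ghost token" audit sketch: duplicated CLAUDE.md lines and
# MEMORY.md lines beyond 200 (which the article says Claude can't
# reach). Not the tool's actual implementation.
from collections import Counter

def ghost_token_report(claude_md: str, memory_md: str) -> dict:
    lines = [l.strip() for l in claude_md.splitlines() if l.strip()]
    dupes = [l for l, n in Counter(lines).items() if n > 1]
    orphans = memory_md.splitlines()[200:]   # unreachable tail
    return {"duplicate_lines": dupes, "orphan_memory_lines": len(orphans)}

claude_md = "Use pnpm, not npm.\nRun tests before commit.\nUse pnpm, not npm.\n"
memory_md = "\n".join(f"entry {i}" for i in range(250))
report = ghost_token_report(claude_md, memory_md)
print(report)
# → {'duplicate_lines': ['Use pnpm, not npm.'], 'orphan_memory_lines': 50}
```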
Install
Claude Code plugin:
/plugin marketplace add alexgreensh/token-optimizer
/plugin install token-optimizer@alexgreensh-token-optimizer
Manual:
git clone https://github.com/alexgreensh/token-optimizer.git ~/.claude/token-optimizer
bash ~/.claude/token-optimizer/install.sh
Key commands
python3 measure.py quick # 10-second health check
python3 measure.py quality # 7-signal degradation tracking
python3 measure.py doctor # Installation health check (0–10 score)
python3 measure.py memory-review # Audit MEMORY.md for orphans
python3 measure.py attention-score # CLAUDE.md attention-curve alignment
python3 measure.py drift # Config growth vs. baseline
python3 measure.py savings # Dollar savings report
python3 measure.py dashboard --serve # Local analytics dashboard
💡 Pro tip: Run `python3 measure.py doctor` after any significant change to your settings or CLAUDE.md — it catches invisible waste before it compounds across sessions.
7. Token Optimizer MCP
Repo: ooples/token-optimizer-mcp
Token Optimizer MCP applies caching and compression directly at the MCP layer. When Claude calls an MCP tool, the server intercepts the response and strips redundant content before it enters the context. The project claims 95%+ reduction on MCP-heavy workflows.
⚠️ Note: The README is sparse on implementation details. Verify the tool’s behavior yourself before committing it to your stack — the core concept is sound, but the project’s maturity is unclear.
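That core concept, caching identical MCP responses so repeat calls cost almost nothing, can be sketched in a few lines. This is a hypothetical illustration of response caching, not this project’s code:

```python
# Cache MCP tool responses keyed on (tool name, canonicalized args).
# A repeat call with the same arguments returns the cached body instead
# of re-entering the context. Hypothetical sketch.
import hashlib
import json

_cache: dict[str, str] = {}

def cached_call(tool: str, args: dict, call_fn) -> tuple[str, bool]:
    """Return (response, was_cached)."""
    key = hashlib.sha256(
        f"{tool}:{json.dumps(args, sort_keys=True)}".encode()
    ).hexdigest()
    if key in _cache:
        return _cache[key], True
    result = call_fn(tool, args)
    _cache[key] = result
    return result, False

fake_server = lambda tool, args: f"{tool} ran with {args}"
first = cached_call("list_issues", {"repo": "demo"}, fake_server)
second = cached_call("list_issues", {"repo": "demo"}, fake_server)
print(first[1], second[1])
# → False True
```

In practice you would also need an invalidation policy, since many MCP tools (file reads, issue lists) are not idempotent over time.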
Install
git clone https://github.com/ooples/token-optimizer-mcp.git
cd token-optimizer-mcp
npm install
npm run build
Configure in your claude.json or .mcp.json following the server config in the repository’s server.json.
8. Claude Context
Repo: zilliztech/claude-context
From Zilliz (the Milvus vector database company). Claude Context adds an MCP server that indexes your codebase using hybrid search — BM25 keyword matching combined with dense vector embeddings. Instead of reading files to answer a question about your code, Claude queries the index with natural language and gets back only the relevant chunks.
Claims ~40% token reduction vs. traditional full-file reads.
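Hybrid search has to fuse the two rankings somehow; reciprocal rank fusion is a common, simple choice. A sketch with made-up file names (the article doesn’t specify which fusion method Claude Context uses):

```python
# Reciprocal rank fusion (RRF): combine a BM25 keyword ranking with a
# vector-similarity ranking without normalizing their raw scores.
# File names are hypothetical.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            # Documents near the top of either list get the most credit.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["auth.py", "login.py", "utils.py"]     # keyword ranking
vector_hits = ["login.py", "session.py", "auth.py"]  # embedding ranking
print(rrf([bm25_hits, vector_hits]))
# → ['login.py', 'auth.py', 'session.py', 'utils.py']
```

`login.py` wins because it ranks highly in both lists, which is exactly the behavior you want from hybrid retrieval: agreement between keyword and semantic signals beats a top score in either one alone.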
Prerequisites
- Zilliz Cloud account (free tier available) — for the vector database
- OpenAI API key — for generating embeddings
Install for Claude Code
claude mcp add claude-context \
-e OPENAI_API_KEY=sk-your-key \
-e MILVUS_TOKEN=your-zilliz-token \
-- npx @zilliz/claude-context-mcp@latest
Usage
- Open Claude Code in your project: `cd your-project && claude`
- Index your codebase: “Index this codebase”
- Check status: “Check the indexing status”
- Search: “Find functions that handle user authentication”
Available tools
| Tool | What it does |
|---|---|
| index_codebase | Index a directory for hybrid search |
| search_code | Natural language query over indexed code |
| get_indexing_status | Monitor indexing progress |
| clear_index | Remove a codebase index |
⚠️ Warning: This tool requires two external API keys (Zilliz + OpenAI). There’s a cost dependency beyond just Claude — factor that in before adopting it.
9. Claude Token Efficient
Repo: drona23/claude-token-efficient
The simplest install on this list: drop a CLAUDE.md into your repo. The file tells Claude to skip filler phrases, avoid re-reading unchanged files, prefer targeted edits over full rewrites, and omit preamble and closing pleasantries. The project measures ~63% output token reduction in test cases.
The file addresses eight default Claude behaviors that waste tokens without adding value: verbose explanations, unnecessary file rewrites, sycophantic chatter, over-engineered solutions, and more.
Install
Option 1 — direct download:
curl -o CLAUDE.md https://raw.githubusercontent.com/drona23/claude-token-efficient/main/CLAUDE.md
Option 2 — pick a profile:
git clone https://github.com/drona23/claude-token-efficient
cp claude-token-efficient/profiles/CLAUDE.coding.md your-project/CLAUDE.md
Option 3 — paste into chat: Copy the contents and paste them directly into any Claude session for one-off use.
The file works on an override principle — user instructions always win. Ask for a detailed explanation and you’ll get one.
Bonus: Matt Pocock’s one-liner
If you want an extra low-effort win, copy this simple rule from Matt Pocock:
“In all interactions and commit messages, be extremely concise and sacrifice grammar for the sake of concision.”
It’s not sophisticated, but it works surprisingly well to reduce verbosity and keep responses tighter.
💡 Pro tip: Merge this file with your existing `CLAUDE.md` rather than replacing it. Pick the rules that match your actual pain points and leave the rest.
10. Token Savior
Repo: Mibayy/token-savior
Token Savior is an MCP server with two capabilities: symbol-based code navigation and persistent memory across sessions.
Symbol navigation: Instead of reading entire files to find a function, Token Savior indexes your codebase by symbol. Finding a symbol goes from injecting 41 million characters to injecting 67 characters — a 99.9% reduction. Getting a function’s source is a direct 4.5K-char lookup.
Persistent memory: A SQLite-backed engine with FTS5 full-text search stores decisions, bugfixes, and conventions across sessions. Three-layer retrieval keeps even memory lookups lean: index first (~15 tokens), search second (~60 tokens), full fetch only when needed (~200 tokens).
Benchmark results across 170+ real sessions: 118/120 (98%) vs. plain Claude Code’s 67/120 (56%).
| Metric | Plain Claude | Token Savior | Delta |
|---|---|---|---|
| Active tokens | 1.02M | 614K | −40% |
| Wall time | 51 min | 28 min | −46% |
| Benchmark score | 67/120 | 118/120 | +51 pts |
Install
Quickest (no venv needed):
uvx token-savior-recall
Via pip:
pip install "token-savior-recall[mcp]"
# With vector search support:
pip install "token-savior-recall[mcp,memory-vector]"
Register with Claude Code:
claude mcp add token-savior -- /path/to/venv/bin/token-savior
Or manually in claude.json:
{
"mcpServers": {
"token-savior-recall": {
"command": "/path/to/venv/bin/token-savior",
"env": {
"WORKSPACE_ROOTS": "/path/to/project1,/path/to/project2",
"TOKEN_SAVIOR_CLIENT": "claude-code"
}
}
}
}
Memory retrieval pattern
Always start lean:
Layer 1: memory_index → ~15 tokens/result (always start here)
Layer 2: memory_search → ~60 tokens/result (only if L1 matched)
Layer 3: memory_get → ~200 tokens/result (final confirmation)
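The escalation logic behind that pattern can be sketched as a chain of progressively more expensive lookups, each gated by the previous layer’s result. The data and per-layer costs here are hypothetical, mirroring the token budgets above:

```python
# Three-layer retrieval sketch: cheap index scan first, search only on
# a hit, full fetch only for the final record. In-memory stand-in for
# Token Savior's SQLite store; entries are hypothetical.
MEMORIES = {
    1: {"title": "auth bugfix", "body": "JWT expiry was compared in ms, not s."},
    2: {"title": "db convention", "body": "All tables use snake_case plural names."},
}

def memory_index() -> list[tuple[int, str]]:   # cheapest: titles only
    return [(mid, m["title"]) for mid, m in MEMORIES.items()]

def memory_search(term: str) -> list[int]:     # mid-cost: body match
    return [mid for mid, m in MEMORIES.items() if term in m["body"].lower()]

def memory_get(mid: int) -> dict:              # most expensive: full record
    return MEMORIES[mid]

# Escalate only when the cheaper layer actually matched.
if any("auth" in title for _, title in memory_index()):
    hits = memory_search("jwt")
    if hits:
        print(memory_get(hits[0])["body"])
# → JWT expiry was compared in ms, not s.
```

Most queries stop at layer 1 or 2, so the average cost per memory lookup stays far below a full fetch.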
The God-Tier Stack
No single tool solves everything. Pick 2–3 based on where you’re actually bleeding.
If your problem is terminal output noise
RTK is the answer. One binary, one hook install, zero behavior change required. Every git, test, and build command Claude runs gets compressed automatically. Start here — it’s the easiest ROI on this list.
If your problem is large codebases
Code Review Graph + Token Savior together. Code Review Graph handles code reviews by computing blast radius. Token Savior handles everything else by navigating by symbol instead of by file. Combined, you’re not reading files anymore — you’re querying indices.
If your problem is MCP tool dumps
Context Mode is the right layer. It sandboxes raw tool output into SQLite before it touches your context window. A GitHub issue fetch goes from polluting your context with 300 KB to handing you a 5 KB summary.
If you want a zero-cost immediate win
Caveman + Claude Token Efficient require no infrastructure, no API keys, no accounts. Drop a CLAUDE.md, install a skill, and your next session already spends ~60% fewer tokens on output. Takes under two minutes.
The full stack (if you’re serious about it)
Caveman ← Output verbosity
RTK ← Terminal noise
Context Mode ← MCP dumps
Code Review Graph ← Code review reads
Token Savior ← Code navigation + memory
Claude Token Efficient ← CLAUDE.md baseline
Run /context in a fresh session before and after. The difference is real.