How to Save Tokens in Claude Code: Reduce Token Usage and Cut Costs for Every Session
It's Sunday night. You're 40 minutes into a refactor session, things are finally clicking — and Claude Code hits a wall. Context limit. Session over.
Or maybe you didn't notice until you opened your API dashboard on Monday morning and found a bill that made you do a double-take.
Either way, you've felt the sting of runaway token usage. And the frustrating part? A lot of it was preventable.
This post is your fix. By the end, you'll have a clear, prioritised action list for slashing token consumption in Claude Code — whether you're on a Pro subscription or paying per API call. No vague advice, no made-up percentages, just practical strategies ranked by actual impact.
Who this is for: developers, indie hackers, and teams who use Claude Code every day and want to get more done without watching their token budget evaporate.
What Are Tokens and Why Do They Matter in Claude Code?
Before we talk about saving them, let's make sure we're on the same page about what tokens actually are.
Tokens Are Not Words — Here's What They Actually Are
A token is a small chunk of text. Not a word, not a character — somewhere in between. For English prose, roughly 3–4 characters make up one token. For code, it's a bit different.
Code tokenizes less efficiently than plain English because of all the symbols, indentation, punctuation, and repetition involved. That node_modules folder full of boilerplate? A token monster. A 200-line Python file typically lands somewhere in the low thousands of tokens depending on line length and complexity, and that's before Claude starts responding.
Quick mental model: think of a token like a Lego brick. Your prompt and files are the pile of bricks you hand Claude. Claude reads every brick, then builds something back. The more bricks you hand over, the longer everything takes — and the more it costs.
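To make the characters-per-token rule of thumb concrete, here is a minimal sketch that estimates a token count from text length. The name `estimate_tokens` and the default of 4 characters per token are assumptions for illustration only; real tokenizers differ by model, and code usually comes out denser than this estimate suggests.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (a heuristic, not a tokenizer)."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)

prose = "Claude reads every brick you hand over."
print(estimate_tokens(prose))
```

Treat the result as an order-of-magnitude guide for deciding what to send, not as a billing figure.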
How Claude Code Consumes Tokens Differently From the Chat Interface
Claude Code isn't a chatbot. It's an agent. And that changes everything about how tokens get spent.
Every time Claude Code calls a tool (reads a file, runs a command, checks output), it re-sends the full conversation history as context. Again. And again. And again. This is called an agentic loop, and it's the main reason your token count in Claude Code climbs so much faster than in a regular chat.
Here's the key distinction:
- Input tokens — everything Claude reads: your prompt, all the files it accesses, and the entire conversation so far
- Output tokens — what Claude writes back
In a long Claude Code session, input tokens dominate the bill. Your prompts might feel short, but they're a drop in the bucket compared to the file reads and history being re-sent on every turn.
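The other thing worth knowing: total token cost doesn't grow linearly with session length. The context sent on each turn grows roughly linearly, so the cumulative input across the whole session grows roughly quadratically. A 30-turn session doesn't cost 30 × the first turn; it costs significantly more, because each new turn carries all the previous turns with it.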
Agentic Loop Token Compounding (Approximate Context Size)
Turn 1 | █ 800 tokens (Prompt + initial files)
Turn 5 | ████ 3,800 tokens (Accumulated history)
Turn 10 | █████████ 8,500 tokens
Turn 15 | ████████████████ 14,000 tokens
Turn 20 | ████████████████████████ 22,000 tokens
Turn 30 | ███████████████████████████████████████ 38,000 tokens
*Notice the growth curve: Every new tool call re-sends the entire preceding history.*
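The arithmetic behind that curve can be sketched in a few lines. This is a simplified model, assuming the context grows by a fixed amount per turn; the 800-token start and 1,300-token growth figures are illustrative values chosen to land near the chart, and the model ignores prompt caching, which can lower the billed cost of repeated history.

```python
def context_at_turn(turn: int, base: int, growth: int) -> int:
    """Context size sent on a given turn (1-indexed), assuming linear growth."""
    return base + growth * (turn - 1)

def cumulative_input(turns: int, base: int, growth: int) -> int:
    """Total input tokens across all turns, since each turn re-sends the history."""
    return sum(context_at_turn(t, base, growth) for t in range(1, turns + 1))

# Illustrative numbers only: 800 starting tokens, 1,300 added per turn.
turns, base, growth = 30, 800, 1300
total = cumulative_input(turns, base, growth)
flat = turns * base  # what the session would cost if every turn matched the first

print(f"context on turn {turns}: {context_at_turn(turns, base, growth):,}")
print(f"total input across the session: {total:,} (flat estimate: {flat:,})")
```

The per-turn context grows in a straight line, but the running total is the sum of all those contexts, which is why long sessions cost far more than their turn count suggests.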
What Drives Token Waste in a Typical Claude Code Session
Most token waste comes from a handful of habits:
- Feeding Claude your whole codebase when it only needs two files
- Piping noisy CLI output directly into context without filtering
- Letting sessions run on too long and dragging irrelevant history into new tasks
- Using a powerful (expensive) model for tasks any model could handle
- Wordy, over-polite prompts that re-explain background information Claude doesn't need
Fix these, and you'll feel the difference immediately.
The Priority Stack: Where to Start
Before we go deep, here's the ranked overview. Return to this table whenever you want a quick reminder.
| Tier | Category | Impact |
|---|---|---|
| 1 | Context management: what you feed Claude | Highest |
| 2 | Model selection: which Claude you use | High |
| 3 | Prompt discipline: how you write requests | Medium |
| 4 | Session hygiene: /compact, /clear, fresh starts | Medium |
| 5 | Tooling and automation: CLI tools, MCP servers | Compounding |
💡 Good news: Even applying just Tier 1 and Tier 2 strategies makes a meaningful dent in usage for most sessions.
Work through the tiers in order. Don't skip to Tier 5 thinking clever automation will fix a context management problem.
Tier 1 — Control What Claude Reads (Context Management)
This is the single biggest lever you have. More important than your model choice. More important than how you phrase prompts. What you feed Claude determines the token floor for your entire session.
Stop Sending Your Whole Codebase
It's tempting. You want Claude to "just understand the project," so you point it at everything. This backfires almost every time.
Dumping a whole codebase into context means Claude is reading node_modules, generated files, lock files, old migrations, and whatever else is in there — none of which it needs for the task at hand. You're paying for all of it.
The fix is simple in principle: only give Claude the files relevant to the current task. State it explicitly. "Look at src/auth/login.ts and src/middleware/session.ts only." Claude doesn't need to discover the relevant files if you can tell it where they are.
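Before deciding what to scope out, it can help to see where the bulk of a project actually sits. The sketch below walks a directory and sums a rough token estimate per top-level folder. The function name `estimate_tokens_by_dir` is hypothetical, and file size in bytes divided by 4 is only a stand-in for a real token count.

```python
import os
from collections import Counter
from pathlib import Path

def estimate_tokens_by_dir(root: str, chars_per_token: float = 4.0) -> Counter:
    """Sum a rough token estimate for the files under each top-level entry of root."""
    totals: Counter = Counter()
    root_path = Path(root)
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in filenames:
            file_path = Path(dirpath) / name
            try:
                size = file_path.stat().st_size  # bytes as a stand-in for characters
            except OSError:
                continue  # skip unreadable or vanished files
            rel = file_path.relative_to(root_path)
            top = rel.parts[0] if len(rel.parts) > 1 else "(root files)"
            totals[top] += int(size / chars_per_token)
    return totals

# Example usage (run from a project root):
#   for entry, tokens in estimate_tokens_by_dir(".").most_common(10):
#       print(f"{tokens:>12,}  {entry}")
```

In most projects the top entries are dependency and build folders, which is exactly the material the current task rarely needs.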
Use a Structured Context Generator
If you want to be systematic about this, context generator tools do the hard work of building a lean, token-efficient project snapshot for you. Instead of a raw file dump, you hand Claude a curated summary of your project.
Three solid options:
repomix
Packs your repo into a single AI-friendly file. It automatically strips lockfiles, build output, and generated code, and it shows you a token count before you send anything. Great starting point for most projects.
npx repomix
code2prompt
A Rust CLI with prompt templating and source-tree output. Fast, flexible, good for teams who want more control over what gets included.
codesight
Universal context generator with a built-in MCP server, so it works across Claude Code, Cursor, Copilot, and Codex. Worth considering if you're switching between tools.
How to choose:
| Tool | Best for |
|---|---|
| repomix | Quick setup, most projects |
| code2prompt | Performance-focused teams, Rust ecosystem |
| codesight | Multi-tool workflows, MCP integration needed |
The workflow is the same for all three: run the tool once at the start of your session, hand Claude the output instead of pointing it at raw directories. That's it.
Filter and Compress CLI Output Before It Reaches Claude
Running grep -r "someFunction" . across a medium-sized codebase can return hundreds of matches with surrounding context lines duplicated over and over. If that output goes straight into your Claude Code session, you've just added a lot of tokens for very little value.
What to filter before sending CLI output to Claude:
- Duplicate file paths
- Repeated surrounding context lines (grep's -B and -A context flags are useful but noisy)
- Long stack traces where only the first few lines matter
- Verbose build logs — pass in the error, not the whole log
Some teams use tools like RTK to deduplicate shell output before it enters the context window. Even manual trimming makes a difference.
This matters most for: git diff on a large branch, grep across many files, long test suite output, and build logs.
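Manual trimming of this kind can be approximated with a small helper. This is a minimal sketch, not a replacement for a dedicated tool: `compress_output` is a hypothetical name, and the 40-line cap is an arbitrary default you would tune to your own output.

```python
def compress_output(text: str, max_lines: int = 40) -> str:
    """Drop blank and repeated lines, then keep only the first max_lines."""
    seen = set()
    kept = []
    for line in text.splitlines():
        stripped = line.rstrip()
        if not stripped or stripped in seen:
            continue  # skip blanks and exact duplicates
        seen.add(stripped)
        kept.append(stripped)
    omitted = len(kept) - max_lines
    if omitted > 0:
        # Leave a marker so the reader knows output was cut.
        kept = kept[:max_lines] + [f"... [{omitted} more unique lines omitted]"]
    return "\n".join(kept)

sample = (
    "src/a.ts:12: someFunction()\n"
    "src/a.ts:12: someFunction()\n"
    "\n"
    "src/b.ts:7: someFunction()\n"
)
print(compress_output(sample))
```

Exact-match deduplication is deliberately conservative: it never merges lines that differ, so the error you care about is not silently dropped unless it falls past the line cap.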
Limit File Context to the Task at Hand
Two habits that compound well:
1. Explicit scope declarations in your first prompt. Start every task with something like: "For this task, only look at src/payments/. Don't read anything outside that directory." Claude respects scope instructions.
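2. Use an ignore file. A .claudeignore, modelled on .gitignore, lists the paths Claude Code should leave alone; add it to your project root with anything you never want Claude to read by default. Support for this file depends on your Claude Code version and tooling, so confirm it is honoured. If it isn't, the same exclusions can be expressed as permissions.deny rules in .claude/settings.json.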
Things that typically belong in .claudeignore:
node_modules/
dist/
build/
.git/
*.lock
*.min.js
coverage/
__pycache__/
.env
Set this up once and forget about it. It pays dividends on every future session.
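To preview what a pattern list like this would exclude, a rough matcher is enough. The sketch below is an approximation under stated assumptions: `is_ignored` and `PATTERNS` are hypothetical names, patterns ending in "/" are treated as directory names, and everything else is matched against the file name only. Real ignore-file semantics are richer than this.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

PATTERNS = [
    "node_modules/", "dist/", "build/", ".git/",
    "*.lock", "*.min.js", "coverage/", "__pycache__/", ".env",
]

def is_ignored(path: str, patterns: list[str]) -> bool:
    """Approximate ignore check for a forward-slash relative path."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignored if any parent directory has this name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch(parts[-1], pattern):
            # File pattern: compared against the file name only.
            return True
    return False

for candidate in ("node_modules/react/index.js", "src/auth/login.ts", "yarn.lock"):
    print(candidate, is_ignored(candidate, PATTERNS))
```

Running a check like this over your file list is a quick way to confirm the patterns remove the heavy folders without hiding source files you actually work on.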
Tier 2 — Pick the Right Model for Every Task
Model selection is the second biggest lever, and it's one most people get wrong in the same direction: they default to Opus for everything.
Understanding the Claude Model Tiers
| Model | Strengths | Best for |
|---|---|---|
| Opus | Deep reasoning, complex analysis, understanding unfamiliar systems | Architecture decisions, complex debugging, large refactors |
| Sonnet | Fast, capable, great at code | Editing, implementation, agentic loops, most daily tasks |
| Haiku | Cheap, quick, solid for simple tasks | Renaming, formatting, quick lookups, repetitive sub-tasks |
The cost difference between these tiers is significant. Using Opus for a task Sonnet handles just as well isn't just wasteful — it's usually slower too.
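A quick way to see how much the tier matters is to price the same session at different rates. The sketch below uses placeholder rates, not real prices: the `Rate` class, the tier labels, and every number in `RATES` are assumptions for illustration, so substitute current published pricing before drawing conclusions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rate:
    """Price per million tokens. The values used below are placeholders."""
    input_per_mtok: float
    output_per_mtok: float

def session_cost(input_tokens: int, output_tokens: int, rate: Rate) -> float:
    """Cost of a session given its token counts and a per-million-token rate."""
    total = input_tokens * rate.input_per_mtok + output_tokens * rate.output_per_mtok
    return total / 1_000_000

# Hypothetical rates for illustration only; look up current pricing before relying on them.
RATES = {
    "large": Rate(input_per_mtok=15.0, output_per_mtok=75.0),
    "medium": Rate(input_per_mtok=3.0, output_per_mtok=15.0),
    "small": Rate(input_per_mtok=1.0, output_per_mtok=5.0),
}

for name, rate in RATES.items():
    print(f"{name:>6}: ${session_cost(500_000, 20_000, rate):.2f}")
```

Because input tokens dominate an agentic session, the input rate is the number to watch when comparing tiers.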
The Architect/Contractor Mental Model
Here's a way to think about it that makes the decision obvious:
- Opus = the architect. You hire an architect for blueprints, structural decisions, and solving hard problems you can't figure out yourself. You don't hire them to paint the walls.
- Sonnet = the contractor. Reliable, fast, cost-effective. Executes a clear plan well. Your everyday workhorse.
- Haiku = the apprentice. Great at simple, well-defined tasks. Give them a clear brief and they'll get it done.
The most common expensive mistake: using Opus as the default for everything, including the painting.
Opus Plan Mode — Plan With Opus, Execute With Sonnet
Claude Code has a built-in way to get the best of both worlds: Opus Plan Mode.
/model opusplan
What this does: routes your planning step to Opus, then hands execution off to Sonnet within the same session. You get Opus-quality thinking for the architecture decisions, and Sonnet efficiency for the actual implementation.
How to use it well:
- Activate /model opusplan
- Describe your task fully: include the goal, constraints, and any relevant files
- Read the plan before saying "go ahead" — this is important. Reviewing the plan before execution catches misunderstandings early, before they compound into wasted turns
- Approve or adjust, then let Sonnet run
When Opus Plan Mode shines:
- Large refactors
- Building a new feature from scratch
- Debugging a complex issue across multiple files
- Working in an unfamiliar codebase
When it adds less value:
- Quick one-off tasks where the plan is obvious
- Highly exploratory, iterative work
- Tasks that are clearly Sonnet-level from the start
The Anthropic Advisor Strategy
A more flexible version of the same idea: use Opus to reason about a problem and produce guidance, then hand that guidance to Sonnet to act on it.
This isn't limited to a single Claude Code session. It's a pattern you can apply across multi-model pipelines — Opus as the strategic brain, Sonnet as the hands. Anthropic and teams using this approach have reported meaningful cost reductions while maintaining or improving output quality. The exact savings depend heavily on your workflow, but the principle holds: if Sonnet can execute a plan it didn't make, there's no reason to pay Opus rates for execution.
Use this pattern when Opus Plan Mode feels too structured or you're orchestrating work across multiple steps or sessions.
Practical Model-Selection Decision Tree
Not sure which model to pick? Work through this:
Is the task primarily reasoning, architecture, or analysis?
→ Yes: Opus (or Opus Plan Mode)
→ No: continue
Is the task primarily implementation, editing, or following a clear plan?
→ Yes: Sonnet
→ No: continue
Is the task trivial, repetitive, or clearly defined?
→ Yes: Haiku
→ No: Default to Sonnet; escalate to Opus only if quality falls short
When in doubt, start with Sonnet. It handles more than people expect.
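The decision tree above can be written down as a small function, which is handy if you script model choice for batch jobs. This is a sketch of the tree as stated, nothing more: `TaskKind` and `pick_model` are hypothetical names, and classifying the task is still your call.

```python
from enum import Enum

class TaskKind(Enum):
    REASONING = "reasoning"            # architecture, analysis, hard debugging
    IMPLEMENTATION = "implementation"  # editing, following a clear plan
    TRIVIAL = "trivial"                # renames, formatting, repetitive sub-tasks
    UNKNOWN = "unknown"                # anything that doesn't fit the above

def pick_model(kind: TaskKind) -> str:
    """Walk the decision tree top to bottom and return a model tier name."""
    if kind is TaskKind.REASONING:
        return "opus"
    if kind is TaskKind.IMPLEMENTATION:
        return "sonnet"
    if kind is TaskKind.TRIVIAL:
        return "haiku"
    return "sonnet"  # default; escalate to Opus only if quality falls short

print(pick_model(TaskKind.UNKNOWN))
```

The final fallback encodes the "when in doubt" rule: unclassified work starts on Sonnet instead of the most expensive tier.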
Tier 3 — Write Leaner Prompts
Prompt length isn't the biggest cost driver in a Claude Code session, but it's not nothing either — especially when you account for the fact that every prompt gets re-read on every subsequent turn.
Why Prompt Length Costs More Than You Think
The maths is simple: a wordy prompt style × 100 prompts per day adds up. More importantly, those words aren't just sent once. They're re-read on every turn for the rest of the session.
Your prompts are probably a smaller share of total input tokens than you think — file reads and conversation history usually dominate. But it's still a lever, and it costs nothing to pull.
The Dense-Prompt Technique
Strip the pleasantries and the backstory. Lead with the verb.
Verbose:
"Hey, I was hoping you could take a look at the authentication module I've been working on and maybe help me understand why the session tokens aren't refreshing correctly when the user comes back after a while? I think it might be something in the middleware but I'm not totally sure."
Dense:
"Debug: session tokens not refreshing on return visit. Check src/middleware/session.ts first."
Same request. Fraction of the tokens.
When dense prompts work well:
- Quick lookups and checks
- Code reviews with a clear focus
- Simple, unambiguous tasks
When to write full sentences:
- Production-touching changes where misinterpretation is costly
- Complex instructions with multiple constraints
- Anything where you need Claude to reason carefully about edge cases
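To put rough numbers on the verbose and dense versions, the sketch below estimates each prompt's size and what it costs once it has been re-sent on later turns. The helper names `rough_tokens` and `carried_cost` are hypothetical, the 4-characters-per-token figure is a heuristic, and the 20 later turns are an arbitrary example.

```python
import math

def rough_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Heuristic token estimate; a real tokenizer will give different numbers."""
    return math.ceil(len(text) / chars_per_token)

def carried_cost(prompt: str, later_turns: int) -> int:
    """Estimated tokens a prompt costs when it is re-sent on every later turn."""
    return rough_tokens(prompt) * (1 + later_turns)

verbose = (
    "Hey, I was hoping you could take a look at the authentication module "
    "I've been working on and maybe help me understand why the session tokens "
    "aren't refreshing correctly when the user comes back after a while? "
    "I think it might be something in the middleware but I'm not totally sure."
)
dense = "Debug: session tokens not refreshing on return visit. Check src/middleware/session.ts first."

for label, prompt in (("verbose", verbose), ("dense", dense)):
    print(f"{label}: ~{rough_tokens(prompt)} tokens, ~{carried_cost(prompt, 20)} over 20 later turns")
```

The absolute numbers are small next to file reads, which matches the point that prompt length is a secondary lever.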
Front-Load Constraints and Scope
Structure your prompts like this:
[Task] + [Scope] + [Constraints] + [Expected output format]
Example:
"Add input validation to src/api/createUser.ts. Only modify that file. Don't change the function signature. Return the modified function only, no explanation."
Stating what Claude shouldn't do is just as important as stating what it should. Open-ended prompts invite Claude to read more than necessary.
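If you write the same kinds of requests often, the Task, Scope, Constraints, Output format order can be captured in a tiny helper. This is a sketch of one possible template: `build_prompt` and its parameters are hypothetical, not part of any Claude Code API.

```python
def build_prompt(task: str, scope: str, constraints: list[str], output_format: str) -> str:
    """Assemble a prompt in the order Task, Scope, Constraints, Expected output format."""
    parts = [task.strip(), f"Scope: {scope.strip()}."]
    parts.extend(c.strip() for c in constraints if c.strip())
    parts.append(output_format.strip())
    return " ".join(parts)

prompt = build_prompt(
    task="Add input validation to src/api/createUser.ts.",
    scope="only modify that file",
    constraints=["Don't change the function signature."],
    output_format="Return the modified function only, no explanation.",
)
print(prompt)
```

Keeping the order fixed makes it obvious when a piece is missing, such as a prompt with a task but no scope.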
Batch Related Tasks Into One Planning Prompt
Instead of one request at a time — which generates a back-and-forth chain — describe a related set of changes in a single prompt.
Opus (or Sonnet) plans them as a coherent set. Execution happens in sequence. This keeps the session short, so cumulative cost stays closer to linear instead of ballooning, because you're not building a long question-and-answer history around tasks that should have been one conversation.
Tier 4 — Session Hygiene (Context Compaction and Fresh Starts)
Long sessions are a silent token drain. Here's how to keep them under control.
Why Long Sessions Are the Silent Token Killer
Every turn in a session re-reads the full conversation history. A 30-turn session doesn't cost 30 × the first turn; it costs substantially more, because each new turn carries the weight of everything before it.
There's also the context carry-over trap: you finish debugging a gnarly issue, and then instead of starting fresh, you immediately ask Claude to help with something completely unrelated. Now it's reading 40 turns of debugging context that has zero relevance to the new task. Wasted tokens, and often degraded output quality too.
How and When to Use /compact
/compact
/compact tells Claude to summarise the conversation so far, then continue with that summary in place of the full raw history. You lose some granular detail, but you keep the distilled decisions and context.
Best timing:
- Right after an exploration or research phase, before starting implementation
- After a debugging session where the cause is understood and you're moving to the fix
- Whenever you notice the session has gotten long and you have a clear phase boundary ahead
What you keep: key decisions, current understanding, relevant context
What you lose: the exact wording of earlier turns, granular details that probably don't matter anymore
Make /compact a habit at every major phase boundary. It's one command and it pays for itself quickly.
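The effect of compacting at a phase boundary can be estimated with a simple simulation. This is a toy model under loud assumptions: context grows by a fixed amount per turn, the summary size is a guess, the cost of producing the summary is ignored, and `session_input_tokens` is a hypothetical name.

```python
from typing import Optional

def session_input_tokens(
    turns: int,
    base: int,
    growth: int,
    compact_at: Optional[int] = None,
    summary_size: int = 0,
) -> int:
    """Total input tokens over a session, optionally compacting before one turn."""
    context = base
    total = 0
    for turn in range(1, turns + 1):
        if compact_at is not None and turn == compact_at:
            context = summary_size  # history replaced by a summary of this size
        total += context            # this turn sends the current context
        context += growth           # and adds new history for the next turn
    return total

# Illustrative numbers only.
without = session_input_tokens(30, base=800, growth=1300)
with_compact = session_input_tokens(30, base=800, growth=1300, compact_at=16, summary_size=2000)
print(f"no compaction: {without:,}")
print(f"compact before turn 16: {with_compact:,}")
```

The earlier the compaction happens after a phase genuinely ends, the more later turns benefit from the smaller starting context.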
When to Use /clear
/clear
/clear is the nuclear option — it removes all context. Use it when you're switching to a completely unrelated task and there's nothing in the current session worth keeping.
A good pairing: run your context generator (repomix, codesight) after /clear to give Claude a fresh, accurate project snapshot. You get a clean slate with just enough project context to get started.
When Starting a New Session Is the Right Move
Sometimes context isn't an asset — it's a liability. A long session full of decisions and explorations for Task A becomes noise when you move to Task Z.
Heuristic: if your next task shares fewer than roughly 30% of the relevant files with your current task, start fresh.
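That heuristic is easy to express as a check. The sketch below is one interpretation, assuming overlap is measured as the share of the next task's files that the current task already touched; `should_start_fresh` is a hypothetical name and 0.30 is the rough threshold from the text, not a measured value.

```python
def should_start_fresh(
    current_files: set[str],
    next_files: set[str],
    threshold: float = 0.30,
) -> bool:
    """True when the next task shares less than `threshold` of its files with the current one."""
    if not next_files:
        return True  # no known overlap, so nothing argues for carrying context
    shared = len(current_files & next_files) / len(next_files)
    return shared < threshold

current = {"src/auth/login.ts", "src/middleware/session.ts", "src/auth/tokens.ts"}
upcoming = {"src/auth/tokens.ts", "src/billing/invoice.ts", "src/billing/tax.ts", "src/billing/api.ts"}
print(should_start_fresh(current, upcoming))
```

Treat the answer as a nudge; if the current session holds decisions the next task depends on, a handoff note carries them across more cheaply than the full history.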
The obvious concern is losing context between sessions. Claude has no memory by default, so solve this with a session handoff note — a 3–5 line note in your project file that captures:
- What was decided
- What changed
- What's next
Keep it updated as you go. It takes 30 seconds to write and saves you from re-explaining your whole project at the start of every new session.
Checking Token Usage Mid-Session
Claude Code shows you session token usage, for example via the /cost command. Actually use it, especially before starting a large task.
If you're approaching the limit:
- Decide proactively whether to /compact and continue, or start fresh
- Don't wait until you hit the wall mid-implementation
Getting cut off halfway through a complex change is the worst outcome. A quick check before you kick off a large task costs nothing.
Tier 5 — Tools, Automation, and Advanced Patterns
Once you have the fundamentals solid, these patterns help you scale your efficiency further.
MCP Servers for Context-Efficient Tooling
MCP (Model Context Protocol) servers let Claude access external data and tools in a targeted, token-efficient way. Instead of pasting a whole Jira board into context, an MCP tool can fetch the one ticket Claude needs. Instead of sharing an entire database schema, an MCP query returns just the relevant tables.
The difference in practice:
| Naive approach | MCP approach |
|---|---|
| Paste entire schema into context | Query specific tables on demand |
| Share full Jira board | Fetch single ticket by ID |
| Dump all docs into prompt | Retrieve specific doc section |
If you're already using codesight, its built-in MCP server handles the context generation side of this. For external services, check what MCP servers are available for the tools you already use.
Building a Token-Efficient Project Workflow
Here's a session structure that keeps things lean from start to finish.
Session start:
- Run your context generator (npx repomix or equivalent) to build a clean project snapshot
- Filter any CLI output you plan to share before the session begins
- Set your model: Sonnet for most work, /model opusplan for architectural sessions
- State your task scope and exclusions in the very first prompt
Mid-session habits:
- /compact after each major phase (research → plan → build → review)
- Check token usage before starting a large task
- Start a new session when switching to unrelated work
End of session:
Write a 3–5 line handoff note. What was decided, what changed, what's next. Your future self will be grateful.
.claudeignore and Exclusion Patterns
Here's a starting template you can drop into any project. Adjust as needed.
Node/JavaScript projects:
node_modules/
dist/
build/
.next/
.nuxt/
coverage/
*.lock
*.min.js
*.min.css
Python projects:
__pycache__/
*.pyc
.venv/
venv/
dist/
build/
*.egg-info/
.pytest_cache/
Universal additions:
.git/
.env
.env.*
*.log
*.map
Set this once per project. It quietly saves tokens on every single session going forward.
What Not to Bother With
Some "optimisations" that sound appealing but aren't worth your time:
- Custom tokenizer hacks — fragile, save pennies, break frequently with model updates
- Prompt-shortening agents — add latency and often degrade output quality more than they save tokens
- Switching providers mid-task to save cost — the context loss usually costs more than the token savings
- Micro-optimising individual prompt length — useful up to a point, then diminishing returns fast
The 80/20 rule applies hard here. Tiers 1 and 2 give you most of the gains. Everything else is refinement.
Putting It All Together — A Day in the Life
Here's what an optimised Claude Code session looks like in practice, from first keystroke to shutdown.
Example: Starting a Feature Development Session
You're building a new user permissions system.
Opening moves:
- Run npx repomix --include "src/auth/**,src/users/**" to build a lean context snapshot
- Switch to /model opusplan
- First prompt: "Plan the implementation of a role-based permissions system. Scope: src/auth/ and src/users/ only. No UI changes. Goal: three permission levels (read, write, admin)."
After the plan:
- Review it before saying go. Does it match your intent? Any assumptions you want to correct? Fix them now, not after five turns of implementation.
- Approve the plan. Sonnet takes over for execution.
Mid-session:
- After the plan is approved and execution starts, /compact to drop the planning conversation
- Sonnet makes targeted file reads only: you told it the scope, so it stays in src/auth/ and src/users/
Example: Debugging a Complex Issue Across an Unfamiliar Codebase
You've inherited a service that's behaving strangely and you don't know where to look.
When to reach for Opus: complex hypothesis generation, tracing an issue across many files, reasoning about unexpected interactions. This is exactly what Opus is for.
Handing off to Sonnet: once you understand the cause — "okay, the session middleware isn't clearing the cache on logout because of this race condition" — the fix is Sonnet territory. Clear bug, known location, known change. No need for Opus anymore.
After debugging: /compact if you're continuing to work, or start fresh if the next task is unrelated. Don't drag debugging context into a new feature build.
Example: Routine Maintenance and Small Fixes
Renaming a prop across a component library. Adding JSDoc comments. Fixing a typo in config values.
Default to Sonnet or Haiku. These tasks are well-defined, low-complexity, and don't require deep reasoning. Dense prompts, tight scope, no plan-mode overhead needed. Short sessions, low carry-over, done.
Frequently Asked Questions
Run npx repomix in your project right now and see what you get.
- /compact: summarises the conversation and continues with the distilled version in place of raw history. Use when you want to keep going on the same task with a lighter load.
- /clear: wipes everything. Use when switching to a completely unrelated task.
Think of /compact as decluttering your desk. /clear is clearing the whole desk.
Conclusion
Summary — What We Covered
Token costs in Claude Code compound fast — faster than in any chat interface — because agentic loops re-read your full history on every turn. But the waste is controllable.
Here's the shape of the problem and the shape of the solution:
- Context management (Tier 1) is the biggest single lever. What you feed Claude determines the token floor for your whole session.
- Model selection (Tier 2) is the second biggest lever. Default to Sonnet; use Opus where deep reasoning is genuinely required; use Opus Plan Mode for sessions that start with architecture and end with implementation.
- Prompt discipline (Tier 3) compounds across hundreds of daily prompts, especially in long sessions.
- Session hygiene (Tier 4): /compact at phase boundaries and /clear between unrelated tasks prevent the silent compounding cost of long, unfocused sessions.
- Tooling (Tier 5) locks in your gains and makes the good habits automatic.
Key Takeaways
- Control what Claude reads before worrying about how you write prompts
- Default to Sonnet; escalate to Opus only where deep reasoning is genuinely required
- Use /model opusplan for any session involving architectural decisions or large implementations
- /compact at every major phase boundary; /clear or new session when switching topics
- Run a context generator at the start of every session instead of sharing raw files
- Write dense prompts for routine tasks; full sentences for production-touching changes
Next Steps
Start today: Install repomix and run it once in your current project.
npx repomix
See how the token count compares to what you'd have sent without it.
This week: Try one full session with /model opusplan on a feature branch. Notice the difference in planning quality, and in how much of the session Sonnet handles instead of Opus.
Ongoing: Adopt the session-start checklist and /compact discipline as defaults. Add a .claudeignore to every project. Keep a handoff note. These become second nature quickly, and they quietly save tokens on every session from here on.
If this was useful, bookmark it for reference and share it with your team — token costs are a team problem, and a team-wide checklist is a team-wide saving.