How to Save Tokens in Claude Code: Reduce Token Usage and Cut Costs for Every Session

*~12 minute read | Applies to Claude Code with Opus 4, Sonnet 4, and Haiku 4 | Last reviewed June 2026*

It's Sunday night. You're 40 minutes into a refactor session, things are finally clicking — and Claude Code hits a wall. Context limit. Session over.

Or maybe you didn't notice until you opened your API dashboard on Monday morning and found a bill that made you do a double-take.

Either way, you've felt the sting of runaway token usage. And the frustrating part? A lot of it was preventable.

This post is your fix. By the end, you'll have a clear, prioritised action list for slashing token consumption in Claude Code — whether you're on a Pro subscription or paying per API call. No vague advice, no made-up percentages, just practical strategies ranked by actual impact.

Who this is for: developers, indie hackers, and teams who use Claude Code every day and want to get more done without watching their token budget evaporate.

What Are Tokens and Why Do They Matter in Claude Code?

Before we talk about saving them, let's make sure we're on the same page about what tokens actually are.

Tokens Are Not Words — Here's What They Actually Are

A token is a small chunk of text. Not a word, not a character — somewhere in between. For English prose, roughly 3–4 characters make up one token. For code, it's a bit different.

Code tokenizes less efficiently than plain English because of all the symbols, indentation, punctuation, and repetition involved. That node_modules folder full of boilerplate? A token monster. Even a 200-line Python file typically runs to roughly 1,500 to 3,000 tokens depending on line length and complexity, and that's before Claude starts responding.

Quick mental model: think of a token like a Lego brick. Your prompt and files are the pile of bricks you hand Claude. Claude reads every brick, then builds something back. The more bricks you hand over, the longer everything takes — and the more it costs.
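To make the characters-per-token rule of thumb concrete, here is a minimal sketch. It is only a heuristic based on character count, not a real tokenizer, and the function name estimate_tokens and the 3.5 default are assumptions chosen for illustration.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 3.5) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)

prose = "Tokens are small chunks of text, somewhere between a character and a word."
code = "def add(a: int, b: int) -> int:\n    return a + b\n"

# Print the rough estimates; a real tokenizer will give different numbers.
print("prose:", estimate_tokens(prose))
print("code: ", estimate_tokens(code))
```

Treat the result as a ballpark for deciding what to send, not as a billing figure.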

How Claude Code Consumes Tokens Differently From the Chat Interface

Claude Code isn't a chatbot. It's an agent. And that changes everything about how tokens get spent.

Every time Claude Code calls a tool (reads a file, runs a command, checks output), it re-sends the full conversation history as context. Again. And again. And again. This is called an agentic loop, and it's the main reason your token count in Claude Code climbs so much faster than in a regular chat.

Here's the key distinction:

Input tokens — everything Claude reads: your prompt, all the files it accesses, and the entire conversation so far
Output tokens — what Claude writes back

In a long Claude Code session, input tokens dominate the bill. Your prompts might feel short, but they're a drop in the bucket compared to the file reads and history being re-sent on every turn.

The other thing worth knowing: token count doesn't grow linearly. It compounds. A 30-turn session doesn't cost 30 × the average turn — it costs significantly more, because each new turn carries all the previous turns with it.

Agentic Loop Token Compounding (Approximate Context Size)

Turn  1 | █ 800 tokens (Prompt + initial files)
Turn  5 | ████ 3,800 tokens (Accumulated history)
Turn 10 | █████████ 8,500 tokens
Turn 15 | ████████████████ 14,000 tokens
Turn 20 | ████████████████████████ 22,000 tokens
Turn 30 | ███████████████████████████████████████ 38,000 tokens

*Notice the growth curve: Every new tool call re-sends the entire preceding history.*
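The compounding is easy to check with arithmetic. The sketch below assumes a simplified session in which every turn adds the same number of tokens and the full context is re-sent each time, with no prompt caching. The function names and the 1,000-token figure are hypothetical.

```python
from itertools import accumulate

def context_sizes(tokens_added_per_turn: list[int]) -> list[int]:
    """Context size sent on each turn: everything added up to and including it."""
    return list(accumulate(tokens_added_per_turn))

def cumulative_input_tokens(tokens_added_per_turn: list[int]) -> int:
    """Total input tokens processed over the session (no caching assumed)."""
    return sum(context_sizes(tokens_added_per_turn))

turns = [1_000] * 30  # hypothetical: each of 30 turns adds 1,000 tokens

print(context_sizes(turns)[-1])        # 30000 tokens in context on the last turn
print(cumulative_input_tokens(turns))  # 465000 tokens read across the session
```

Thirty turns of 1,000 new tokens each add up to 30,000 tokens of content, but the model reads 465,000 input tokens in total, because the cumulative figure grows roughly with the square of the turn count.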

What Drives Token Waste in a Typical Claude Code Session

Most token waste comes from a handful of habits:

Feeding Claude your whole codebase when it only needs two files
Piping noisy CLI output directly into context without filtering
Letting sessions run on too long and dragging irrelevant history into new tasks
Using a powerful (expensive) model for tasks any model could handle
Wordy, over-polite prompts that re-explain background information Claude doesn't need

Fix these, and you'll feel the difference immediately.

The Priority Stack: Where to Start

Before we go deep, here's the ranked overview. Return to this table whenever you want a quick reminder.

Tier	Category	Impact
1	Context management — what you feed Claude	Highest
2	Model selection — which Claude you use	High
3	Prompt discipline — how you write requests	Medium
4	Session hygiene — `/compact`, `/clear`, fresh starts	Medium
5	Tooling and automation — CLI tools, MCP servers	Compounding

💡 Good news: Even applying just Tier 1 and Tier 2 strategies makes a meaningful dent in usage for most sessions.

Work through the tiers in order. Don't skip to Tier 5 thinking clever automation will fix a context management problem.

Tier 1 — Control What Claude Reads (Context Management)

This is the single biggest lever you have. More important than your model choice. More important than how you phrase prompts. What you feed Claude determines the token floor for your entire session.

Stop Sending Your Whole Codebase

It's tempting. You want Claude to "just understand the project," so you point it at everything. This backfires almost every time.

Dumping a whole codebase into context means Claude is reading node_modules, generated files, lock files, old migrations, and whatever else is in there — none of which it needs for the task at hand. You're paying for all of it.

The fix is simple in principle: only give Claude the files relevant to the current task. State it explicitly. "Look at src/auth/login.ts and src/middleware/session.ts only." Claude doesn't need to discover the relevant files if you can tell it where they are.
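A rough way to see what scoping buys you: the sketch below compares a session that keeps 15 files in context with one that keeps 2. All the numbers (2,000 tokens per file, 500 tokens of new history per turn, 20 turns) are made-up inputs for illustration, and the model ignores prompt caching.

```python
def session_input_tokens(files: int, tokens_per_file: int,
                         turns: int, tokens_added_per_turn: int) -> int:
    """Input tokens over a session where the files stay in context every turn."""
    base = files * tokens_per_file
    # Turn t re-sends the files plus the history added on turns 1..t.
    return sum(base + t * tokens_added_per_turn for t in range(1, turns + 1))

unmanaged = session_input_tokens(files=15, tokens_per_file=2_000,
                                 turns=20, tokens_added_per_turn=500)
scoped = session_input_tokens(files=2, tokens_per_file=2_000,
                              turns=20, tokens_added_per_turn=500)

print(unmanaged)  # 705000
print(scoped)     # 185000
```

Under these assumptions the scoped session reads about a quarter of the input tokens, and the gap widens with every extra turn because the unneeded files are re-read each time.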

Use a Structured Context Generator

If you want to be systematic about this, context generator tools do the hard work of building a lean, token-efficient project snapshot for you. Instead of a raw file dump, you hand Claude a curated summary of your project.

Three solid options:

repomix
Packs your repo into a single AI-friendly file. It automatically strips lockfiles, build output, and generated code, and it shows you a token count before you send anything. Great starting point for most projects.

npx repomix

code2prompt
A Rust CLI with prompt templating and source-tree output. Fast, flexible, good for teams who want more control over what gets included.

codesight
Universal context generator with a built-in MCP server, so it works across Claude Code, Cursor, Copilot, and Codex. Worth considering if you're switching between tools.

How to choose:

Tool	Best for
repomix	Quick setup, most projects
code2prompt	Performance-focused teams, Rust ecosystem
codesight	Multi-tool workflows, MCP integration needed

The workflow is the same for all three: run the tool once at the start of your session, hand Claude the output instead of pointing it at raw directories. That's it.

Filter and Compress CLI Output Before It Reaches Claude

Running grep -r "someFunction" . across a medium-sized codebase can return hundreds of matches with surrounding context lines duplicated over and over. If that output goes straight into your Claude Code session, you've just added a lot of tokens for very little value.

What to filter before sending CLI output to Claude:

Duplicate file paths
Repeated surrounding context lines (grep's -B and -A context flags are useful but noisy)
Long stack traces where only the first few lines matter
Verbose build logs — pass in the error, not the whole log

Some teams use tools like RTK to deduplicate shell output before it enters the context window. Even manual trimming makes a difference.

This matters most for: git diff on a large branch, grep across many files, long test suite output, and build logs.
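Manual trimming can be as simple as the sketch below: drop blank and exactly repeated lines, then keep only the start and end of what remains. This is an illustrative helper, not how RTK or any other tool works, and the name compress_output and its defaults are assumptions.

```python
def compress_output(text: str, head: int = 20, tail: int = 10) -> str:
    """Drop blank and exact-duplicate lines, then keep the first `head`
    and last `tail` of the remaining lines."""
    seen: set[str] = set()
    unique: list[str] = []
    for line in text.splitlines():
        key = line.strip()
        if not key or key in seen:
            continue  # skip blank lines and exact repeats
        seen.add(key)
        unique.append(line)
    if len(unique) <= head + tail:
        return "\n".join(unique)
    omitted = len(unique) - head - tail
    marker = f"... [{omitted} lines omitted] ..."
    # Slice from an explicit index so tail=0 keeps nothing from the end.
    return "\n".join(unique[:head] + [marker] + unique[len(unique) - tail:])

noisy_log = "\n".join(
    ["warn: deprecated"] * 50
    + [f"line {i}" for i in range(40)]
    + ["ERROR: build failed"]
)
print(compress_output(noisy_log))
```

Deduplicating whole lines is lossy: if a repeated line matters (say, the same error firing twice), trim by hand instead.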

Limit File Context to the Task at Hand

Two habits that compound well:

1. Explicit scope declarations in your first prompt. Start every task with something like: "For this task, only look at src/payments/. Don't read anything outside that directory." Claude respects scope instructions.

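2. Use .claudeignore. Similar to .gitignore, this tells Claude Code which paths to leave alone. Support for this file depends on your Claude Code version and setup, so confirm it is honoured before relying on it; where it is not, the same paths can be blocked with Read deny rules under permissions in .claude/settings.json. Add it to your project root and list anything you never want Claude to read by default.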

Things that typically belong in .claudeignore:

node_modules/
dist/
build/
.git/
*.lock
*.min.js
coverage/
__pycache__/
.env

Set this up once and forget about it. It pays dividends on every future session.
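Before committing to a pattern list, it helps to check what the patterns actually catch. The sketch below is a deliberately simplified approximation of gitignore-style matching (directory names anywhere in the path, file globs against the file name only). It is not how Claude Code or git resolve patterns, and is_ignored is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

def is_ignored(path: str, patterns: list[str]) -> bool:
    """Approximate ignore check: 'dir/' patterns match any parent directory,
    other patterns are globs matched against the file name only."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    for pattern in patterns:
        if pattern.endswith("/"):
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatchcase(parts[-1], pattern):
            return True
    return False

patterns = ["node_modules/", "dist/", "*.lock", "*.min.js", ".env"]

for candidate in ["node_modules/react/index.js", "src/app.ts",
                  "yarn.lock", "package-lock.json"]:
    print(candidate, is_ignored(candidate, patterns))
```

One thing this surfaces: a *.lock glob matches yarn.lock but not package-lock.json, so npm projects need that file listed by name.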

Tier 2 — Pick the Right Model for Every Task

Model selection is the second biggest lever, and it's one most people get wrong in the same direction: they default to Opus for everything.

Understanding the Claude Model Tiers

Model	Strengths	Best for
Opus	Deep reasoning, complex analysis, understanding unfamiliar systems	Architecture decisions, complex debugging, large refactors
Sonnet	Fast, capable, great at code	Editing, implementation, agentic loops, most daily tasks
Haiku	Cheap, quick, solid for simple tasks	Renaming, formatting, quick lookups, repetitive sub-tasks

The cost difference between these tiers is significant. Using Opus for a task Sonnet handles just as well isn't just wasteful — it's usually slower too.

The Architect/Contractor Mental Model

Here's a way to think about it that makes the decision obvious:

Opus = the architect. You hire an architect for blueprints, structural decisions, and solving hard problems you can't figure out yourself. You don't hire them to paint the walls.
Sonnet = the contractor. Reliable, fast, cost-effective. Executes a clear plan well. Your everyday workhorse.
Haiku = the apprentice. Great at simple, well-defined tasks. Give them a clear brief and they'll get it done.

The most common expensive mistake: using Opus as the default for everything, including the painting.

Opus Plan Mode — Plan With Opus, Execute With Sonnet

Claude Code has a built-in way to get the best of both worlds: Opus Plan Mode.

/model opusplan

What this does: routes your planning step to Opus, then hands execution off to Sonnet within the same session. You get Opus-quality thinking for the architecture decisions, and Sonnet efficiency for the actual implementation.

How to use it well:

Activate /model opusplan
Describe your task fully — include the goal, constraints, and any relevant files
Read the plan before saying "go ahead" — this is important. Reviewing the plan before execution catches misunderstandings early, before they compound into wasted turns
Approve or adjust, then let Sonnet run

When Opus Plan Mode shines:

Large refactors
Building a new feature from scratch
Debugging a complex issue across multiple files
Working in an unfamiliar codebase

When it adds less value:

Quick one-off tasks where the plan is obvious
Highly exploratory, iterative work
Tasks that are clearly Sonnet-level from the start

The Anthropic Advisor Strategy

A more flexible version of the same idea: use Opus to reason about a problem and produce guidance, then hand that guidance to Sonnet to act on it.

This isn't limited to a single Claude Code session. It's a pattern you can apply across multi-model pipelines — Opus as the strategic brain, Sonnet as the hands. Anthropic and teams using this approach have reported meaningful cost reductions while maintaining or improving output quality. The exact savings depend heavily on your workflow, but the principle holds: if Sonnet can execute a plan it didn't make, there's no reason to pay Opus rates for execution.

Use this pattern when Opus Plan Mode feels too structured or you're orchestrating work across multiple steps or sessions.

Practical Model-Selection Decision Tree

Not sure which model to pick? Work through this:

Is the task primarily reasoning, architecture, or analysis?
  → Yes: Opus (or Opus Plan Mode)
  → No: continue

Is the task primarily implementation, editing, or following a clear plan?
  → Yes: Sonnet
  → No: continue

Is the task trivial, repetitive, or clearly defined?
  → Yes: Haiku
  → No: Default to Sonnet; escalate to Opus only if quality falls short

When in doubt, start with Sonnet. It handles more than people expect.
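The decision tree above can be written down as a small function. This is only a sketch of the routing logic: the TaskKind categories and the returned tier names are labels invented for the example, not Claude Code settings.

```python
from enum import Enum

class TaskKind(Enum):
    REASONING = "reasoning"            # architecture, analysis, hard debugging
    IMPLEMENTATION = "implementation"  # editing, following a clear plan
    TRIVIAL = "trivial"                # renames, formatting, quick lookups
    UNCLEAR = "unclear"                # none of the above fits cleanly

def pick_model(kind: TaskKind) -> str:
    """Walk the decision tree top to bottom and return a model tier name."""
    if kind is TaskKind.REASONING:
        return "opus"
    if kind is TaskKind.IMPLEMENTATION:
        return "sonnet"
    if kind is TaskKind.TRIVIAL:
        return "haiku"
    # Default to Sonnet; escalate to Opus only if quality falls short.
    return "sonnet"

for kind in TaskKind:
    print(f"{kind.value:>14} -> {pick_model(kind)}")
```

The hard part in practice is classifying the task honestly, which is why the fallback is Sonnet rather than Opus.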

Tier 3 — Write Leaner Prompts

Prompt length isn't the biggest cost driver in a Claude Code session, but it's not nothing either — especially when you account for the fact that every prompt gets re-read on every subsequent turn.

Why Prompt Length Costs More Than You Think

The maths is simple: a wordy prompt style × 100 prompts per day adds up. More importantly, those words aren't just sent once. They're re-read on every turn for the rest of the session.

Your prompts are probably a smaller share of total input tokens than you think — file reads and conversation history usually dominate. But it's still a lever, and it costs nothing to pull.

The Dense-Prompt Technique

Strip the pleasantries and the backstory. Lead with the verb.

Verbose:

"Hey, I was hoping you could take a look at the authentication module I've been working on and maybe help me understand why the session tokens aren't refreshing correctly when the user comes back after a while? I think it might be something in the middleware but I'm not totally sure."

Dense:

"Debug: session tokens not refreshing on return visit. Check src/middleware/session.ts first."

Same request. Fraction of the tokens.

When dense prompts work well:

Quick lookups and checks
Code reviews with a clear focus
Simple, unambiguous tasks

When to write full sentences:

Production-touching changes where misinterpretation is costly
Complex instructions with multiple constraints
Anything where you need Claude to reason carefully about edge cases

Front-Load Constraints and Scope

Structure your prompts like this:

[Task] + [Scope] + [Constraints] + [Expected output format]

Example:

"Add input validation to src/api/createUser.ts. Only modify that file. Don't change the function signature. Return the modified function only, no explanation."

Stating what Claude shouldn't do is just as important as stating what it should. Open-ended prompts invite Claude to read more than necessary.
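If you want the four-part structure to become a habit, a tiny helper can enforce it. The sketch below is one possible way to assemble such a prompt; build_prompt and its parameters are hypothetical names, and the output is plain text you would paste into Claude Code.

```python
def _sentence(text: str) -> str:
    """Trim the fragment and make sure it ends with sentence punctuation."""
    text = text.strip()
    return text if text.endswith((".", "!", "?")) else text + "."

def build_prompt(task: str, scope: str,
                 constraints: list[str], output_format: str) -> str:
    """Assemble [Task] + [Scope] + [Constraints] + [Expected output format]."""
    parts = [task, scope, *constraints, output_format]
    return " ".join(_sentence(p) for p in parts if p.strip())

prompt = build_prompt(
    task="Add input validation to src/api/createUser.ts",
    scope="Only modify that file",
    constraints=["Don't change the function signature"],
    output_format="Return the modified function only, no explanation",
)
print(prompt)
```

Making the constraints a required argument is the point: you have to state them up front instead of discovering them three turns in.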

Batch Related Tasks Into One Planning Prompt

Instead of one request at a time — which generates a back-and-forth chain — describe a related set of changes in a single prompt.

The model plans them as a coherent set, and execution happens in sequence. This keeps the history shorter, because you're not building a long question-and-answer chain around tasks that should have been one conversation, and a shorter history means fewer tokens re-sent on every later turn.

Tier 4 — Session Hygiene (Context Compaction and Fresh Starts)

Long sessions are a silent token drain. Here's how to keep them under control.

Why Long Sessions Are the Silent Token Killer

Every turn in a session re-reads the full conversation history, so the cost of a session grows faster than its turn count: each new turn carries the weight of everything before it.

There's also the context carry-over trap: you finish debugging a gnarly issue, and then instead of starting fresh, you immediately ask Claude to help with something completely unrelated. Now it's reading 40 turns of debugging context that has zero relevance to the new task. Wasted tokens, and often degraded output quality too.

How and When to Use /compact

/compact

/compact tells Claude to summarise the conversation so far, then continue with that summary in place of the full raw history. You lose some granular detail, but you keep the distilled decisions and context.

Best timing:

Right after an exploration or research phase, before starting implementation
After a debugging session where the cause is understood and you're moving to the fix
Whenever you notice the session has gotten long and you have a clear phase boundary ahead

What you keep: key decisions, current understanding, relevant context
What you lose: the exact wording of earlier turns, granular details that probably don't matter anymore

Make /compact a habit at every major phase boundary. It's one command and it pays for itself quickly.

When to Use /clear

/clear

/clear is the nuclear option — it removes all context. Use it when you're switching to a completely unrelated task and there's nothing in the current session worth keeping.

A good pairing: run your context generator (repomix, codesight) after /clear to give Claude a fresh, accurate project snapshot. You get a clean slate with just enough project context to get started.

When Starting a New Session Is the Right Move

Sometimes context isn't an asset — it's a liability. A long session full of decisions and explorations for Task A becomes noise when you move to Task Z.

Heuristic: if your next task shares fewer than roughly 30% of the relevant files with your current task, start fresh.
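The heuristic can be sketched as a quick overlap check. The source does not say what the 30% is measured against, so this version assumes the denominator is the set of files the next task needs. Both function names are hypothetical.

```python
def shared_file_ratio(current: set[str], upcoming: set[str]) -> float:
    """Fraction of the upcoming task's files that the current task also uses."""
    if not upcoming:
        return 0.0
    return len(current & upcoming) / len(upcoming)

def should_start_fresh(current: set[str], upcoming: set[str],
                       threshold: float = 0.30) -> bool:
    """True when the overlap is below the threshold, i.e. start a new session."""
    return shared_file_ratio(current, upcoming) < threshold

debugging = {"src/auth/login.ts", "src/middleware/session.ts", "src/auth/token.ts"}
billing = {"src/auth/token.ts", "src/payments/invoice.ts",
           "src/payments/tax.ts", "src/payments/webhook.ts"}

print(should_start_fresh(debugging, billing))  # True: only 1 of 4 files is shared
```

You won't run this before every task, but it shows what the rule of thumb measures: how much of the existing context the next task can actually reuse.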

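The obvious concern is losing context between sessions. Claude doesn't carry conversation history from one session to the next, so solve this with a session handoff note: 3–5 lines kept in a file in your project (CLAUDE.md is a natural home, since Claude Code loads it at the start of each session) that capture: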

What was decided
What changed
What's next

Keep it updated as you go. It takes 30 seconds to write and saves you from re-explaining your whole project at the start of every new session.
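If you'd rather generate the note than type it, a sketch like the following keeps the format consistent. The layout and the name render_handoff are assumptions for illustration; save the text wherever your project keeps such notes.

```python
from datetime import date
from typing import Optional

def render_handoff(decided: str, changed: str, next_step: str,
                   on: Optional[date] = None) -> str:
    """Format a short handoff note: what was decided, what changed, what's next."""
    stamp = (on or date.today()).isoformat()
    return "\n".join([
        f"Handoff ({stamp})",
        f"Decided: {decided}",
        f"Changed: {changed}",
        f"Next: {next_step}",
    ])

note = render_handoff(
    decided="Three permission levels: read, write, admin",
    changed="Added role checks in src/auth/",
    next_step="Wire role checks into src/users/ endpoints",
)
print(note)
```

Four lines is enough; the goal is a note short enough to paste into the first prompt of the next session.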

Checking Token Usage Mid-Session

Claude Code can show you session token usage, for example via the /cost command. Actually use it, especially before starting a large task.

If you're approaching the limit:

Decide proactively whether to /compact and continue, or start fresh
Don't wait until you hit the wall mid-implementation

Getting cut off halfway through a complex change is the worst outcome. A quick check before you kick off a large task costs nothing.

Tier 5 — Tools, Automation, and Advanced Patterns

Once you have the fundamentals solid, these patterns help you scale your efficiency further.

MCP Servers for Context-Efficient Tooling

MCP (Model Context Protocol) servers let Claude access external data and tools in a targeted, token-efficient way. Instead of pasting a whole Jira board into context, an MCP tool can fetch the one ticket Claude needs. Instead of sharing an entire database schema, an MCP query returns just the relevant tables.

The difference in practice:

Naive approach	MCP approach
Paste entire schema into context	Query specific tables on demand
Share full Jira board	Fetch single ticket by ID
Dump all docs into prompt	Retrieve specific doc section

If you're already using codesight, its built-in MCP server handles the context generation side of this. For external services, check what MCP servers are available for the tools you already use.

Building a Token-Efficient Project Workflow

Here's a session structure that keeps things lean from start to finish.

Session start:

Run your context generator (npx repomix or equivalent) to build a clean project snapshot
Filter any CLI output you plan to share before the session begins
Set your model: Sonnet for most work, /model opusplan for architectural sessions
State your task scope and exclusions in the very first prompt

Mid-session habits:

/compact after each major phase (research → plan → build → review)
Check token usage before starting a large task
Start a new session when switching to unrelated work

End of session:
Write a 3–5 line handoff note. What was decided, what changed, what's next. Your future self will be grateful.

.claudeignore and Exclusion Patterns

Here's a starting template you can drop into any project. Adjust as needed.

Node/JavaScript projects:

node_modules/
dist/
build/
.next/
.nuxt/
coverage/
*.lock
*.min.js
*.min.css

Python projects:

__pycache__/
*.pyc
.venv/
venv/
dist/
build/
*.egg-info/
.pytest_cache/

Universal additions:

.git/
.env
.env.*
*.log
*.map

Set this once per project. It quietly saves tokens on every single session going forward.

What Not to Bother With

Some "optimisations" that sound appealing but aren't worth your time:

Custom tokenizer hacks — fragile, save pennies, break frequently with model updates
Prompt-shortening agents — add latency and often degrade output quality more than they save tokens
Switching providers mid-task to save cost — the context loss usually costs more than the token savings
Micro-optimising individual prompt length — useful up to a point, then diminishing returns fast

The 80/20 rule applies hard here. Tiers 1 and 2 give you most of the gains. Everything else is refinement.

Putting It All Together — A Day in the Life

Here's what an optimised Claude Code session looks like in practice, from first keystroke to shutdown.

Example: Starting a Feature Development Session

You're building a new user permissions system.

Opening moves:

Run npx repomix --include "src/auth/**,src/users/**" to build a lean context snapshot
Switch to /model opusplan
First prompt: "Plan the implementation of a role-based permissions system. Scope: src/auth/ and src/users/ only. No UI changes. Goal: three permission levels — read, write, admin."

After the plan:

Review it before saying go. Does it match your intent? Any assumptions you want to correct? Fix them now, not after five turns of implementation.
Approve the plan. Sonnet takes over for execution.

Mid-session:

After the plan is approved and execution starts, run /compact to condense the planning conversation into a summary
Sonnet makes targeted file reads only — you told it the scope, so it stays in src/auth/ and src/users/

Example: Debugging a Complex Issue Across an Unfamiliar Codebase

You've inherited a service that's behaving strangely and you don't know where to look.

When to reach for Opus: complex hypothesis generation, tracing an issue across many files, reasoning about unexpected interactions. This is exactly what Opus is for.

Handing off to Sonnet: once you understand the cause — "okay, the session middleware isn't clearing the cache on logout because of this race condition" — the fix is Sonnet territory. Clear bug, known location, known change. No need for Opus anymore.

After debugging: /compact if you're continuing to work, or start fresh if the next task is unrelated. Don't drag debugging context into a new feature build.

Example: Routine Maintenance and Small Fixes

Renaming a prop across a component library. Adding JSDoc comments. Fixing a typo in config values.

Default to Sonnet or Haiku. These tasks are well-defined, low-complexity, and don't require deep reasoning. Dense prompts, tight scope, no plan-mode overhead needed. Short sessions, low carry-over, done.

Frequently Asked Questions

What's the fastest single change I can make to cut my Claude Code token usage today?

Use a structured context generator (repomix or codesight) instead of letting Claude read raw directories. Claude then starts from one curated snapshot rather than exploring the project file by file, which trims the context it carries for the rest of the session. Run npx repomix in your project right now and see what you get.

Does switching from Opus to Sonnet noticeably hurt code quality?

For execution tasks — writing code from a clear plan, applying changes, running commands — Sonnet produces comparable results. The quality difference shows up in complex reasoning tasks, which is exactly why Opus Plan Mode reserves Opus for planning only. You get the reasoning quality where it matters, at Sonnet rates for everything else.

How often should I use /compact?

At natural phase boundaries. After research or exploration. After architecture planning. After a debugging diagnosis. Don't use it so frequently that you lose important context; don't avoid it so long that the session becomes bloated. If you're asking yourself whether it's time, it probably is.

Will starting a new session make Claude forget my project?

Yes — Claude has no memory between sessions by default. Solve this two ways: (1) keep a brief handoff note in your project (recent decisions, current state, next steps), and (2) run a context generator at the start of each session to give Claude an accurate project snapshot quickly. Both together take under a minute.

Is Haiku worth using for real development tasks?

Yes, for well-defined low-complexity tasks. Renaming symbols, adding comments, formatting output, running quick lookups — Haiku handles all of these well and costs significantly less than Sonnet. The key is matching model capability to task complexity. Don't overthink it: if you'd trust a junior dev to do it in five minutes with clear instructions, Haiku can probably handle it.

Does prompt length really make a meaningful difference in total token cost?

On a single-prompt basis, the savings are modest. The impact compounds across a full workday of prompts — especially because your prompts are re-read on every subsequent turn in the same session. Dense prompts matter most in long sessions with many turns. For short sessions, it's a nice habit but not a game-changer.

What's the difference between /compact and /clear?

/compact — summarises the conversation and continues with the distilled version in place of raw history. Use when you want to keep going on the same task with a lighter load.
/clear — wipes everything. Use when switching to a completely unrelated task.

Think of /compact as decluttering your desk. /clear is clearing the whole desk.

Conclusion

Summary — What We Covered

Token costs in Claude Code compound fast — faster than in any chat interface — because agentic loops re-read your full history on every turn. But the waste is controllable.

Here's the shape of the problem and the shape of the solution:

Context management (Tier 1) is the biggest single lever. What you feed Claude determines the token floor for your whole session.
Model selection (Tier 2) is the second biggest lever. Default to Sonnet; use Opus where deep reasoning is genuinely required; use Opus Plan Mode for sessions that start with architecture and end with implementation.
Prompt discipline (Tier 3) compounds across hundreds of daily prompts, especially in long sessions.
Session hygiene (Tier 4) — /compact at phase boundaries, /clear between unrelated tasks — prevents the silent compounding cost of long, unfocused sessions.
Tooling (Tier 5) locks in your gains and makes the good habits automatic.

Key Takeaways

Control what Claude reads before worrying about how you write prompts
Default to Sonnet; escalate to Opus only where deep reasoning is genuinely required
Use /model opusplan for any session involving architectural decisions or large implementations
/compact at every major phase boundary; /clear or new session when switching topics
Run a context generator at the start of every session instead of sharing raw files
Write dense prompts for routine tasks; full sentences for production-touching changes

Next Steps

Start today: Install repomix and run it once in your current project.

npx repomix

See how the token count compares to what you'd have sent without it.

This week: Try one full session with /model opusplan on a feature branch. Notice the difference in planning quality, and in how much of the session Sonnet handles instead of Opus.

Ongoing: Adopt the session-start checklist and /compact discipline as defaults. Add a .claudeignore to every project. Keep a handoff note. These become second nature quickly, and they quietly save tokens on every session from here on.

If this was useful, bookmark it for reference and share it with your team — token costs are a team problem, and a team-wide checklist is a team-wide saving.
