v4.3 · 97.9% on tsbench + ~20K tokens/week real

The only MCP server
that turns Claude into
a perfect coder.

v4.0 release. One MCP server, one profile. Set TOKEN_SAVIOR_PROFILE=optimized and you ship 97.9% accuracy on 96 real coding tasks. Plain Claude scores 78.3%. Active tokens drop 80% (3 395 vs 17 221). Wall time drops 83%. No tuning required.
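As a quick sanity check on the headline figure, the token reduction can be recomputed from the two active-token counts quoted above. This is plain arithmetic, not part of Token Savior; the helper name `reduction` is ours.

```python
def reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


plain_tokens = 17_221  # active tokens, plain agent (figure quoted above)
ts_tokens = 3_395      # active tokens, Token Savior (figure quoted above)

token_cut = reduction(plain_tokens, ts_tokens)
print(f"active tokens: -{token_cut:.1f}%")
```

The result rounds to the 80% quoted in the copy.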

A · plain agent

Read · Grep · Bash

B · + token savior

+ 59 structural MCP tools

the scorecard · what the numbers say

Same model. Same tasks.
One of them kept winning.

Two agents, one codebase, ninety-six prompts pulled from real engineering work: find callers, audit cycles, estimate blast radius, summarize modules. We counted tokens, wall time and correctness.

agent a · plain

Read, Grep, Bash.

no structural tools

agent b · token savior

Plus the graph.

+ 59 structural MCP tools (lean profile)

Accuracy, by category

avg score out of 2 · red = plain · green = token savior

Where the gap opens

context ingested per category · lower is better

−99 %
Import cycle detection
The plain agent made 35 Bash calls and 20 Read calls, ingesting 113k chars over 132s. Token Savior made two MCP calls, ingesting 200 chars over 17s. Same correct answer.
0 → 2
File dependency recovery
Baseline read the whole file and missed three imports. One get_file_dependencies() call got a perfect 2/2.
21
Impossible without
Tasks the plain agent scored 0 on but Token Savior solved — community detection, hotspot audits, semantic duplicates. Amber tiles below.
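To make the import-cycle card concrete, here is a minimal sketch of cycle detection over a file dependency map, using a depth-first search that records a cycle whenever it meets a file still on the current path. It is a generic textbook approach, not Token Savior's implementation; the function name `find_import_cycles` and the sample file names are hypothetical.

```python
from typing import Dict, List


def find_import_cycles(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Return one cycle (as a path ending where it started) per back edge."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in deps}
    stack: List[str] = []
    cycles: List[List[str]] = []

    def visit(node: str) -> None:
        color[node] = GREY
        stack.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: we closed a loop.
                cycles.append(stack[stack.index(dep):] + [dep])
            elif state == WHITE:
                visit(dep)
        stack.pop()
        color[node] = BLACK

    for node in list(deps):
        if color[node] == WHITE:
            visit(node)
    return cycles


# Hypothetical project: db.py and models.py import each other.
deps = {
    "app.py": ["db.py", "utils.py"],
    "db.py": ["models.py"],
    "models.py": ["db.py"],
    "utils.py": [],
}
cycles = find_import_cycles(deps)
print(cycles)
```

Once the dependency map exists, the answer is a single traversal, which is the point of the comparison above: the expensive part for a plain agent is rebuilding that map with repeated reads and greps.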

the graph · what Token Savior sees

A codebase isn't a folder.
It's a graph.

What you see below is the tsbench fixture parsed into its call graph: 206 symbols (functions, classes, methods, constants), 412 call edges connecting them, clustered by module. The plain agent walks this one grep at a time; Token Savior queries it directly. Drag to orbit, hover a point for details.
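Querying the graph directly can be pictured with a small sketch: given call edges as (caller, callee) pairs, a reverse breadth-first search yields every symbol that transitively depends on a changed one, which is one way to estimate blast radius. The function names and sample edges below are illustrative assumptions, not Token Savior's API.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def build_callers(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Invert the call graph: map each callee to its direct callers."""
    callers: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)
    return callers


def blast_radius(edges: Iterable[Edge], changed: str) -> Set[str]:
    """Every symbol that calls `changed`, directly or transitively."""
    callers = build_callers(edges)
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        symbol = queue.popleft()
        for caller in callers.get(symbol, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


# Hypothetical call edges for illustration.
edges = [
    ("main", "handle_request"),
    ("handle_request", "load_user"),
    ("load_user", "query"),
    ("export_csv", "query"),
]
print(sorted(build_callers(edges)["query"]))  # direct callers
print(sorted(blast_radius(edges, "query")))   # transitive callers
```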

the replay · watch a task resolve

Don't take our word for it.
Watch both agents work.

Every tile is a real task from the benchmark, colored by outcome. Click one and we'll replay both agents' actual traces side by side — real tools, real timings, real answers.

Tile legend: Token Savior wins · impossible without TS · tie · losses

TASK-026 Detect import cycles in the project. Any circular dependencies? If yes, list them.

beyond the bench · real coding sessions

~20K tokens / week
saved on real sessions.

tsbench measures coding accuracy on a synthetic fixture. The new v4.1 / v4.2 / v4.3 layer measures something different: how many tokens leak out of actual tool outputs across a week of live work. Bash chatter, test runners, kubectl dumps, git logs. All of it sandboxed or compacted before it ever reaches the model context.

1 121 · bash outputs scanned (7 d)
19.3% · match rate (was 11.9%)
68.9% · mean compaction on hit
~20 410 · tokens saved over 7 d

What shipped since v4.0

three additive releases, zero prompt change

v4.1: 14 Bash output compactors, a PreToolUse rewriter that rewrites bare commands into denser variants, and ts_discover to scan transcripts for missed TS chains.
v4.2: 8 more compactors (jest, vitest, eslint, biome, kubectl, aws, npm/pip list, curl), hybrid sandbox + compact dual-mode, and a ts init CLI that wires the hooks into Claude / Cursor / Gemini / Codex.
v4.3: 12 more compactors (grep, find, cat, git extras, gh extras, python3 -m pytest) and a compound command splitter that recognizes cd X && cmd.
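For a sense of what recognizing cd X && cmd could involve, here is a minimal sketch that peels a leading cd off a compound command so the remainder can be matched against a compactor. It is an assumption-laden illustration, not the shipped splitter: the name `split_cd_prefix` is hypothetical, and quoted directory names or multi-line commands are deliberately left unmatched.

```python
import re
from typing import Optional, Tuple

# Matches "cd DIR && REST" where DIR is a single unquoted token.
_CD_PREFIX = re.compile(r"^\s*cd\s+(?P<dir>[^\s&;|]+)\s*&&\s*(?P<rest>.+)$")


def split_cd_prefix(command: str) -> Tuple[Optional[str], str]:
    """Return (directory, remaining command), or (None, command) if no match."""
    match = _CD_PREFIX.match(command)
    if match is None:
        return None, command  # unknown shape: leave the command untouched
    return match.group("dir"), match.group("rest").strip()


print(split_cd_prefix("cd repo && git status"))
print(split_cd_prefix("git status"))
```

Anything the pattern does not recognize is returned unchanged, so an unusual command shape costs nothing beyond a missed compaction.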

Why it matters

free wins on top of the bench number

Projected ~85K tokens / month per active coder at current usage, at zero accuracy cost.
All gains arrive without any model-side change: no system prompt edit, no agent rewrite, no profile flip required.
The match rate went from 11.9% on v4.2 to 19.3% on v4.3 after one bench-driven coverage push. To reproduce, run scripts/bench_compactors_real.py over your own transcripts.
Compactors are pure functions, opt-in, fail-safe: unknown command shapes fall through to the existing sandbox path untouched.
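The "pure function, fail-safe" property in the last bullet can be sketched as a small dispatch table: each compactor maps raw output to shorter output, and anything unrecognized returns None so the caller keeps its existing path. Everything here is hypothetical, including the names `compact` and `compact_git_status` and the filtering rule; the real compactors may work quite differently.

```python
from typing import Callable, Dict, Optional

Compactor = Callable[[str], str]


def compact_git_status(output: str) -> str:
    """Hypothetical rule: drop blank lines and git's '(use ...)' hints."""
    kept = [
        line
        for line in output.splitlines()
        if line.strip() and not line.strip().startswith("(use ")
    ]
    return "\n".join(kept)


# Registry keyed by command prefix (illustrative, one entry only).
COMPACTORS: Dict[str, Compactor] = {"git status": compact_git_status}


def compact(command: str, output: str) -> Optional[str]:
    """Return compacted output, or None to fall through untouched."""
    for prefix, fn in COMPACTORS.items():
        if command == prefix or command.startswith(prefix + " "):
            try:
                return fn(output)
            except Exception:
                return None  # fail-safe: a broken compactor never blocks output
    return None  # unknown command shape


raw = (
    "On branch main\n"
    "Changes not staged for commit:\n"
    '  (use "git add <file>..." to update what will be committed)\n'
    "\tmodified:   app.py\n"
    "\n"
)
print(compact("git status", raw))
print(compact("terraform plan", "..."))
```

Because a compactor takes a string and returns a string with no side effects, it can be benchmarked offline against recorded transcripts before it is enabled.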

git (11): status, diff, log, push, pull, commit, add, fetch, checkout, branch, worktree list, stash list
gh (6): run list, run view, pr view, pr diff, repo view, issue view
test runners (4): pytest, cargo test, jest, vitest
build & lint (4): cargo build, tsc, eslint, biome
shell utils (3): grep, find, cat
cloud, aws + kubectl (9): kubectl get, kubectl logs, aws sts, ec2, lambda, logs, iam, dynamodb, s3
misc (4): docker ps, docker logs, npm/pip list, curl

# 1. upgrade and wire the hooks in one shot
pip install --upgrade token-savior-recall
ts init --agent claude --yes

# 2. flip the two opt-in flags
export TS_BASH_COMPACT=1
export TS_BASH_REWRITE=1

Reproduce the numbers on your own transcripts: scripts/bench_compactors_real.py
