The problem

Every AI call rereads the full conversation history.

Cost grows quadratically.

Query-agnostic compression never sees the question, so it keeps the wrong passages.

turn 1: [ctx]
turn 2: [ctx][ctx]
turn 3: [ctx][ctx][ctx]
    ⋮   O(n²)
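The growth in the diagram is easy to check with a little arithmetic. This is a minimal sketch that assumes each turn adds a fixed number of history tokens (the `tokens_per_turn` value below is hypothetical) and that every call rereads the full history with no KV cache:

```python
def cumulative_reread_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total tokens read over `turns` calls when call k rereads all k turns so far."""
    # Call k reads k * tokens_per_turn tokens; summing k = 1..n gives n(n+1)/2 turn-chunks.
    # turns * (turns + 1) is always even, so the integer division is exact.
    return tokens_per_turn * turns * (turns + 1) // 2


# Hypothetical: every turn adds 100 tokens of history.
for n in (10, 20):
    print(n, cumulative_reread_tokens(n, tokens_per_turn=100))
```

Doubling the conversation from 10 to 20 turns takes the total from 5,500 to 21,000 tokens, roughly 3.8× more, which is the O(n²) growth sketched above.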

The solution

Reads the question first.
Rewrites only what's relevant.

Reads the question

Rewrites only the relevant content — drops the rest.
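To make "reads the question first" concrete, here is a toy contrast between query-agnostic and query-aware selection. It is not ReCompress: the real system rewrites text with a fine-tuned Qwen2.5-1.5B, while this sketch swaps in plain word overlap with the question as the relevance score and only selects passages. The passages, question, and `keep` budget are hypothetical.

```python
import re


def words(text: str) -> set[str]:
    # Lowercased alphanumeric tokens; a crude stand-in for real tokenization.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_agnostic(passages: list[str], keep: int) -> list[str]:
    # Stand-in for a compressor that never looks at the question:
    # it keeps the leading passages by a fixed rule.
    return passages[:keep]


def query_aware(question: str, passages: list[str], keep: int) -> list[str]:
    # Score each passage by word overlap with the question, keep the top `keep`
    # (ties keep their original order), then restore document order.
    q = words(question)
    ranked = sorted(range(len(passages)),
                    key=lambda i: len(q & words(passages[i])), reverse=True)
    return [passages[i] for i in sorted(ranked[:keep])]


# Hypothetical example data.
passages = [
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "Paris is famous for its cafes and museums.",
    "Gustave Eiffel's company designed and built the tower.",
]
question = "Whose company built the Eiffel Tower?"

print(query_agnostic(passages, keep=1))
print(query_aware(question, passages, keep=1))
```

With a one-passage budget, the query-agnostic rule keeps the completion date, while the query-aware rule keeps the passage that actually answers the question.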

1.5B · offline

Qwen2.5-1.5B + LoRA. Cents per call, no API.

Three-tier memory

Keeps multi-turn context flat at any conversation length.

The Token Company Compression Challenge · UC Berkeley AI Hackathon 2026

ReCompress

Read the question first. Rewrite only what matters.
8.5× fewer tokens — from a 1.5B model that runs offline.

Parth Sanjay Kshirsagar · Kartikey Pandey

The results
+56% better than bear-2 (HotpotQA)
8.5× fewer tokens
$10 total

Zero-shot on a benchmark we never trained on.

Multi-turn stays at 184 tokens — naive reaches 1,482 by turn 12.

Paper on Zenodo · DOI 10.5281/zenodo.20786357

ReCompress

github.com/Kart-ing/ReCompress

Thank you.

Paper: 10.5281/zenodo.20786357 · Demo: demo-eight-olive-97.vercel.app

Appendix

Deeper evidence — for live Q&A

benchmarks · cross-solver audit · honesty · crossover · how it's cheap · live demo

Results · across benchmarks

Significant where it's hardest

per-benchmark bars

HotpotQA ✓ · 2Wiki ✓ · MuSiQue n.s. · SQuAD n.s. (✓ = significant, n.s. = not significant)
Significant on multi-hop questions with distractors; reported honestly on the rest.
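The slides do not say which significance test produced the ✓ and n.s. marks. As one common option, a paired bootstrap over per-question F1 differences could look like the sketch below; the test itself, the resample count, and the scores in the usage line are assumptions for illustration, not the project's data.

```python
import random


def paired_bootstrap_p(ours: list[float], baseline: list[float],
                       resamples: int = 10_000, seed: int = 0) -> float:
    """Fraction of bootstrap resamples in which our mean advantage disappears.

    A common paired-bootstrap estimate of a one-sided p-value for
    mean(ours - baseline) > 0. Scores must be aligned question by question.
    """
    if len(ours) != len(baseline) or not ours:
        raise ValueError("need two equal-length, non-empty score lists")
    diffs = [a - b for a, b in zip(ours, baseline)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    not_better = 0
    for _ in range(resamples):
        # Resample questions with replacement, keeping each question's pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) <= 0:
            not_better += 1
    return not_better / resamples


# Hypothetical per-question F1 scores for 40 questions.
print(paired_bootstrap_p([0.9, 0.7, 0.8, 0.6] * 10, [0.6, 0.7, 0.5, 0.6] * 10))
```

Under a conventional 0.05 threshold (also an assumption), a result below it would earn a ✓. With only n=50 questions per benchmark, a real but modest gain can still land in n.s.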

We stress-tested our own headline

The win survives an independent judge

cross-solver

Teacher and solver were both DeepSeek. Re-scored with an independent solver, Claude Sonnet: Δ +0.288 vs +0.285 in-family. Not a same-family artifact.

We audited ourselves

Much of the win is span-selection — measured

mask the answer

Redact the gold answer span and re-solve: our F1 drops 65% vs bear's 31%. The edge is better selection, not better reasoning, and we report it.
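The mask test can be sketched as follows. Two pieces are assumptions: F1 here is the usual SQuAD-style token-overlap F1 (lowercased, punctuation and articles stripped), and redaction replaces every literal occurrence of the gold answer with a placeholder. The solver that re-answers from the redacted context is left out; the usage lines plug in made-up F1 values only to show how the relative drop is computed.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    # SQuAD-style normalization: lowercase, strip punctuation, drop articles, split.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    # Harmonic mean of token precision and recall against the gold answer.
    pred, ref = normalize(prediction), normalize(gold)
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)


def redact(context: str, answer: str, placeholder: str = "[REDACTED]") -> str:
    # Case-insensitive replacement of every literal occurrence of the gold answer span.
    return re.sub(re.escape(answer), placeholder, context, flags=re.IGNORECASE)


def relative_drop(f1_before: float, f1_after: float) -> float:
    # Fraction of the original F1 lost once the answer span is hidden.
    if f1_before == 0:
        raise ValueError("relative drop is undefined when the original F1 is 0")
    return (f1_before - f1_after) / f1_before


print(redact("Gustave Eiffel's company built it.", "gustave eiffel"))
print(relative_drop(0.5, 0.2))  # hypothetical F1 before and after redaction
```

A large relative drop means the compressed context was carrying the answer string itself, which is why the slide reads it as selection rather than reasoning.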

Act 2 · multi-turn + a self-correction

Flat context — after cutting our own dead weight

crossover

Our LLM checkpoint trigger was 98% useless; replacing it with a free, rule-based trigger made the system 4.2× cheaper than uncached naive by turn 20. (Honest: it never beats a KV-cached agent on raw tokens.)
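The crossover can be sketched with a simple cost model. Every number and name here is a hypothetical parameter, not a measurement: uncached naive rereads a history that grows by `turn_tokens` per turn, while the compressed agent reads a flat `flat_tokens` context each turn plus a one-off `checkpoint_cost` every `checkpoint_every` turns. The fixed every-k-turns schedule is an assumed stand-in for the rule-based trigger.

```python
from typing import Optional


def cumulative_costs(turns: int, turn_tokens: int, flat_tokens: int,
                     checkpoint_cost: int, checkpoint_every: int) -> tuple[list[int], list[int]]:
    """Running token totals for uncached naive vs. a flat compressed context."""
    naive, compressed = [], []
    naive_total = compressed_total = 0
    for t in range(1, turns + 1):
        naive_total += t * turn_tokens   # uncached: reread the whole history at turn t
        compressed_total += flat_tokens  # flat context regardless of conversation length
        if t % checkpoint_every == 0:
            compressed_total += checkpoint_cost  # periodic re-compression (hypothetical schedule)
        naive.append(naive_total)
        compressed.append(compressed_total)
    return naive, compressed


def crossover_turn(naive: list[int], compressed: list[int]) -> Optional[int]:
    """First turn at which the compressed running total is cheaper, or None."""
    for t, (n, c) in enumerate(zip(naive, compressed), start=1):
        if c < n:
            return t
    return None


# Hypothetical parameters, for illustration only.
naive, compressed = cumulative_costs(turns=20, turn_tokens=120, flat_tokens=200,
                                     checkpoint_cost=600, checkpoint_every=5)
print(crossover_turn(naive, compressed), naive[-1] / compressed[-1])
```

A KV-cached agent does not recompute the earlier history on every call, so its cost does not follow the `t * turn_tokens` term; that is why this comparison only holds against uncached naive.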

How it's cheap · distillation

Three attempts: wash → overfit → win

distillation trajectory

v1 was data-starved → v2 overfit → v3 reached significance. All three are reported. The 1.5B student recovers ~64% of the frontier teacher's margin, offline.
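One plausible reading of "recovers ~64% of the teacher's margin" is the student's gain over a shared baseline divided by the teacher's gain over that same baseline. The slides do not spell out the baseline or the metric, so treat this formula and the scores in the usage line as assumptions:

```python
def margin_recovered(student: float, teacher: float, baseline: float) -> float:
    """Share of the teacher's improvement over `baseline` that the student reproduces."""
    teacher_margin = teacher - baseline
    if teacher_margin == 0:
        raise ValueError("teacher shows no margin over the baseline")
    return (student - baseline) / teacher_margin


# Hypothetical F1 scores, for illustration only.
print(round(margin_recovered(student=0.45, teacher=0.55, baseline=0.30), 2))  # prints 0.6
```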

Live demo

→ Switch to the interactive demo

cross-solver toggle · crossover slider · redact the answer live

demo-eight-olive-97.vercel.app


Honest limitations

What we did not prove

  • Significant on 2 of 4 benchmarks — MuSiQue & SQuAD are directional (n=50).
  • Much of the F1 is the answer span itself (mask test) — selection, not reasoning.
  • Multi-turn beats uncached naive only; a KV-cached agent wins on raw tokens.
  • The student is autoregressive, so latency is ~1.8× bear's (in exchange for 9.4× fewer tokens).

Everything's open

Paper (Zenodo): 10.5281/zenodo.20786357
Demo: demo-eight-olive-97.vercel.app
Code: github.com/Kart-ing/ReCompress
