Jun 13, 2026

TokenCode burns 1,000 agents in parallel — here is how it does it in Go

TokenCode describes itself as a “token burning machine.” The idea is simple: if tokens are cheap and quality is scarce, stop betting everything on one run. Send the same task to a lot of agents and pick the best answer.

That could easily turn into a mess. In practice, the code is more disciplined than the headline suggests. The repo does support races of up to 1,000 agents, but the implementation is careful about how many are active at once, where they write, and how the winner gets chosen.

The single-agent loop comes first

The foundation is still a normal coding agent. internal/agent/agent.go owns the tool-use loop, internal/agent/prompt.go builds prompts, and internal/agent/event.go handles the event stream around each turn.

That matters because TokenCode does not have a special “multi-agent brain.” It has one agent implementation, then builds higher-level orchestration around many copies of that unit. That is a good Go design. You keep the core execution path small, then compose it.
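To make "one agent implementation" concrete, here is a minimal sketch of a tool-use loop. It is not TokenCode's code: `Model`, `Tool`, `Reply`, `RunAgent`, and the scripted stand-in model are all hypothetical names I made up for illustration. The only thing it tries to show is the shape: ask the model, run whatever tools it requests, feed the results back, and stop when it asks for nothing more.

```go
package main

import (
	"errors"
	"fmt"
)

// All types below are hypothetical; they are not TokenCode's API.
type ToolCall struct {
	Name  string
	Input string
}

type Reply struct {
	Text  string
	Calls []ToolCall // empty means the model considers the task finished
}

type Model interface {
	Next(history []string) (Reply, error)
}

type Tool func(input string) (string, error)

// RunAgent asks the model for a reply, executes any requested tools, appends
// the results to the history, and repeats until no tools are requested or
// maxTurns is exhausted.
func RunAgent(m Model, tools map[string]Tool, task string, maxTurns int) (string, error) {
	history := []string{"user: " + task}
	for turn := 0; turn < maxTurns; turn++ {
		reply, err := m.Next(history)
		if err != nil {
			return "", err
		}
		history = append(history, "assistant: "+reply.Text)
		if len(reply.Calls) == 0 {
			return reply.Text, nil
		}
		for _, c := range reply.Calls {
			tool, ok := tools[c.Name]
			if !ok {
				history = append(history, "tool_error: unknown tool "+c.Name)
				continue
			}
			out, err := tool(c.Input)
			if err != nil {
				out = "error: " + err.Error() // surface the failure to the model
			}
			history = append(history, "tool_result("+c.Name+"): "+out)
		}
	}
	return "", errors.New("agent exceeded maxTurns without finishing")
}

// scriptedModel stands in for a real LLM: one tool request, then done.
type scriptedModel struct{ step int }

func (s *scriptedModel) Next(history []string) (Reply, error) {
	s.step++
	if s.step == 1 {
		return Reply{Text: "reading file", Calls: []ToolCall{{Name: "read", Input: "main.go"}}}, nil
	}
	return Reply{Text: fmt.Sprintf("done after %d history entries", len(history))}, nil
}

func main() {
	tools := map[string]Tool{
		"read": func(path string) (string, error) { return "contents of " + path, nil },
	}
	out, err := RunAgent(&scriptedModel{}, tools, "summarize main.go", 5)
	fmt.Println(out, err)
}
```

Because the loop only depends on a `Model` interface and a map of tools, running many copies of it is an orchestration concern rather than something the loop itself has to know about.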

The LLM layer is also more concrete than the original pitch might imply. Internally, the project models requests around Anthropic-style messages, and internal/llm/anthropic.go defaults to DeepSeek’s Anthropic-compatible endpoint. The repo also includes internal/llm/openai.go and internal/llm/google.go, so the system is broader than a single provider, but the default path is explicit rather than magical.

`/race` is bounded concurrency, not 1,000 hot goroutines at once

The real parallel orchestration lives in internal/race/race.go. This file sets MaxN to 1000, which is where the headline number comes from, but it also sets a default concurrency window of 8. That is the important detail.

In other words, TokenCode can manage a race with up to 1,000 candidate agents, but it does not blindly light up 1,000 active workers by default. It fans work out through a semaphore-limited window, advances racers as slots free up, and propagates cancellation through context.Context.

That is a much more believable design than “spawn everything and hope.” You get the search breadth without pretending your laptop, model provider, and file system all want unbounded parallelism.

The shape looks roughly like this:

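// Buffered channel used as a counting semaphore: at most `concurrency` racers run at once.
sem := make(chan struct{}, concurrency)
var wg sync.WaitGroup

for i := 0; i < n; i++ {
	wg.Add(1)
	go func(i int) {
		defer wg.Done()

		// Every racer gets a goroutine, but it parks here until a slot
		// frees up or the race is cancelled.
		select {
		case sem <- struct{}{}:
			defer func() { <-sem }() // release the slot when this racer returns
		case <-ctx.Done():
			return // cancelled before it ever started
		}

		runOneRacer(ctx, i)
	}(i)
}

// Block until every racer has finished or bailed out.
wg.Wait()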

That is the part worth stealing. A lot of “parallel AI” demos talk like infinite fan-out is free. In production code you nearly always want a hard ceiling.
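If you want to convince yourself the ceiling actually holds, here is a self-contained sketch that measures it. This is a variant of the pattern, not the repo's code: it acquires the slot in the launching loop before starting each goroutine, so at most `limit` goroutines exist at any moment. `runBounded` and `demo` are names I made up, and the job is just a short sleep standing in for a racer.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runBounded runs up to n jobs with at most limit in flight at once.
// It returns the highest in-flight count observed and the number of jobs
// that completed. Launching stops once ctx is cancelled.
func runBounded(ctx context.Context, n, limit int, job func(context.Context, int)) (peak, ran int64) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var active, maxSeen, done atomic.Int64

launch:
	for i := 0; i < n; i++ {
		if ctx.Err() != nil {
			break launch
		}
		// Acquire a slot before spawning, so goroutines never pile up.
		select {
		case sem <- struct{}{}:
		case <-ctx.Done():
			break launch
		}
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot after the counters settle

			// Record the new in-flight count and raise the peak if needed.
			cur := active.Add(1)
			for {
				old := maxSeen.Load()
				if cur <= old || maxSeen.CompareAndSwap(old, cur) {
					break
				}
			}
			job(ctx, i)
			done.Add(1)
			active.Add(-1)
		}(i)
	}
	wg.Wait()
	return maxSeen.Load(), done.Load()
}

// demo runs 100 short jobs through a window of 8.
func demo() (peak, ran int64) {
	job := func(ctx context.Context, i int) { time.Sleep(time.Millisecond) }
	return runBounded(context.Background(), 100, 8, job)
}

func main() {
	peak, ran := demo()
	fmt.Printf("ran=%d peak=%d (limit 8)\n", ran, peak)
}
```

The trade-off between the two shapes is small. Spawning every goroutine up front and parking it on the semaphore is simpler to read. Acquiring before spawning keeps the goroutine count itself bounded, which matters more as `n` approaches the thousands.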

Git worktrees are the isolation boundary

The cleanest idea in the repo is the one that makes the whole race viable: each racer gets its own git worktree.

That logic sits in internal/race/worktree.go, not in the user-facing cmd/tokencode/worktree.go helper for the separate -w flag. For /race, TokenCode creates a temporary worktree and branch per racer, collects the diff, and deletes the losers. The winner’s branch stays around for traceability.

This is a better design than trying to synchronize file writes inside one shared checkout. If 20 agents all edit the same repository tree directly, you spend your time managing collisions. If each agent writes in its own isolated worktree, the concurrency problem becomes a comparison and merge problem, which git is already good at.

The repo even leans into that explicitly:

each racer gets a branch like tokencode/race-<runID>-<i>
the worktree lives in a temporary directory, not inside the main repo
untracked files are still captured by forcing them into the diff
losers get cleaned up automatically, while the winner can be applied back to the main workspace

That is the kind of boring infrastructure decision that makes flashy features possible.
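For a feel of what that lifecycle involves, here is a sketch that builds the git invocations for one racer. Only the branch naming comes from the repo as described above. Everything else is my assumption: the `racerPlan` type, the directory layout under the temp root, and the choice of `git add -A` followed by `git diff --cached HEAD` as one way to get untracked files into the diff. It also assumes the racer leaves its changes uncommitted. The `main` function only prints the commands, so it is safe to run outside a repository.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// racerPlan lists the git invocations for one racer (hypothetical type).
type racerPlan struct {
	Branch, Dir  string
	Add          []string // create the worktree on a fresh branch
	Stage        []string // stage everything, untracked files included
	Diff         []string // diff the staged tree against HEAD
	Remove       []string // drop a loser's worktree
	DeleteBranch []string // and delete its branch
}

// planRacer builds the commands for racer i of run runID, placing the
// worktree under tmpRoot rather than inside the main checkout.
func planRacer(tmpRoot, runID string, i int) racerPlan {
	branch := fmt.Sprintf("tokencode/race-%s-%d", runID, i)
	dir := filepath.Join(tmpRoot, fmt.Sprintf("race-%s-%d", runID, i))
	return racerPlan{
		Branch: branch, Dir: dir,
		Add:          []string{"worktree", "add", "-b", branch, dir, "HEAD"},
		Stage:        []string{"-C", dir, "add", "-A"},
		Diff:         []string{"-C", dir, "diff", "--cached", "HEAD"},
		Remove:       []string{"worktree", "remove", "--force", dir},
		DeleteBranch: []string{"branch", "-D", branch},
	}
}

// git runs one git command from the main repo and returns its combined output.
// It is unused in main below; a real race would call it for each step.
func git(ctx context.Context, repo string, args ...string) (string, error) {
	cmd := exec.CommandContext(ctx, "git", args...)
	cmd.Dir = repo
	out, err := cmd.CombinedOutput()
	return string(out), err
}

func main() {
	p := planRacer(os.TempDir(), "demo", 3)
	for _, args := range [][]string{p.Add, p.Stage, p.Diff, p.Remove, p.DeleteBranch} {
		fmt.Println("git", args)
	}
}
```

A loser would run `Remove` and `DeleteBranch`. A winner would skip `DeleteBranch`, which matches the idea of keeping the winning branch around for traceability.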

Judging is a pipeline, not just “first answer wins”

Another useful detail in internal/race/race.go and internal/race/judge.go: TokenCode does not simply return whichever agent finishes first.

The flow is closer to:

run racers in isolated worktrees
throw out empty diffs
optionally run an objective check command in each worktree
score surviving candidates
run a final comparison round among the top results

That is a better fit for coding work. Speed alone is a poor proxy for quality. A fast wrong patch is still wrong.
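The staged flow above can be sketched in a few lines of Go. This is an illustration of the pipeline's shape, not the repo's judge: `Candidate`, `Shortlist`, and `FinalRound` are hypothetical names, the check and score functions are toy stand-ins, and the pairwise fold in `FinalRound` is just one plausible way to run a comparison round.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Candidate is one racer's result (hypothetical type).
type Candidate struct {
	ID    int
	Diff  string
	Score float64
}

// Shortlist drops empty diffs, applies the optional objective check,
// scores the survivors, and returns the top k by score, highest first.
func Shortlist(cands []Candidate, check func(Candidate) bool, score func(Candidate) float64, k int) []Candidate {
	var kept []Candidate
	for _, c := range cands {
		if strings.TrimSpace(c.Diff) == "" {
			continue // the racer changed nothing
		}
		if check != nil && !check(c) {
			continue // failed the objective check, e.g. a test command
		}
		c.Score = score(c)
		kept = append(kept, c)
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if len(kept) > k {
		kept = kept[:k]
	}
	return kept
}

// FinalRound folds the finalists through a pairwise preference function.
// It reports false when there are no finalists.
func FinalRound(finalists []Candidate, prefer func(a, b Candidate) Candidate) (Candidate, bool) {
	if len(finalists) == 0 {
		return Candidate{}, false
	}
	best := finalists[0]
	for _, c := range finalists[1:] {
		best = prefer(best, c)
	}
	return best, true
}

func main() {
	cands := []Candidate{
		{ID: 1, Diff: ""},
		{ID: 2, Diff: "+a\n"},
		{ID: 3, Diff: "+a\n+b\n"},
		{ID: 4, Diff: "+broken\n"},
	}
	// Toy stand-ins: a real check would run a command in the worktree,
	// and a real score would likely come from a model.
	check := func(c Candidate) bool { return !strings.Contains(c.Diff, "broken") }
	score := func(c Candidate) float64 { return -float64(len(c.Diff)) }

	finalists := Shortlist(cands, check, score, 2)
	winner, ok := FinalRound(finalists, func(a, b Candidate) Candidate {
		if b.Score > a.Score {
			return b
		}
		return a
	})
	fmt.Println(winner.ID, ok)
}
```

The ordering is the point: cheap, objective filters run first, so the expensive model-based comparison only sees candidates that already change something and pass the check.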

Compaction and checkpoints solve different problems

Two support features are easy to blur together, so this part is worth separating.

internal/agent/compact.go handles long conversation history. It compresses old turns into a structured summary so one agent can keep working without dragging an ever-growing transcript around forever.
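A minimal sketch of the idea, assuming the simplest possible policy: keep the most recent turns verbatim and replace everything older with a single summary turn. `Turn`, `Compact`, and the toy summarizer are hypothetical. In a real agent the summary would presumably come from a model call and carry more structure than this.

```go
package main

import (
	"fmt"
	"strings"
)

// Turn is one entry in the conversation history (hypothetical type).
type Turn struct {
	Role string
	Text string
}

// Compact keeps the last keep turns as they are and replaces all older
// turns with one summary turn produced by summarize. Histories that already
// fit within keep are returned unchanged.
func Compact(history []Turn, keep int, summarize func([]Turn) string) []Turn {
	if keep < 0 {
		keep = 0
	}
	if len(history) <= keep {
		return history
	}
	cut := len(history) - keep
	out := make([]Turn, 0, keep+1)
	out = append(out, Turn{Role: "summary", Text: summarize(history[:cut])})
	out = append(out, history[cut:]...)
	return out
}

// toySummary stands in for a model-generated summary: it only records how
// many turns were folded away and what the first one said.
func toySummary(old []Turn) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "compacted_turns: %d; first: %q", len(old), old[0].Text)
	return sb.String()
}

func main() {
	history := []Turn{
		{"user", "add a retry to the client"},
		{"assistant", "reading client.go"},
		{"tool", "contents of client.go"},
		{"assistant", "patched client.go"},
		{"user", "now add a test"},
	}
	for _, t := range Compact(history, 2, toySummary) {
		fmt.Printf("%s: %s\n", t.Role, t.Text)
	}
}
```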

internal/checkpoint/checkpoint.go is about file safety, not agent memory. It snapshots files before write and edit operations so /rewind can restore them later. That is useful, but it is a different concern from the race engine.
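Snapshot-before-write is simple enough to sketch. The version below is an in-memory illustration under my own assumptions, not the repo's implementation: `Checkpoints`, `WriteFile`, and `Rewind` are hypothetical names, snapshots live in a map rather than on disk, and a file that did not exist before the first write is deleted on rewind.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

type snapshot struct {
	existed bool
	data    []byte
	mode    fs.FileMode
}

// Checkpoints remembers each file's state before its first modification.
type Checkpoints struct{ saved map[string]snapshot }

func NewCheckpoints() *Checkpoints { return &Checkpoints{saved: map[string]snapshot{}} }

// snapshotOnce records the current state of path unless one is already saved.
func (c *Checkpoints) snapshotOnce(path string) error {
	if _, ok := c.saved[path]; ok {
		return nil
	}
	info, err := os.Stat(path)
	if errors.Is(err, fs.ErrNotExist) {
		c.saved[path] = snapshot{existed: false}
		return nil
	}
	if err != nil {
		return err
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	c.saved[path] = snapshot{existed: true, data: data, mode: info.Mode().Perm()}
	return nil
}

// WriteFile snapshots path, then writes the new contents.
func (c *Checkpoints) WriteFile(path string, data []byte) error {
	if err := c.snapshotOnce(path); err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// Rewind restores every snapshotted file and removes files that were new.
func (c *Checkpoints) Rewind() error {
	for path, s := range c.saved {
		if !s.existed {
			if err := os.Remove(path); err != nil && !errors.Is(err, fs.ErrNotExist) {
				return err
			}
			continue
		}
		if err := os.WriteFile(path, s.data, s.mode); err != nil {
			return err
		}
	}
	c.saved = map[string]snapshot{}
	return nil
}

// demo edits one file, creates another, rewinds, and reports what is left.
func demo() (restored string, newFileGone bool, err error) {
	dir, err := os.MkdirTemp("", "checkpoint-demo")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)
	existing, created := filepath.Join(dir, "a.txt"), filepath.Join(dir, "b.txt")
	if err := os.WriteFile(existing, []byte("original"), 0o644); err != nil {
		return "", false, err
	}
	cp := NewCheckpoints()
	if err := cp.WriteFile(existing, []byte("edited")); err != nil {
		return "", false, err
	}
	if err := cp.WriteFile(created, []byte("new")); err != nil {
		return "", false, err
	}
	if err := cp.Rewind(); err != nil {
		return "", false, err
	}
	data, err := os.ReadFile(existing)
	if err != nil {
		return "", false, err
	}
	_, statErr := os.Stat(created)
	return string(data), errors.Is(statErr, fs.ErrNotExist), nil
}

func main() {
	fmt.Println(demo())
}
```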

Those distinctions matter because they show the repo is built from small, specific mechanisms. Worktree isolation handles concurrent writes. Compaction handles long context. Checkpoints handle local recovery. Each tool has one job.

Headless runs, HTTP serving, and A2A are layered on top

The repo also exposes the same agent core through several entry points.

cmd/tokencode/headless.go covers one-shot, non-TUI execution. cmd/tokencode/serve.go exposes HTTP serving. internal/a2a/a2a.go implements a minimal A2A server so TokenCode can be discovered and invoked by other agents.

What I like here is the reuse. The multi-surface story is not built from three separate agent implementations. The repo keeps feeding the same underlying machinery through different shells.
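Here is a small sketch of that layering, with one core behind two shells. The `Runner` interface, the `echoRunner` stand-in, and the `/run` route are all hypothetical; I have not tried to reproduce the repo's actual HTTP or A2A surface, only the idea that each entry point is a thin adapter over the same `Run` call.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Runner is the shared core every entry point delegates to (hypothetical).
type Runner interface {
	Run(ctx context.Context, task string) (string, error)
}

// echoRunner stands in for the real agent.
type echoRunner struct{}

func (echoRunner) Run(ctx context.Context, task string) (string, error) {
	return "handled: " + task, nil
}

// Headless is the one-shot shell: run the task once and print the result.
func Headless(ctx context.Context, r Runner, task string, w io.Writer) error {
	out, err := r.Run(ctx, task)
	if err != nil {
		return err
	}
	_, err = fmt.Fprintln(w, out)
	return err
}

// Handler is the HTTP shell: the request body is the task.
func Handler(r Runner) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		if req.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(io.LimitReader(req.Body, 1<<20))
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		out, err := r.Run(req.Context(), string(body))
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Fprintln(w, out)
	})
}

// demo sends the same task through both shells and returns both outputs.
func demo() (headless, viaHTTP string) {
	core := echoRunner{}

	var sb strings.Builder
	_ = Headless(context.Background(), core, "fix bug", &sb)

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/run", strings.NewReader("fix bug"))
	Handler(core).ServeHTTP(rec, req)

	return sb.String(), rec.Body.String()
}

func main() {
	h, w := demo()
	fmt.Print(h, w)
}
```

Both paths produce the same output because neither shell contains any agent logic of its own. An A2A server would be a third adapter of the same kind.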

What is worth stealing

If you are building agent tooling in Go, a few ideas here transfer well:

keep the core agent loop separate from the orchestration layer
cap concurrency with a window even if the theoretical fan-out is huge
isolate filesystem writes with worktrees instead of inventing your own locking scheme
judge outputs with objective checks before asking a model to pick winners
treat context compaction, file recovery, and parallel execution as separate subsystems

That is the real lesson from TokenCode. The headline is “1,000 agents.” The engineering lesson is that the repo earns that headline by being conservative in the right places.