Token Economics
Token usage, cost breakdown, and model roles across the build.
GPT-5.3 Codex
BuilderWrites all application code via Codex CLI in --full-auto mode. Sandboxed to workspace writes only. One feature per invocation.
codex exec --full-auto
7 min per feature
~$0.01/1K tokens
Code + tests + build
Claude Sonnet 4
EvaluatorScores each feature on 3 dimensions via API call. Returns structured JSON with pass/fail and specific feedback per dimension.
Complete / Visual / No Placeholders
40% / 30% / 30%
~$0.003/1K tokens
Score + feedback JSON
Why separate models? The builder (Codex) optimizes for completing the feature. The evaluator (Sonnet) optimizes for finding flaws. Same model doing both leads to “confidently praising its own mediocre work.” Different model, fresh context = honest scoring.
Token Split
Cost per Feature
Shows whether the agent becomes more efficient over time.