Code review for agents

A warden for every diff.

diffwarden is a small CLI your agent calls. It reviews a diff — uncommitted changes, a branch, a single commit — and hands back a live terminal review for humans, plain text for agents, and stable JSON or NDJSON to gate on. Many reviewers, one artifact, read-only by default.


no writes, no comments — by default
10 reviewer engines — 2 flagship · 8 experimental
stable JSON / NDJSON

any agent calls diffwarden — it fans out to the reviewers you choose, in parallel.

Pipeline

One command in.
A reconciled review out.

The CLI owns every step between your agent's call and the result. Adapters only run their engine and hand back text or structured output.

01 Resolve a target

Point diffwarden at uncommitted changes, a base branch, a commit, or custom instructions. It collects the diff and the changed-line ranges.

02 Fan out to reviewers

A reviewer is an engine held to diffwarden's contract. Your selected reviewers run concurrently behind adapters, each read-only, each running one shared rubric. Repeatable --focus flags add scoped lanes over the same diff.

03 Reconcile findings

Results are parsed, schema-validated, checked against changed lines, then deduplicated and attributed across reviewers.

04 Deliver & gate

Render the human review or agent summary, emit JSON or NDJSON, optionally fail CI on severity, and append a history report, all from the final artifact.
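
To make the reconcile step concrete, here is a minimal sketch of a changed-line check followed by deduplication and attribution. It is not diffwarden's implementation: the `RawFinding` shape, the exact-match dedup key, and the choice to drop findings that fall outside the changed ranges are all assumptions made for illustration. Only `priority`, `title`, and `reviewer_ids` mirror fields visible in the sample artifact.

```typescript
// Hypothetical input shape: one finding as reported by one reviewer.
interface RawFinding {
  reviewerId: string;
  priority: number; // 1 = P1, the most severe
  title: string;
  file: string;
  line: number;
}

interface MergedFinding {
  priority: number;
  title: string;
  file: string;
  line: number;
  reviewer_ids: string[];
}

// Inclusive changed-line ranges per file, as collected when the target is resolved.
type ChangedRanges = { [file: string]: Array<[number, number]> };

function touchesChangedLine(f: RawFinding, ranges: ChangedRanges): boolean {
  const fileRanges = ranges[f.file] || [];
  return fileRanges.some(([start, end]) => f.line >= start && f.line <= end);
}

function reconcile(raw: RawFinding[], ranges: ChangedRanges): MergedFinding[] {
  const merged: { [key: string]: MergedFinding } = {};
  for (const f of raw) {
    // Assumption: findings that miss every changed range are discarded.
    if (!touchesChangedLine(f, ranges)) continue;
    // Assumption: same file, line, and case-insensitive title means same finding.
    const key = `${f.file}:${f.line}:${f.title.toLowerCase()}`;
    const existing = merged[key];
    if (existing) {
      // Keep one entry, attribute it to every reviewer that raised it.
      if (existing.reviewer_ids.indexOf(f.reviewerId) === -1) {
        existing.reviewer_ids.push(f.reviewerId);
      }
      existing.priority = Math.min(existing.priority, f.priority);
    } else {
      merged[key] = {
        priority: f.priority,
        title: f.title,
        file: f.file,
        line: f.line,
        reviewer_ids: [f.reviewerId],
      };
    }
  }
  return Object.keys(merged)
    .map((key) => merged[key])
    .sort((a, b) => a.priority - b.priority);
}
```

A real reconciler would likely match near-duplicates more loosely than an exact key; the point here is only the order of operations: validate location, merge, attribute.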

Loop

Your agent runs it until the diff comes back clean.

diffwarden reviews once and exits — read-only, stateless, no fixing. The agent reads the artifact, fixes what it judges valid, and runs it again until the gate clears.

01 review → artifact · 02 fix, agent re-runs · 03 gate → exit code

✕ ✕ ✓ · 0 findings ≥ P2
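
The loop above can be sketched as a small driver on the agent's side. This is an illustration, not part of diffwarden: `review` stands in for running `diffwarden review` with a gate and reading the artifact, `fix` stands in for the agent's own edit step, and `maxRounds` is an added safety cap for the case where the agent judges the remaining findings invalid and the gate never clears.

```typescript
// Hypothetical result of one review run, reduced to what the loop needs.
interface ReviewRun {
  exitCode: number; // 0 when the gate clears
  findings: string[];
}

interface LoopOutcome {
  clean: boolean;
  rounds: number;
}

function reviewUntilClean(
  review: () => ReviewRun,
  fix: (findings: string[]) => void,
  maxRounds: number = 5,
): LoopOutcome {
  for (let round = 1; round <= maxRounds; round++) {
    const run = review(); // diffwarden reviews once and exits
    if (run.exitCode === 0) {
      return { clean: true, rounds: round };
    }
    fix(run.findings); // the agent, not diffwarden, changes the code
  }
  return { clean: false, rounds: maxRounds };
}
```

A fuller driver would also tell a gate failure apart from a tool error before deciding to fix and re-run; this sketch treats every non-zero exit as findings.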

give it to your agent

The loop is a prompt. The agent calls diffwarden, reads the artifact, fixes, and re-runs — these two drive exactly that:

▸ diffwarden-changes.md the loop, run on your working diff
▸ complete-linear-issues.md the same loop, built into Linear issue-tracker work

/goal implement, then use diffwarden
until no valid findings remain

See the prompts in the repo

the agent owns the loop. diffwarden owns the review and the contract.

Lineage

Built on Codex's review pipeline.

diffwarden's review carries a Codex-derived review rubric — the same semantics, not the same text. The designs fork at the review child — the isolated sub-agent Codex spawns to run a review.

codex /review · the shared spine: 4 review targets · rubric semantics · parse-ladder fallbacks · read-only intent

the review child · where the designs fork

Codex: one review child · runs git itself

diffwarden: core-owned diff · fan-out → N · schema + location validation · dedup + attribution · read-only, graded · one artifact

Codex runs one review child; diffwarden forks at that same child and fans the review out across reviewers.

Reviewers

Pick your engines.
Mix them freely.

Every engine runs the same rubric, parsing, and output contract. Where they differ is how strictly each is held read-only, and that is exactly what an engine's rung shows.


enforced: native read-only / sandbox / spec mode
  codex (app-server · cli) · flagship
  droid (sdk · cli)
  grok (cli)

tool-restricted: limited to read-oriented tools
  claude* (sdk · cli) · flagship
  pi (sdk · cli)
  gemini† (cli)
  copilot (sdk · cli)
  antigravity (cli)

prompt-only: asked to stay read-only; not yet hard-proven
  cursor (sdk · cli)
  opencode (cli)

+ fake: built-in reviewer for credential-free dev

flagship = fully supported and live-tested — the recommended defaults. The other eight are experimental: functional, best-effort.

* from Jun 15, 2026, running Claude headlessly (claude -p / Agent SDK) draws from extra usage, not included plan usage

† from Jun 18, 2026, Gemini stays supported for enterprise and paid API-key users only — Google's consumer Gemini CLI moved to Antigravity CLI

Trialing

Audition new models on real diffs. Change nothing.

One run can mix engines, transports, and per-reviewer models — billed across the logins and keys you already have, with a review surface that's read-only by default.

› --reviewer claude* --reviewer pi:openrouter-high

fan one diff across several engines in a single run

› diffwarden doctor · --report

preflight a reviewer; keep a durable record of how it performs

* Claude's auto auth prefers a logged-in Claude Code account and strips API credentials from the spawned reviewer process


Reach

Ten engines.
The rest of the model space.

Two of the ten — Pi and OpenCode — open onto the rest of the model space, behind the same one contract.

2 flagship, live-tested (claude · codex) · 8 experimental

diffwarden → engines: Codex · Claude · Gemini · Copilot · Pi · OpenCode · Cursor · Droid · Grok · Antigravity

OpenRouter · any base URL → OpenAI · Google · Anthropic · Meta · DeepSeek · Qwen · Mistral · Kimi · MiniMax · GLM · NVIDIA

any OpenAI-compatible endpoint →

legend: direct · through the lens · labs you audition

Capabilities

Every flag is a capability.
Every run is one artifact.

No feature tour, just the flags your agent already passes, each listed with what it buys you.

diffwarden

$ diffwarden review \

--target base:main · review any target: uncommitted · base:<branch> · commit:<sha> · custom:<text>

--reviewer-set ci · many reviewers, one artifact: run several engines at once; name a set or compose engine:profile

--focus "state management" · focused review lanes: repeatable; fans one resolved diff into scoped parallel lanes plus an overview lane (--no-overview to skip); one merged ReviewBatchArtifact

--ndjson · contracts your agent can parse: --json one artifact · --ndjson event stream · --agent plain text · no flag: human display

--debug-reviewer-output · see what each engine actually did: opt-in, bounded raw reviewer transcripts in artifacts (256 KiB per stream per reviewer); with --ndjson, live reviewer_debug_output events; these never affect results

--fail-on-findings P2 · gate your pipeline: exit non-zero while a finding sits at or above your threshold

# read-only by default · write + publish paths out of scope: read-oriented tools only; how hard it's held varies by engine
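
The four --target forms listed above map naturally onto a tagged union. The sketch below is a hedged illustration of that mapping, not the CLI's parser: only the `base` shape (`kind` plus `base_ref`) appears in the sample artifact, while the `sha` and `instructions` field names and the error messages are assumptions.

```typescript
// Only { kind: "base", base_ref } is confirmed by the sample artifact;
// the other variants use illustrative field names.
type ReviewTarget =
  | { kind: "uncommitted" }
  | { kind: "base"; base_ref: string }
  | { kind: "commit"; sha: string }
  | { kind: "custom"; instructions: string };

function parseTarget(spec: string): ReviewTarget {
  if (spec === "uncommitted") return { kind: "uncommitted" };

  const sep = spec.indexOf(":");
  if (sep === -1) throw new Error(`unrecognized target "${spec}"`);

  const kind = spec.slice(0, sep);
  // Everything after the first colon is the value, so custom text may contain colons.
  const value = spec.slice(sep + 1);
  if (value === "") throw new Error(`target "${spec}" is missing a value`);

  switch (kind) {
    case "base":
      return { kind: "base", base_ref: value };
    case "commit":
      return { kind: "commit", sha: value };
    case "custom":
      return { kind: "custom", instructions: value };
    default:
      throw new Error(`unknown target kind "${kind}"`);
  }
}
```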

its stdout — readable for you, parseable for your agent ↓

$ diffwarden review --target base:main --reviewer-set ci --fail-on-findings P2
# output modes: human display · --agent · --json · --ndjson

# human display
diffwarden review
Target: base:main
Reviewers: claude, codex, pi
✓ claude finished (48.2s)
✓ codex finished (61.0s)
✓ pi finished (52.7s)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✗ CHANGES REQUESTED · 2 of 3 reviewers flagged · 1 P1 · 1 P2 · 2 P3 · 64s
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▌ P1 Non-constant-time token compare
  src/auth/session.ts:42 · pi · confidence 0.87
▌ P2 Unvalidated `limit` reaches the query
  src/api/routes.ts:88 · codex · confidence 0.74
review finished in 64.3s · slowest codex 61.0s

$ echo $?
# --fail-on-findings P2 · 2 findings ≥ P2 → exit 1

# --agent
Diffwarden Review
Target: base:main
Verdict: patch is incorrect
Confidence: 0.81
Findings: 4 (P1 1, P2 1, P3 2)
Reviewers: claude, codex, pi
Reviewer status: 3 passed, 0 failed

Findings:
1. P1 Non-constant-time token compare
   File: /repo/src/auth/session.ts:42-42
2. P2 Unvalidated `limit` reaches the query
   File: /repo/src/api/routes.ts:88-88
…

$ echo $?
# --fail-on-findings P2 · 2 findings ≥ P2 → exit 1

# --json
{
  "schema_version": 2,
  "target": { "kind": "base", "base_ref": "main" },
  "result": {
    "overall_correctness": "patch is incorrect",
    "findings": [
      { "priority": 1, "title": "Non-constant-time token compare",
        "reviewer_ids": ["pi"], "code_location": {…} }, …
    ]
  }, …
}

$ echo $?
# --fail-on-findings P2 · 2 findings ≥ P2 → exit 1
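
In CI the CLI's own exit code is the gate. For an agent that has saved the --json artifact and wants to apply the same threshold itself, a minimal sketch might look like the following. It assumes only what the sample shows: findings live under `result.findings`, and `priority` is a number where 1 corresponds to P1. The function names are hypothetical.

```typescript
// Only the fields used here are typed; the rest of the artifact is ignored.
interface ArtifactLike {
  schema_version: number;
  result: { findings: Array<{ priority: number; title: string }> };
}

// "P2" -> 2. Lower numbers are more severe, so "at or above P2" means priority <= 2.
function thresholdToPriority(threshold: string): number {
  const match = /^P(\d+)$/.exec(threshold);
  if (!match) throw new Error(`unrecognized threshold "${threshold}"`);
  return Number(match[1]);
}

// Returns the findings that would hold the gate closed at this threshold.
function blockingFindings(artifactJson: string, threshold: string): string[] {
  const artifact = JSON.parse(artifactJson) as ArtifactLike;
  const limit = thresholdToPriority(threshold);
  return artifact.result.findings
    .filter((f) => f.priority <= limit)
    .map((f) => `P${f.priority} ${f.title}`);
}
```

With the four findings from the sample run (one P1, one P2, two P3) and a threshold of P2, this returns two entries, matching the "2 findings ≥ P2" line in the panels.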

# --ndjson
{"schema_version":2,"type":"run_started","target":{…},"reviewers":[…]}
{"type":"preflight_finished","reviewer_id":"pi","ok":true}
{"type":"reviewer_result","provisional":true,"artifact":{…}}
{"type":"final_result","artifact":{…}}
# exactly one terminal frame: final_result or error

$ echo $?
# --fail-on-findings P2 · 2 findings ≥ P2 → exit 1

diffwarden review · default: live verdict banner + honest consensus; reviewer lanes on stderr, a clean report on stdout.

--agent · for coding agents: plain text final summary; no ANSI, spinners, or framing to parse around.

--json · stable contract: one ReviewArtifact object, the authoritative, machine-stable result.

--ndjson · event stream: versioned review events as the run progresses, built for agents + CI.

stable contract: schema_version 2 · fields only added within a version · breaking bumps called out in release notes
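
A consumer of the --ndjson stream can lean on the "exactly one terminal frame" rule. The sketch below is one way to do that, under assumptions: the event type names are taken from the sample stream, payloads are left opaque, and the helper names are hypothetical.

```typescript
// Events are kept loosely typed; only `type` is relied on.
interface ReviewEvent {
  type: string;
  [key: string]: unknown;
}

// One JSON object per non-empty line.
function parseEvents(ndjson: string): ReviewEvent[] {
  return ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ReviewEvent);
}

// Returns the single terminal frame (final_result or error).
// Provisional reviewer_result frames are progress only and are never returned.
function terminalFrame(events: ReviewEvent[]): ReviewEvent {
  const terminal = events.filter(
    (e) => e.type === "final_result" || e.type === "error",
  );
  if (terminal.length !== 1) {
    throw new Error(`expected exactly one terminal frame, found ${terminal.length}`);
  }
  return terminal[0];
}
```

Treating a missing or duplicated terminal frame as an error keeps a truncated stream from being mistaken for a clean review.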

Sync the policy.
Keep the machine bits local.

Layer a host-owned file over your portable user config. Reviewer sets and review policy stay in dotfiles; machine IDs and per-host enabled toggles stay on the host that owns them.

$ diffwarden reviewers edit droid \
  --disabled --local

$ diffwarden doctor


syncable base (dotfiles): diffwarden.config.json · reviewers · sets · readonly · reporting

host-owned overlay (never sync): diffwarden.config.local.json · machine IDs · local enabled toggles

effective config on this host: merged before validation · local values take precedence

No overlay? Nothing changes. A project config still wins wholesale.
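
The layering rules above can be sketched as a small merge. This is an assumption-heavy illustration, not the real loader: the text says only that the overlay is merged before validation and that local values take precedence, so the recursive merge depth, the replace-not-concatenate handling of arrays, and the reading of "wins wholesale" as "a project config replaces the merged result entirely" are all guesses.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursive merge where overlay values win; arrays and scalars are replaced.
function mergeOverlay(base: JsonObject, overlay: JsonObject): JsonObject {
  const out: JsonObject = { ...base };
  for (const key of Object.keys(overlay)) {
    const baseValue = out[key];
    const overlayValue = overlay[key];
    out[key] =
      isObject(baseValue) && isObject(overlayValue)
        ? mergeOverlay(baseValue, overlayValue)
        : overlayValue;
  }
  return out;
}

function effectiveConfig(
  user: JsonObject,
  local?: JsonObject,
  project?: JsonObject,
): JsonObject {
  if (project) return project; // assumed: project config wins wholesale
  return local ? mergeOverlay(user, local) : user; // no overlay: nothing changes
}
```

Validation would run on the returned object, after the merge, which is why an overlay can hold only the handful of keys that differ on one host.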

Quickstart

Install from npm. Review from any repo.

Diffwarden is a published CLI package. Run guided setup straight from npx — or install it globally — then point it at an authenticated reviewer for real gates.

Requires Node ≥ 22.19
diffwarden@0.5.0 is the latest npm package
v0.5.0 stays available for source release details
diffwarden init runs a guided discover-and-scaffold setup in your terminal — with live per-engine model catalogs and recommended picks — then writes your config
diffwarden reviewers discover is a read-only, zero-spend probe: every engine comes back Ready, Needs attention, or Not installed


guided setup, no install

$ npx --yes diffwarden@latest init
$ npx --yes diffwarden@latest review --target base:main

permanent install from npm

$ npm install --global diffwarden
$ diffwarden --version

run a real reviewer

$ diffwarden doctor --reviewer pi
$ diffwarden review \
  --target base:main \
  --reviewer pi

give it to your agent

$ npx skills add aurokin/diffwarden \
  --global --skill diffwarden \
  --agent codex claude-code --full-depth

the same run, as a merge gate · .github/workflows

- name: diffwarden review
  run: |
    diffwarden review \
      --target base:${{ github.base_ref }} \
      --reviewer-set ci \
      --json \
      --fail-on-findings P2
# exit 1 blocks the merge

Put a warden on your diffs.

One command your agent calls. Many reviewers, one artifact, read-only by default.

