We don't lack loops. We have a fleet orchestrator, health scripts, a full editability seam, even a dashboard — and most of it drifted: dormant, legacy-scoped, unwired, or enforced by prose instead of code. The project's own record puts the cost at 2–4 sessions wasted per drift incident, 5+ weeks in the worst case. So this isn't a build list. It's a reckoning: subtract what rotted, keep the few that strengthen with the model, wire the durable into live and structural instruments, and install the one discipline that stops the rot.
The project has named the same failure at least eight times, across three unrelated eras. That recurrence is the tell — it's structural. Every instance is one shape: a mechanism is built correctly at one place, then drifts, and nothing catches the drift because the catcher is missing or is a sentence in a doc instead of a check in code.
make gate and the door's WALK both peeled like onions.You said it plainly: the old loops were built while you were learning, on an older Opus — be skeptical. That deserves to be a rule, because "it exists" and "it's worth keeping" are different claims, and the audit only proved the first. The rule is the rule callout above: does it get more valuable, or less, as the model strengthens? A weaker model needs rails — rigid rule-chains, ten tiny steps, elaborate harnesses. A stronger one just does what the rails were for. So the tell isn't age; it's direction. Keep the floors, cut the scaffolding.
Verdict: the need is durable, the mechanism is weaker-era → rebuild lighter. Unblocks every other loop.
The door re-runs a 15-minute capture every time you touch anything downstream — that's the tax that made tonight so slow. But the project already measured the truth: one full verification is ~2.8 seconds of process-spawn overhead wrapping ~50–80 milliseconds of real checking — a 35–56× tax that is pure scaffolding, from an era when spawning a fresh process per check felt safe. The move: capture once, verify in a tight loop tiered to what you changed — verify-a-deploy in ~5 min (works today, I ran it by hand tonight), re-run-from-cached-capture in ~7–9 min (one guard + a ~50-line mode away), the edit-check as a resident ~50–80 ms service.
--work-root + --allow-archived-capture) skips CAPTURE+MINT but is gated shut on a stamped moment — the guard is correct; the clean fix threads the existing momentId through so the binding check still means something. EDIT-SMOKE has no standalone entrypoint (it's why it has never run). And the spec justified "never reuse" on a 90-second capture assumption — reality is 15 minutes, 100× stale.Verdict: doesn't exist; it's the frontier, and it's model-native → build. The single sharpest gap in the corpus.
Today a prose/richtext edit ships on byte-exactness alone — no machine check for semantic fidelity (out of scope) and, as of session-120, no human gate either (it was removed). So Fides editing a bio on a customer's behalf has nothing verifying the edit is accurate or on-brand — the exact scenario "trust by construction" exists to prevent. The move: an anchored judge — the [J] court your constitution demands, an opus judge against a fixture it must fail on. The machine keeps proving the unedited bytes are preserved (durable floor, keep it); the judge renders a defeasible verdict on the edited span. This is exactly the judgment a weaker model couldn't be trusted with and a stronger one can.
Verdict: the pattern (loop + dashboard) is durable, the implementations are dead-era → retire the legacy, rebuild the pattern on the IL door. Unblocks Scale.
The door is one-site by construction; the only multi-site tooling is being retired as legacy (it scored the demoted output-layer). No aggregate faithfulness surface exists — the project's own posture is "424 green-by-absence": green because untested, not proven. A compiler-wide change today "ships and prays." The move: wire the proven one-site door into a batch runner + a live faithfulness registry (per-site cert status, preservation-score distribution, version stamp, drift-trend alarm). Reuse the shape of the dormant dashboard — state on disk, not context — but point it at IL truth, not the output-layer scores it was built for.
Verdict: durable need, actively being built → finish it. The least "new infrastructure" of all.
This is the thread we've pulled all night. The last-mile landed with mutation-proven guardrails; the first-mile (URL → capture → mint → editable, no hand-bridging) is proven on two of four DoD legs — the held-out live legs are exactly what tonight's door runs exercise. The phantom fix and empty-body fix moved it forward; the WALK live-origin leaks are the current frontier. This loop needs finishing, not designing.
Verdict: the meta-floor — it gets more valuable as we build more → install it.
Everything above will drift too, unless what keeps it live is structural, not social. This is your garden concept aimed at the compiler: every new loop ships with (a) a hook or lint that enforces it in code, not a doc sentence that drifts under pressure; (b) no halt-on-first — accumulate the full failure surface, report it at once; (c) no orphaned instrument — every check wired to a gate that reads it; (d) no label without its proof. These four are the antidotes to the four faces of the disease, baked into the build sequence so the next capability inherits immunity. It folds in tonight's CI/CD reckoning too: reconcile make gate down to one-instrument-per-fact, green-means-green, retire the dead checks. A gate you route around is already dead.
The red-pen targets — the design turns on these. Tap the ✎ on any block, or just tell me in chat.
"More vs. less valuable as the model strengthens" is the axis I'll sort everything by. If you'd cut it differently, this is the place — everything downstream inherits it.
Today a Breadth edit ships on byte-exactness with no fidelity check, human or machine. Three options: (a) build the anchored judge, (b) put the human back in the seat you removed at session-120, (c) both — judge screens, human ratifies what it flags. The sharpest customer-facing risk in the system.
My instinct: the fast-verify loop first, because it makes every other loop cheap to build and prove. But the edit-judge is the sharpest risk. Which leads?
Retire the dormant/legacy infra outright (the output-layer dashboard, legacy fleet scripts, dead make-gate checks), or keep them parked as reference? I lean ruthless — parked dead code is what drifts back in.
These are loops I'd build and run. Some (the edit-judge, the gate reconciliation) are norm/architecture-level and want your ratification before I cut. Others (fast-verify, fleet wiring) I can just build once you point me. Tell me the leash.