Architecture before enforcement

00

A list without an instrument.

Bengio, Hinton and Yao drew the first red lines at the Beijing dialogues in March 2024; four have since stuck: autonomous replication, weapons-of-mass-destruction uplift, large-scale cyberattack, loss of meaningful human control. By September 2025 a version of the list stood before the UN General Assembly as the Global Call for AI Red Lines, three hundred signatories and a deadline of end-2026 attached. The demanding has gone well.

Companies fulfil roughly half of their voluntary safety commitments, and the thresholds they set themselves for the same threat disagree about who the threat actor even is; nobody outside the developers can yet verify any of it. The institutions that could verify it take years to build, and each leans on the others: a verdict rests on a comparable score, the score on an agreed line.

01

Who would do the measuring?

The Seoul summit formalised the candidate in May 2024: a network of national AI institutes, renamed in late 2025 the International Network for Advanced AI Measurement, Evaluation and Science, and coordinated since December 2025 by the UK. Its lead institutes test frontier models before release; the access runs through voluntary MoUs with the three largest labs, and nothing else publicly backed has it. Membership runs overwhelmingly OECD or OECD-adjacent, and for an eighteen-month-old the output has been prolific.

0.778 pass@200

UK AISI ran the attack, OpenAI published the result: a universal jailbreak on GPT-5.

5→60%

RepliBench autonomous-replication component-task success: below 5% in early 2023, above 60% by summer 2025.

9 jurisdictions

Reached by three joint testing exercises since late 2024.

The Candidate

One network, mandated powers, uneven reach.

The paper’s nominee to verify AI red lines is the International Network for Advanced AI Measurement, Evaluation and Science, the Network below. Its members hold state-level mandates and, through agreements with the developers themselves, access to frontier models before release; the authors find that pairing nowhere else. The reach is uneven so far.

Showing all 11 jurisdictions mapped.

01

Highest capacity

United Kingdom: AI Security Institute
European Union: EU AI Office
United States: Center for AI Standards and Innovation (CAISI)

02

Operational mid-capacity

Japan: Japan AI Safety Institute
South Korea: AI Safety Institute
Singapore: Singapore AI Safety Institute
Canada: Canadian AI Safety Institute

03

Establishing · coordination · signatory

Australia: Australian AI Safety Institute
India: IndiaAI Safety Institute
France: INESIA
Kenya: No formal institute

Step 01 · the whole field

Eleven jurisdictions, one network.

The eleven tiles the brief puts forward, grouped by capacity, coloured by mandate; the apparatus beneath runs from staffed institutes down to a single diplomatic envoy.

Step 02 · open any institute

Budget, staff, access, shipped tools.

The UK’s dossier: 100-plus technical staff, roughly $83M a year, MoUs with the three largest frontier labs, the Inspect / ControlArena / RepliBench / InspectCyber toolchain; the Network Coordinator role has sat here since December 2025. Every tile works the same way.

Step 03 · the concentration point

By resources, largely one institute.

Switch from the map to the resource view. On a common budget axis the UK bar dwarfs every other member. The authors take the concentration the optimistic way: one state has already shown the capacity can be built.

Step 04 · the full network, in view

Capacity without reach: the work ahead.

Back to the full map. The capacity is real, its reach uneven: one binding enforcer (the EU), two Global South footholds (Kenya a founding member with no institute of its own; India the reverse), no Chinese member. The count below tallies it.

Over to you

The map is yours.

A tile opens into the full dossier: budget, staff, access, shipped tooling. The concentration toggle ranks the field by resources; the mandate filters cut it down.

One member with binding enforcement power: the EU AI Office. Every other institute can test and advise; none can compel.

Two footholds in the Global South: membership stays overwhelmingly OECD or OECD-adjacent, with India running an institute (though not yet a formal member) and Kenya participating as a signatory, without one of its own.

No Chinese participation: China sits outside the Network entirely, though its own newly formed safety association leaves a closer relationship open.

What the map argues

A standard-setter needs both capacity and standing. The Network has capacity, concentrated in a handful of OECD states where its legitimacy also rests; that is the gap the FATF spent three decades and nine regional bodies closing, and enforcement across jurisdictions waits on it.

The opt-out underneath

The pre-deployment access this all rests on runs through voluntary MoUs outside the EU AI Act, and a lab that withdrew tomorrow would face no formal consequence. The FATF never faced this, because the banks it policed could not opt out of being regulated; the Network’s subjects can, and the closer its findings come to binding, the more reason they have to walk.

Figures as of the paper’s writing (2026), from its institutional mapping and Annex II; budgets approximate, some budgets and staffing undisclosed. Until late 2025 the Network was the International Network of AI Safety Institutes.

02

What the watchdog actually built.

The Financial Action Task Force began in 1989 as sixteen states and a temporary mandate; the secretariat was borrowed from the OECD to spare the cost of building one. Its anti-money-laundering standards now run in more than 200 jurisdictions, and there is still no treaty behind them.

The first decade contained no enforcement mechanism at all.

Two Clocks

Eighteen months, or thirty-five years?

The paper reads the Network against the Financial Action Task Force, a soft-law body that bound the world to anti‑money‑laundering rules without a treaty. Line up the two clocks and the Network’s position is plain.

The FATF’s thirty-five-year arc, in six stages

1989 · Foundation · A G7 initiative; the Forty Recommendations inside its first year, nothing yet to enforce.
1992 onward · Standards · Evaluation capacity, no enforcement. The first decade went to procedure.
June 2000 · Here · The NCCT blacklist: fifteen jurisdictions named to trigger consequences the FATF had no authority to impose. Eleven years in. This is the phase the Network has now reached.
2006 · Collapse · The blacklist is discontinued, within six years of launch: premature enforcement delegitimised itself.
2007 onward · Recovery · The ICRG rebuilds enforcement on quantified thresholds applied regardless of membership, and it held.
Today · Maturity · Near-universal reach, with the standing to impose consequences earned over three decades of building first.

The International Network, formalised 2024 and roughly eighteen months building, has reached Stage 3: the same developmental phase the FATF stood at in June 2000, before its blacklist. Stages 4 through 6 remain ahead of it.

Financial Action Task Force

Founded 1989 · ~35 years to maturity

International Network

Formalised 2024 · ~18 months building

Step 01 · the FATF begins

One clock starts in 1989.

The Financial Action Task Force opens as a G7 initiative: sixteen founding states, the Forty Recommendations inside its first year, nothing yet to enforce.

Step 02 · the decades accrue

A decade of procedure before any enforcement.

Mutual evaluations, then a Secretariat borrowed from the OECD, then working groups, each in place before the FATF could punish anyone. The two rails run the same length and cover nothing like the same time: what took the FATF years has taken the Network months.

Step 03 · the Network reaches here, fast

Where the FATF stood before its blacklist.

On the FATF’s clock this is June 2000, eleven years in: the NCCT blacklist, fifteen jurisdictions named to trigger consequences it had no authority to impose. The Network has reached the same phase in eighteen months. The two clocks converge.

Step 04 · the blacklist collapses

The members had exempted themselves.

Switzerland and Luxembourg, tax havens and FATF members both, were exempted from scrutiny. The selection read as politically convenient, and the legitimacy deficit did the rest: the FATF discontinued the list in 2006.

Step 05 · the second time, it held

Rebuilt rule-bound and member-blind.

The 2007 ICRG rebuilt enforcement on quantified thresholds applied regardless of membership, foundations first and consequence second, and it held. The order the paper urges.

Step 06 · the mature regime

Near-universal, three decades on.

Today the FATF’s standards are adopted by more than 200 jurisdictions and the political support is still there. The Network has pulled level with the FATF that reached for its blacklist; the collapse, on the paper’s reading, is the one stage it does not have to repeat.

The lesson

The FATF never signed a treaty; the climb took thirty-five years, and the one shortcut it tried lasted six. From 2007, a jurisdiction lands in the ICRG process when its mutual-evaluation results cross set thresholds, and listing waits on a structured observation period and a negotiated action plan. The criteria are the same whether the jurisdiction holds FATF membership or not. The second attempt is still in force.
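The shape of that 2007 rule can be sketched in a few lines. This is a minimal illustration and not the FATF's procedure: both thresholds are hypothetical numbers chosen for the example, and the names `Evaluation`, `enters_review` and `next_step` are invented here. The one feature taken from the record is that membership is stored and never consulted.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; these are NOT the
# FATF's published ICRG criteria.
WEAK_COMPLIANCE_TRIGGER = 15
WEAK_EFFECTIVENESS_TRIGGER = 8


@dataclass(frozen=True)
class Evaluation:
    jurisdiction: str
    is_member: bool  # recorded, but deliberately never read by the rule
    weak_compliance_ratings: int
    weak_effectiveness_ratings: int


def enters_review(ev: Evaluation) -> bool:
    """A jurisdiction enters review when its results cross a set threshold."""
    return (
        ev.weak_compliance_ratings >= WEAK_COMPLIANCE_TRIGGER
        or ev.weak_effectiveness_ratings >= WEAK_EFFECTIVENESS_TRIGGER
    )


def next_step(ev: Evaluation, observation_done: bool, plan_agreed: bool) -> str:
    """Listing waits on an observation period and a negotiated action plan."""
    if not enters_review(ev):
        return "no action"
    if not observation_done:
        return "observation period"
    if not plan_agreed:
        return "negotiate action plan"
    return "eligible for listing"


member = Evaluation("A", True, 17, 3)
non_member = Evaluation("B", False, 17, 3)
print(next_step(member, True, True), "/", next_step(non_member, True, True))
```

Two jurisdictions with identical results get identical treatment whichever side of the membership line they sit on, which is the property the NCCT list lacked.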

03

Compliance without effect.

There is a sting in the record. Compliance with the FATF’s standards has climbed for three decades; evidence that money laundering itself declined has not turned up.

Score the regime on its own terms.

The Rational Myth

Seventy-six per cent compliance, and no proof it worked.

The FATF’s technical-compliance average stood at 36% in 2012; the fourth round of evaluations has it at 76%. The second number is the one that matters here: the effectiveness average across roughly 120 assessed countries is 28%.

Two scales, one regime.

Same scale, two questions

0–100% · one axis

76%

Technical compliance

36% in 2012 → 76% (fourth round)

28%

Outcome effectiveness

average across ~120 countries

48-point void: compliance climbed, effectiveness did not follow.

The navy bar took thirty years to climb. Standards in force almost everywhere money crosses a border.

Nothing pushed the grey one up. No detectable decline in the crime, on any assessment so far.

Institutional success

Did the regime take hold?

200+

jurisdictions, nearly every flag on earth, have adopted the standards.

40

members; the nine regional review bodies (FSRBs) carry the standards to everyone else.

25,000+

information exchanges run through Egmont’s secure platform in a single year.

89%

of relevant US investigations resulting in financial convictions drew on Bank Secrecy Act data.

Outcome effectiveness

The effect on the crime

97%

of assessed countries receive only low-to-moderate effectiveness ratings.

Measured decline in money laundering

No evidence that laundering has become harder or less prevalent. (Nazzari & Reuter, 2025)

The evidence on prevalence, across three decades

Why compliance held anyway

The threat of listing does the enforcing, and it works on belief: Case-Ruchala & Nance (2024) call the arrangement a ‘rational myth’.

The laundering, so far as anyone can measure, carried on.

Step 01 · the regime’s footprint

Institutional success, near-total.

Technical compliance climbed from 36% in 2012 to 76% under the fourth round of mutual evaluations. The arresting statistic: Bank Secrecy Act data figured in 89% of relevant US investigations that ended in financial convictions, with the rest of the architecture (standards across 200-plus jurisdictions, 40 members plus nine regional review bodies, 25,000-odd Egmont exchanges a year) running underneath.

Step 02 · but did it work?

Outcome effectiveness, flat.

Ask the other question and the record inverts: effectiveness scores average just 28% across about 120 assessed countries, 97% rated only low-to-moderate, and after three decades no evidence that laundering became harder or less prevalent (Nazzari & Reuter, 2025).

Step 03 · the two scales, one axis

The 48-point void.

Put both numbers on a single 0 to 100 scale: compliance reaches 76%, effectiveness 28%, and the bracket marks the 48-point gap the regime never closed.

Step 04 · why states kept complying

The myth that kept it standing.

Listing, Case-Ruchala & Nance (2024) found, does not correlate statistically with measurable financial harm to the countries listed; states comply anyway, out of fear of consequences that may never fully materialise. A ‘rational myth’, in their phrase. It works so long as they believe.

The whole record

Both at once.

A regime can be a near-universal institutional success while showing nothing against the crime it exists to stop; the FATF record holds both at once. The paper’s recommendation starts from that split.

The reframe

Read the record the other way and it is an existence proof: the machinery itself can be built, treaty or no treaty. Over 200 jurisdictions carry a common standard in their rulebooks; governments mark one another’s homework through mutual evaluation, and have done for decades. The Egmont channel carries over 25,000 exchanges a year, most for purposes its founders never anticipated. The operational backbone of a global regime can be built, and was. Three decades of looking produced no evidence that laundering receded; the wins the record does document concern state behaviour, institutions rebuilt to get off the grey list. The paper carries one recommendation out of this: score the AI Network on institutional goods, the column where the record shows results. Shared standards count. So does evaluation a second institute can actually use, and an information flow other governments will trust with restricted material; legitimacy accrues to the Network that delivers the rest.

Figures from the paper: compliance 36% (2012) to 76%, fourth round (FATF, 2022); effectiveness 28% average and 97% low-to-moderate (FATF, 2022; Basel Institute on Governance, 2024); BSA convictions (IRS, 2026). The laundering null is Nazzari & Reuter (2025), the ‘rational myth’ Case-Ruchala & Nance (2024).

04

Build it in the right order and it holds.

The FATF’s institutional goods depended on one another, and the regime nearly collapsed by reaching for enforcement first: it published a blacklist in 2000 before it had the legitimacy to make it stick, and the list was discontinued within six years. Reach for enforcement first in an AI regime and you would expect the same outcome. The machine below lets you build it either way and see.

Assemble the regime yourself. The enforcement lever is always live.

The synthesis · interactive

The Sequencing Machine

Four institutional goods sit above one enforcement lever, and the lever is always live. Pull it now if you like; watch what a consequence with nothing under it does to the regime. Then build the goods that could have carried it (parallel tracks, any order) and pull it again.

The enforcement lever

Graduated escalation

The ladder runs from procurement conditionality up to compute-governance triggers, and a rung only bites while the graver one above it is believed.

Year 3–5+ · Always live

The politically obvious move. The Global Call for AI Red Lines wants agreement by the end of 2026, and a list is quick to publish.

Regime credibility

Unbuilt · 6%

The lever is live before any of the groundwork exists. Fire it and see.

June 2000, again

You rebuilt the NCCT list.

The FATF named fifteen jurisdictions in June 2000 while exempting its own members; Switzerland and Luxembourg went unexamined. The exemptions gave the politics away, and six years finished the list.

Building came after, under pressure.

A regime that holds

Enforcement with something to stand on.

The FATF got here in 2007, after the collapse: the ICRG tied listing to quantified thresholds, members and non-members alike, and the second attempt has held since. Your lever just fired with all four goods underneath it, which is the difference.

i. Procurement conditionality: purchasing power as the first consequence

ii. Conditional pre-deployment access: access that compliance earns

iii. Compute-governance triggers: the black list behind the grey list

These recommendations follow this order; the Network today stands at the FATF’s pre-enforcement moment.

05

The foundation is a measurement problem.

The recommendations that close the paper ask for shared standards and comparable evaluation before anything else. The reason is practical: red-line definitions still disagree about basic thresholds, and a safety score produced by one institute currently tells a second institute very little. The interactive below demonstrates that second problem on real benchmark data; you set the method and watch the score move.

The Detection Problem

You are the evaluator.

An enforcement decision would be based on the evaluation outcomes; there is nothing else to go on. The outcomes depend on how the evaluation is run. Below, the same safety questions are given to three frontier models; the question format alone changes which model looks safest, and one agent harness pushes two models’ scores in opposite directions, a 35.6-point spread on identical items. In a published report, both decisions would live in the methods section.

[Interactive figure: measurement bench. Axis: sycophancy-resistance, a safety proxy, 0 to 100, higher = safer. Controls: question format (+19.6 pt) and scaffold (35.6 pt spread). Scores at bench defaults, safer to less safe: GPT-5.2 57.1, Opus 4.6 42.1, Llama 4 15.]

The regime this brief describes would hang a verdict on one number along this axis; that number has not been agreed.

Bench defaults; three scores, none of them close.

Step 01 · the field is already spread

The models do not agree.

One real safety benchmark, the same 500 items put to three frontier models at the conventional default (multiple-choice, single-turn). GPT-5.2 reads 57.1 to Opus 4.6’s 42.1; Llama 4 posts a 15. No method choice has been made yet.

Step 02 · format reorders the field

Ask the same items open-ended.

The identical questions, now open-ended rather than multiple-choice. Every score rises, and Opus 4.6 rises furthest (42.1 to 75.0, a 32.9-point gain that takes it past GPT-5.2, which barely moves); multiple-choice had been deflating measured safety.

Step 03 · the scaffold is format in disguise

Run the same benchmark through a harness.

The same benchmark, now run through a map-reduce harness (a different slice of the same study). The models bifurcate: Opus loses 16.8 and Llama gains 18.8 on the same items, the study’s two largest scaffold effects, in opposite directions, a 35.6-point spread on one benchmark. Across the study, roughly 40 to 89% of the per-model map-reduce loss is the format effect from the last step again: the harness strips the answer options when it decomposes the task.
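The arithmetic behind the two headline movements can be checked directly. The sketch below uses only the scores quoted in these steps; the variable names are illustrative and nothing here reproduces the study's own code.

```python
# Scores quoted in the steps above (sycophancy-resistance, higher = safer).
OPUS_MULTIPLE_CHOICE = 42.1
OPUS_OPEN_ENDED = 75.0

# Change in score when the benchmark runs through the map-reduce harness.
SCAFFOLD_DELTA = {"Opus 4.6": -16.8, "Llama 4": +18.8}

# Format effect for one model: open-ended score minus multiple-choice score.
format_gain = round(OPUS_OPEN_ENDED - OPUS_MULTIPLE_CHOICE, 1)

# Scaffold spread: distance between the largest gain and the largest loss.
scaffold_spread = round(
    max(SCAFFOLD_DELTA.values()) - min(SCAFFOLD_DELTA.values()), 1
)

print(f"format gain for Opus 4.6: {format_gain} points")
print(f"model-by-scaffold spread: {scaffold_spread} points")
```

The spread is the sum of two movements in opposite directions, which is why no single correction factor for "the harness" could remove it.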

Step 04 · no composite survives

Average it, and the disagreement vanishes.

Format and scaffold have moved the scores and reordered the field; collapse it now to one safety number. The composite sits in the middle and looks precise; the spread it averages away is wide enough to flip which model is ‘safer’. On this design the generalisability coefficient is G ≈ 0.000; the scaffold architecture is the least systematic factor in the whole study, and what does the damage is the interaction of model and method, and which benchmark you happened to pick.

Step 05 · the takeaway

First, agree on the ruler.

Until the noise floor is characterised, a threshold-based verdict is false precision. That is why the paper makes measurement-science standardisation the Network’s first-order deliverable, before any institute can certify a crossing.

Measure, sources & method

The measure is sycophancy-resistance: a proxy for safety rather than a direct measure of it, but one with a documented path to consequential risk. Denison et al. (2024) build a curriculum of gameable environments beginning with sycophancy and find models trained on it generalising zero-shot to rewriting their own reward function, a held-out behaviour they were never trained on, low-rate but real (Sycophancy to Subterfuge, arXiv:2406.10162). Scores from Safety Under Scaffolding (Gringras, 2026); the zero G is a floor-truncated estimate with a wide bootstrap interval, so reliability is not provably zero, just nowhere demonstrated.

The recommendation

06

What to build now, and what to wait for.

The paper’s recommendations re-run a sequence that has already been run once. Every phase has a FATF analogue that is now history, the closest thing a plan like this can have to a track record.

Year 0–1

Build the foundation

The FATF wrote the Forty Recommendations inside its first year; the Network’s equivalent is a common glossary of red-line definitions from a standing working group the UK AISI convenes, due no later than 2027.

Administrative capacity for the Coordinator role, a network-wide confidentiality protocol and expanded pilot joint testing travel with the glossary.

Year 1–3

Make findings commensurable

The FATF’s first decade went to mutual evaluation, members scoring members, before anyone was punished for anything.

The Network’s version is harder science: three or more institutes evaluate one identical model set with the parameters deliberately varied, the first empirical measurement of evaluator-dependent noise, while information exchange picks up two logics, publication and Egmont-style originator control, and capacity-building funds reach the lower-resourced members.
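One way to read that joint exercise is as a crossed design: every institute scores every model, and the variation in the grid is split between what the models contribute and what the evaluators contribute. The sketch below is a plain sums-of-squares decomposition under that assumption. It is a simple stand-in, not the generalisability analysis the paper cites, and every score in `HYPOTHETICAL_SCORES` is invented for the example.

```python
from statistics import mean

# Invented scores for illustration only: institute -> model -> score.
HYPOTHETICAL_SCORES = {
    "institute_a": {"model_x": 60.0, "model_y": 40.0, "model_z": 20.0},
    "institute_b": {"model_x": 45.0, "model_y": 70.0, "model_z": 25.0},
    "institute_c": {"model_x": 50.0, "model_y": 35.0, "model_z": 40.0},
}


def variance_shares(scores):
    """Split total variation in a fully crossed grid into three shares.

    'model' is the signal a verdict needs; 'institute' is a uniform
    evaluator offset; 'interaction' is everything left over, which with
    one score per cell also absorbs measurement error.
    """
    institutes = sorted(scores)
    models = sorted(scores[institutes[0]])
    cells = [scores[i][m] for i in institutes for m in models]
    grand = mean(cells)

    model_means = [mean(scores[i][m] for i in institutes) for m in models]
    inst_means = [mean(scores[i][m] for m in models) for i in institutes]

    ss_total = sum((x - grand) ** 2 for x in cells)
    if ss_total == 0:
        raise ValueError("all scores identical; nothing to decompose")
    ss_model = len(institutes) * sum((v - grand) ** 2 for v in model_means)
    ss_inst = len(models) * sum((v - grand) ** 2 for v in inst_means)
    ss_rest = ss_total - ss_model - ss_inst

    return {
        "model": ss_model / ss_total,
        "institute": ss_inst / ss_total,
        "interaction": ss_rest / ss_total,
    }


shares = variance_shares(HYPOTHETICAL_SCORES)
for source, share in shares.items():
    print(f"{source:12s} {share:.2f}")
```

A large interaction share means the ranking of models changes with the evaluator, which is the case in which a second institute learns little from the first one's score.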

Year 3–5+

Authoritative findings

The FATF’s listing machinery works as a gradient, grey list before black list, and it dates from the 2007 rebuild.

The paper copies the shape: procurement conditionality before conditional pre-deployment access, compute-governance triggers held back until the rungs beneath them are real.
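The ordering constraint is easy to state as a data structure. This is a minimal sketch assuming the three rungs are strictly ordered as the paper lists them; `Rung`, `can_activate` and `next_rung` are names invented for the illustration.

```python
from enum import IntEnum


class Rung(IntEnum):
    """The three rungs in the order the paper gives them, lowest first."""
    PROCUREMENT_CONDITIONALITY = 1
    CONDITIONAL_PREDEPLOYMENT_ACCESS = 2
    COMPUTE_GOVERNANCE_TRIGGERS = 3


def can_activate(rung, active):
    """A rung may be switched on only if every rung beneath it is active."""
    return all(lower in active for lower in Rung if lower < rung)


def next_rung(active):
    """Return the lowest rung not yet active, or None at the top."""
    for rung in Rung:
        if rung not in active:
            return rung
    return None


in_force = {Rung.PROCUREMENT_CONDITIONALITY}
print(can_activate(Rung.COMPUTE_GOVERNANCE_TRIGGERS, in_force))
print(next_rung(in_force).name)
```

Under this rule the compute-governance trigger cannot be reached in one step from an empty regime, which is the sequencing the paper asks for.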

Peer review between institutes, down to whether the assessors were independent of the institute under review, and a regional-body layer (the paper names ASEAN AI SAFE as an absorption candidate) are what make the gradient defensible.