GenAI Readiness · Tier 2 of 5

GenAI — Piloting.

Something is in users’ hands, but not defensibly. Evals don’t gate deploys, guardrails aren’t tested adversarially, audit evidence is best-effort, cost-per-outcome is unknown. EU AI Act enforcement (2 Aug 2026) finds you out.

~25% of enterprises cluster here per Menlo Ventures State of GenAI 2024. One or two use-cases in limited production; governance ad-hoc. Approaching threshold to need ISO/IEC 42001 readiness if selling B2B in regulated sectors.

What this tier actually looks like.

You have something in users’ hands. It works most of the time. The team is proud of it — rightfully. Then a senior engineer asks: “what would happen to a deploy whose prompt change drops the critical eval score by 30%?” The answer is silence, because the eval set lives in a repo and nobody runs it on prompt changes.

You probably have:

You probably don’t have:

Why most teams get stuck here.

Piloting orgs typically stall because the next move feels expensive and unfamiliar. Three flawed instincts that keep teams here:

The 12% of enterprises that crossed from Piloting to Operating (and beyond) didn’t do it by adding more features. They did it by building the substrate — gateway, prompt registry, evals, guardrails, audit — before the second use-case.

The three substrate moves to the next tier.

1. Prompt registry + eval-gated deploys. End of the inline-prompt era.

Move every production prompt out of code and into a versioned registry (LangSmith, Promptfoo, Langfuse). Build an eval set of 50 cases with known-good answers. Run it in CI; block merge on critical regression.

Closes the Inline Prompt and Eval Set That Never Runs gaps simultaneously.

2. Layered guardrails, tested adversarially.

Input + output guardrails for the two failure modes you most fear (typically OWASP LLM01 prompt injection + LLM06 sensitive disclosure). Test them with an adversarial prompt suite, not just hope. NVIDIA NeMo Guardrails, Guardrails AI, Azure AI Content Safety, AWS Bedrock Guardrails are all viable.

3. Per-decision audit pipeline + cost-per-outcome.

Per request: prompt, retrieved context, model+version, output, applied guardrails. Logged, signed, retention-policy-controlled. Cost attribution: cost-per-resolved-task, not cost-per-token.

This is the substrate the 9 controls essay describes (read it) and the reference architecture provides (Regulated GenAI Platform).

What changes when you cross.

Once these three moves land:

This is the Operating tier. ~12% of enterprises are here. The gap from Operating to Industrialising is then smaller than the gap from Piloting to Operating — the substrate compounds.

Run the diagnostic.

To find out whether your team scores at this tier or another, run GenAI Readiness. It takes 2–4 minutes and surfaces both your overall tier and the capability breakdown that shows you where the move starts.

For the bigger picture: the compound diagnostic takes results from all six diagnostics and shows you the substrate gap that bounds your overall delivery, not the per-discipline symptom.