Reference Architecture · Applied GenAI

Regulated GenAI Platform.

The platform you build before squads build their own bespoke version. Designed for the obligations of NIST AI RMF, EU AI Act (high-risk enforcement from 2 Aug 2026), ISO/IEC 42001 and APRA CPS 230 / CPS 234.

Regulated GenAI Platform · reference architecture v1.0 · read the 9 controls essay

What this architecture solves.

The single most common pattern in 2026 enterprise GenAI: every squad builds its own bespoke gateway, picks its own evals tool, rolls its own audit logging, and discovers the EU AI Act obligations late. The platform below pre-builds the substrate so squads consume it safely instead of replicating it badly.

Run this against the GenAI Readiness diagnostic: an org with this architecture in place scores in the “Industrialising” band (60-80% maturity) by construction.

Layers, top to bottom.

L1 · Consumer surface

Where AI shows up to users and other systems.

Web / mobile apps, internal copilots, agent runtimes, and MCP servers. The platform doesn’t dictate the surface; it dictates that any surface that wants to call a model goes through the gateway below.

L2 · Model gateway (single ingress point)

The choke point for governance.

Every AI call goes through one logical gateway. The gateway authenticates the caller via workload identity (OIDC, not API keys), routes by use-case to the right model tier, enforces input guardrails, and emits the per-request trace before the model sees the prompt.

Tools: LiteLLM, OpenRouter, Bedrock IPR, Lakera Guard, NeMo Guardrails
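The ordering the gateway enforces (authenticate, route, guard the input, write the trace, and only then call the model) can be sketched in a few lines. This is a minimal illustration, not any listed tool's API: `ROUTES`, `BLOCKED_PHRASES`, `Request`, `Rejected` and `handle` are hypothetical names, OIDC token verification is assumed to have happened upstream, and the phrase list is a toy stand-in for a real guardrail service.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical routing table: use-case -> model tier. Unknown use-cases get the default.
ROUTES = {"claims-triage": "cheap-fast", "contract-review": "premium"}
DEFAULT_TIER = "workhorse"

# Toy stand-in for an input guardrail service; a real one is a classifier, not a phrase list.
BLOCKED_PHRASES = ("ignore previous instructions",)


@dataclass
class Request:
    workload_id: Optional[str]  # assumed to be set only after OIDC token verification upstream
    use_case: str
    prompt: str


class Rejected(Exception):
    """Raised when the gateway refuses a request before any model call."""


def handle(req: Request, call_model: Callable[[str, str], str], trace_sink: list) -> str:
    trace = {"trace_id": uuid.uuid4().hex, "ts": time.time(),
             "workload_id": req.workload_id, "use_case": req.use_case}
    if not req.workload_id:
        trace["decision"] = "rejected:unauthenticated"
    elif any(p in req.prompt.lower() for p in BLOCKED_PHRASES):
        trace["decision"] = "rejected:input_guardrail"
    else:
        trace["decision"] = "allowed"
        trace["tier"] = ROUTES.get(req.use_case, DEFAULT_TIER)
    # The trace is written before the model is called, so refusals are auditable too.
    trace_sink.append(trace)
    if trace["decision"] != "allowed":
        raise Rejected(trace["decision"])
    return call_model(trace["tier"], req.prompt)
```

Passing `call_model` and `trace_sink` in as arguments keeps the sketch testable without a provider SDK; in a real gateway they would be the provider client and the trace exporter.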

L3 · Model tier (multi-class)

Premium · Workhorse · Cheap-fast · Private.

Routing across four tiers, not one model. Premium for reasoning-heavy or long-context work; workhorse as the default; cheap-fast for high-volume classification; private (Llama / self-hosted) for sensitive or data-sovereign workloads. The Cost calculator shows the leverage from routing.

Models (as of 2026-05): Claude (Opus/Sonnet/Haiku), OpenAI (GPT-5 / GPT-4.1), Gemini 2.5, Llama 3.3
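One way to read the four-tier rule is as a short decision function. The sketch below is an assumption about precedence (data sensitivity wins over everything else) and about where "long-context" starts; `Task`, `pick_tier` and `LONG_CONTEXT_TOKENS` are hypothetical names, and the threshold is a placeholder rather than a recommendation.

```python
from dataclasses import dataclass

LONG_CONTEXT_TOKENS = 100_000  # assumed cut-off for "long-context"; tune per deployment


@dataclass(frozen=True)
class Task:
    sensitive: bool        # sensitive or data-sovereign workload
    needs_reasoning: bool  # reasoning-heavy work
    context_tokens: int    # estimated prompt + retrieved context size
    kind: str              # e.g. "classification", "chat", "extraction"


def pick_tier(task: Task) -> str:
    # Sensitivity is checked first: a sensitive task never leaves the private tier,
    # even if it would otherwise qualify for premium.
    if task.sensitive:
        return "private"
    if task.needs_reasoning or task.context_tokens > LONG_CONTEXT_TOKENS:
        return "premium"
    if task.kind == "classification":
        return "cheap-fast"
    return "workhorse"
```

For example, `pick_tier(Task(False, False, 800, "classification"))` returns `"cheap-fast"`, while the same task flagged `sensitive=True` returns `"private"`.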

L4 · Retrieval, RAG, tools

What the model sees besides the user prompt.

Hybrid retrieval (semantic + keyword + metadata filters) with a re-ranker and query rewriter. Vector store choice is mostly fungible at scale. Tools and MCP servers use scoped credentials, not bearer tokens. Retrieval quality is evaluated separately from generation quality.

Tools: pgvector, Pinecone, Weaviate, Vertex Search, MCP servers
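Hybrid retrieval needs some way to merge the semantic and keyword result lists. The source does not name one; the sketch below uses reciprocal rank fusion (RRF) as a common, store-agnostic choice, with metadata filters applied before fusion. `rrf_fuse` and `hybrid_search` are hypothetical helpers, the input rankings are assumed to come from whichever stores are in use, and `k=60` is the value conventionally used with RRF, not a tuned setting.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; ties break on doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(semantic, keyword, metadata, required, top_n=3):
    """Filter both rankings by required metadata, then fuse them.

    semantic / keyword: doc ids, best first, from the two retrievers.
    metadata: doc id -> dict of fields. required: field -> value that must match.
    """
    def allowed(doc_id):
        fields = metadata.get(doc_id, {})
        return all(fields.get(name) == value for name, value in required.items())

    filtered = [[d for d in ranking if allowed(d)] for ranking in (semantic, keyword)]
    return rrf_fuse(filtered)[:top_n]
```

A re-ranker would then reorder the fused top-N. Because this function returns only document ids, retrieval quality can be scored against a labelled set without running generation at all.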

L5 · Prompt registry, evals, rollout

The deploy gate.

Versioned prompt registry (semver, owners). Eval set per use-case, gated in CI. Rollout via feature flags + canary; auto-rollback on error-budget burn rate (per the Error Budget calculator pattern). Shadow evals on production traffic.

Tools: Promptfoo, LangSmith, Braintrust, Arize Phoenix
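The deploy gate has two halves: a pass-rate check in CI and a burn-rate check during the canary. A minimal sketch of both follows. The function names, the 0.9 pass-rate floor, the 99% SLO target and the 14.4 burn-rate threshold are all assumptions for illustration; the threshold borrows the fast-burn figure commonly used in SRE alerting, and the Error Budget calculator may define its own.

```python
def eval_gate(results, min_pass_rate=0.9):
    """CI gate: results is a list of booleans, one per eval case."""
    if not results:
        return False  # an empty eval set must not pass the gate
    return sum(results) / len(results) >= min_pass_rate


def burn_rate(bad, total, slo_target):
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    error_budget = 1.0 - slo_target
    return (bad / total) / error_budget


def should_rollback(bad, total, slo_target=0.99, threshold=14.4):
    """Canary check: roll back when the observed burn rate exceeds the threshold."""
    return burn_rate(bad, total, slo_target) > threshold
```

With a 99% target, 20 bad responses out of 100 canary requests is a burn rate of about 20 and triggers a rollback; 3 out of 100 is a burn rate of about 3 and does not.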

L6 · Output guardrails (last line)

Catches what gateway and model didn’t.

Policy-tuned output classifiers, jailbreak-success detection, content filters, citation enforcement. Tested adversarially against OWASP LLM Top 10 and MITRE ATLAS, not just configured.
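Of the output checks listed, citation enforcement is the easiest to show concretely. The sketch below assumes answers cite sources with bracketed numbers such as `[2]` and that the ids of the retrieved passages are known; both conventions, and the `check_citations` helper itself, are hypothetical. It flags citations to passages that were never retrieved and sentences with no citation at all.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def check_citations(answer, retrieved_ids):
    """Return a verdict plus the evidence for it.

    answer: model output using [n] markers. retrieved_ids: ids actually retrieved.
    """
    cited = {int(n) for n in CITATION.findall(answer)}
    unknown = sorted(cited - set(retrieved_ids))
    # Naive sentence split on ., ! or ? followed by whitespace; good enough for a sketch.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    uncited = [s for s in sentences if not CITATION.search(s)]
    return {"ok": not unknown and not uncited, "unknown": unknown, "uncited": uncited}
```

A check like this only proves that a citation exists and points at a retrieved passage. Whether the passage supports the claim is a separate eval.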

L7 · Shared substrate (the moat)

OTel GenAI · per-decision audit · cost-per-outcome · workload identity · policy-as-code.

These five primitives sit underneath every layer above. Without them, each layer rolls its own observability, identity, audit and policy — the patchwork that makes regulator-grade audit impossible.

Audited against: NIST AI RMF, EU AI Act, ISO/IEC 42001, OWASP LLM Top 10, MITRE ATLAS
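Two of the five primitives, per-decision audit and cost-per-outcome, can share one record. The sketch below chains each record to the previous one by hash so that later edits are detectable; the hash chain is an illustrative technique, not something the architecture prescribes. The `gen_ai.*` keys follow the OpenTelemetry GenAI semantic conventions as I understand them (the conventions are still evolving, so check the current spec), and `app.cost_usd` plus every function name are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record in a chain


def audit_record(prev_hash: str, attrs: dict) -> dict:
    """Build one audit record whose hash covers its attributes and its predecessor."""
    body = json.dumps({"prev": prev_hash, "attrs": attrs}, sort_keys=True)
    return {"prev": prev_hash, "attrs": attrs,
            "hash": hashlib.sha256(body.encode("utf-8")).hexdigest()}


def verify_chain(records: list) -> bool:
    """Recompute every hash in order; any edit or reordering breaks the chain."""
    prev = GENESIS
    for rec in records:
        if rec["prev"] != prev or audit_record(rec["prev"], rec["attrs"])["hash"] != rec["hash"]:
            return False
        prev = rec["hash"]
    return True


def cost_per_outcome(records: list, successful_outcomes: int) -> float:
    """Total model spend divided by business outcomes achieved, not by requests."""
    total = sum(rec["attrs"]["app.cost_usd"] for rec in records)
    return total / successful_outcomes if successful_outcomes else float("inf")
```

The denominator is the point: two calls costing 0.02 and 0.04 that together resolve one claim cost 0.06 per outcome, whatever the per-request figure says.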

What this architecture doesn’t solve.

Use-case selection. The wrong use-case will fail even with this platform. Control 1 still matters.
Data quality. The platform retrieves and consumes data; it doesn’t curate it. Modern Data Platform is the companion architecture.
Agent reliability. Multi-step agents add a class of failure modes (loops, drift, tool misuse) that the gateway can’t catch. Plan agent observability and step-level evals separately.
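As one example of the step-level checks that sit outside the gateway, a loop detector can flag an agent that keeps issuing the same tool call. This is a minimal sketch under assumed conventions: each step is a dict with `tool` and `args` keys, and `detect_loop` and `max_repeats` are hypothetical names. It covers only the exact-repeat form of looping, not drift or tool misuse.

```python
import json
from collections import Counter


def detect_loop(steps, max_repeats=3):
    """Return the index of the step where an identical tool call hits max_repeats.

    Returns None if no call repeats that often.
    """
    counts = Counter()
    for index, step in enumerate(steps):
        # Canonicalise args so that key order does not hide a repeat.
        key = (step["tool"], json.dumps(step["args"], sort_keys=True))
        counts[key] += 1
        if counts[key] >= max_repeats:
            return index
    return None
```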

Sequencing for a 90‑day build.

Weeks 1-3: Model gateway + workload identity + OTel GenAI traces. One use-case flowing through.
Weeks 4-6: Prompt registry + eval set + eval-gated deploy. Input + output guardrails for the first use-case.
Weeks 7-9: Audit-evidence pipeline. Risk-tiered governance encoded. Second use-case onboards in days.
Weeks 10-12: Routing + cost-per-outcome instrumentation. Shadow evals in prod. Third use-case onboards without changes to the platform.
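The "risk-tiered governance encoded" step in weeks 7-9 amounts to policy-as-code: a use-case's risk tier determines which platform controls must be switched on before it ships. A minimal sketch follows, assuming three tiers and an illustrative control list; the tier names, the control names and `missing_controls` are hypothetical and are not taken from any of the frameworks above.

```python
# Hypothetical mapping from risk tier to the controls a use-case must have enabled.
REQUIRED_CONTROLS = {
    "minimal": {"gateway", "trace"},
    "limited": {"gateway", "trace", "output_guardrails"},
    "high": {"gateway", "trace", "output_guardrails",
             "eval_gate", "audit_evidence", "human_oversight"},
}


def missing_controls(risk_tier, enabled):
    """List the controls still required before this use-case may deploy."""
    if risk_tier not in REQUIRED_CONTROLS:
        raise ValueError(f"unknown risk tier: {risk_tier!r}")
    return sorted(REQUIRED_CONTROLS[risk_tier] - set(enabled))
```

Run as a CI check, an empty list means the use-case may deploy. This is one way later use-cases can onboard in days: the policy is data that onboarding evaluates, not a review meeting.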