Reference Architecture · Applied GenAI

Regulated GenAI Platform.

The platform you build before squads build their own bespoke version. Designed for the obligations of NIST AI RMF, EU AI Act (high-risk enforcement from 2 Aug 2026), ISO/IEC 42001 and APRA CPS 230 / CPS 234.

Figure: platform layers, top to bottom.

  • Consumer surface: Web · Mobile · Internal apps · Agents · MCP servers · Embedded copilots
  • Model gateway (routing · version pinning · input guardrails): LiteLLM · OpenRouter · Bedrock IPR · Lakera/NeMo input policy · workload identity (OIDC)
  • Model tiers:
      Premium (reasoning · long-context): Opus 4.7 · GPT-5
      Workhorse (default route): Sonnet 4.6 · GPT-4.1
      Cheap / fast (bulk · classification): Haiku · GPT-4.1 mini
      Private (sensitive · sovereign): Llama / self-host
  • Retrieval · RAG · tools: hybrid retrieval (semantic + keyword) · re-ranker · query rewriter. Vector store: pgvector · Pinecone · Weaviate · Vertex Search. Tool registry: MCP servers · function specs · scoped credentials
  • Prompt registry · evals · rollout: versioned prompts · eval set · regression-gated deploy. Promptfoo / LangSmith / Braintrust · shadow evals in prod. Canary · feature flags · auto-rollback on burn-rate
  • Output guardrails (policy-tuned classifiers · jailbreak detection · content filters): NeMo Guardrails · Guardrails AI · Azure Content Safety · Bedrock Guardrails, tested adversarially against OWASP LLM Top 10 + MITRE ATLAS
  • Shared substrate: OTel GenAI traces · per-decision audit evidence (signed) · cost-per-outcome · workload identity · policy-as-code. Audited against: NIST AI RMF · NIST AI 600-1 · EU AI Act Art.9-15 · ISO/IEC 42001 · APRA CPS 234 ¶21-26 · OWASP LLM Top 10 · MITRE ATLAS. Generated at decision time. Replayable on demand. Pre-built audit views for the three regulator questions you can predict.
Regulated GenAI Platform · reference architecture v1.0 · read the 9 controls essay

What this architecture solves.

The single most common pattern in 2026 enterprise GenAI: every squad builds its own bespoke gateway, picks its own evals tool, rolls its own audit logging, and discovers the EU AI Act obligations late. The platform below pre-builds the substrate so squads consume it safely instead of replicating it badly.

Run this against the GenAI Readiness diagnostic: an org with this architecture in place scores in the “Industrialising” band (60-80% maturity) by construction.

Layers, top to bottom.

L1 · Consumer surface

Where AI shows up to users and other systems.

Web / mobile apps, internal copilots, agent runtimes, and MCP servers. The platform doesn’t dictate the surface; it dictates that any surface that wants to call a model goes through the gateway below.

L2 · Model gateway (single ingress point)

The choke point for governance.

Every AI call goes through one logical gateway. The gateway authenticates the caller via workload identity (OIDC, not API keys), routes by use-case to the right model tier, enforces input guardrails, and emits the per-request trace before the model sees the prompt.
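The request path described above can be sketched in a few lines. This is a minimal illustration, not a definitive gateway: it assumes the OIDC token has already been cryptographically verified upstream, and the names `ROUTES`, `BLOCKED_PHRASES`, `TRACE_LOG`, `Caller`, `Decision` and `handle` are all hypothetical. The phrase check stands in for a real input-guardrail service, and the in-memory list stands in for a trace exporter.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Optional

# Hypothetical routing table: use-case -> model tier.
ROUTES = {
    "contract-review": "premium",
    "support-chat": "workhorse",
    "ticket-triage": "cheap-fast",
    "hr-case-notes": "private",
}
DEFAULT_TIER = "workhorse"

# Toy input policy; a real deployment would call a guardrail service here.
BLOCKED_PHRASES = ("ignore previous instructions",)

# Stand-in for a trace exporter (assumption: traces are shipped elsewhere).
TRACE_LOG = []


@dataclass
class Caller:
    subject: str   # `sub` claim of an already-verified OIDC token
    audience: str  # `aud` claim of the same token


@dataclass
class Decision:
    trace_id: str
    allowed: bool
    tier: Optional[str]
    reason: str


def handle(caller, use_case, prompt, expected_audience="genai-gateway"):
    """Authenticate, apply the input guardrail, route, and record a trace."""
    trace_id = uuid.uuid4().hex
    if caller.audience != expected_audience:
        decision = Decision(trace_id, False, None, "audience mismatch")
    elif any(phrase in prompt.lower() for phrase in BLOCKED_PHRASES):
        decision = Decision(trace_id, False, None, "input guardrail")
    else:
        tier = ROUTES.get(use_case, DEFAULT_TIER)
        decision = Decision(trace_id, True, tier, "ok")
    # The trace is recorded here, before any model call would be made.
    TRACE_LOG.append({
        "trace_id": trace_id,
        "ts": time.time(),
        "caller": caller.subject,
        "use_case": use_case,
        "allowed": decision.allowed,
        "tier": decision.tier,
        "reason": decision.reason,
    })
    return decision
```

Calling `handle(Caller("svc-a", "genai-gateway"), "ticket-triage", "Classify this ticket")` would return an allowed decision routed to the `cheap-fast` tier. Rejected requests are traced too, so the audit trail covers denials as well as approvals.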

L3 · Model tier (multi-class)

Premium · Workhorse · Cheap-fast · Private.

Routing across four tiers, not one model. Premium for reasoning-heavy or long-context work; workhorse as the default; cheap-fast for high-volume classification; private (Llama / self-hosted) for sensitive or data-sovereign workloads. The Cost calculator shows how much leverage routing provides.
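The leverage from routing is simple arithmetic: the blended price per token is the traffic-weighted average of the tier prices. The sketch below uses made-up prices purely to show the calculation; `PRICE_PER_MTOK` and `blended_cost` are hypothetical names, and the figures are placeholders rather than any vendor's list price.

```python
# Hypothetical prices in USD per million tokens. Placeholders only:
# replace with your own contracted rates.
PRICE_PER_MTOK = {
    "premium": 20.0,
    "workhorse": 5.0,
    "cheap-fast": 1.0,
    "private": 2.0,
}


def blended_cost(traffic_share, mtok):
    """Cost of `mtok` million tokens split across tiers by `traffic_share`."""
    if abs(sum(traffic_share.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    per_mtok = sum(share * PRICE_PER_MTOK[tier]
                   for tier, share in traffic_share.items())
    return mtok * per_mtok


all_premium = blended_cost({"premium": 1.0}, mtok=100)
routed = blended_cost(
    {"premium": 0.1, "workhorse": 0.5, "cheap-fast": 0.3, "private": 0.1},
    mtok=100,
)
```

With these placeholder numbers, sending everything to the premium tier costs about four times as much as the routed mix. The real ratio depends entirely on actual prices and traffic shares.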

L4 · Retrieval, RAG, tools

What the model sees besides the user prompt.

Hybrid retrieval (semantic + keyword + metadata filters) with a re-ranker and query rewriter. Vector store choice is mostly fungible at scale. Tools and MCP servers are called with scoped credentials, not bearer tokens. Retrieval quality is evaluated separately from generation quality.
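One common way to merge semantic and keyword results before re-ranking is reciprocal rank fusion (RRF). The source does not name a fusion method, so treat this as one illustrative option; the function name and the constant `k=60` are assumptions (60 is a conventional default, not a requirement).

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document id for stability.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["a", "b", "c"]  # hypothetical vector-search results
keyword_hits = ["b", "d", "a"]   # hypothetical keyword-search results
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Documents that appear in both lists rise to the top, which is why `b` and `a` outrank `d` and `c` here. Metadata filters would normally be applied inside each retriever before fusion.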

L5 · Prompt registry, evals, rollout

The deploy gate.

Versioned prompt registry (semver, owners). Eval set per use-case, gated in CI. Rollout via feature flags + canary; auto-rollback on burn-rate (per the Error Budget calculator pattern). Shadow evals on production traffic.
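The deploy gate and the rollback trigger both reduce to small comparisons. The sketch below assumes the usual SRE definition of burn rate (observed error rate divided by the error budget, where the budget is 1 minus the SLO target). The function names, the 1-point regression tolerance and the rollback threshold of 10 are hypothetical defaults, not values from the Error Budget calculator.

```python
def passes_regression_gate(baseline_pass_rate, candidate_pass_rate,
                           tolerance=0.01):
    """CI gate: a candidate prompt may not regress the eval pass rate
    by more than `tolerance` relative to the current baseline."""
    return candidate_pass_rate >= baseline_pass_rate - tolerance


def burn_rate(errors, requests, slo_target):
    """Observed error rate divided by the error budget (1 - SLO target).

    A value of 1.0 means the budget is being spent exactly at the rate
    the SLO allows; higher values exhaust it early.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if requests == 0:
        return 0.0
    return (errors / requests) / (1.0 - slo_target)


def should_rollback(errors, requests, slo_target=0.99, threshold=10.0):
    """Canary check: roll back when the burn rate reaches `threshold`."""
    return burn_rate(errors, requests, slo_target) >= threshold
```

For example, 150 failures in 1,000 canary requests against a 99% SLO gives a burn rate of about 15, which would trip the hypothetical threshold of 10. Five failures in 1,000 gives about 0.5 and would not.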

L6 · Output guardrails (last line)

Catches what the gateway and model didn’t.

Policy-tuned output classifiers, jailbreak-success detection, content filters, citation enforcement. Each is tested adversarially against OWASP LLM Top 10 and MITRE ATLAS, not just configured.
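Of the checks listed above, citation enforcement is the easiest to illustrate. The sketch below assumes answers cite sources with bracketed numeric markers such as `[1]`, and that the retrieval layer passes along the set of source ids it actually returned. Both conventions, and the name `check_citations`, are assumptions for illustration.

```python
import re

# Assumed citation format: bracketed integers such as [1] or [12].
CITATION = re.compile(r"\[(\d+)\]")


def check_citations(answer, retrieved_ids):
    """Return (ok, reason). Fails when the answer cites nothing, or cites
    a source id that was never retrieved for this request."""
    cited = {int(match) for match in CITATION.findall(answer)}
    if not cited:
        return False, "no citations"
    unknown = cited - set(retrieved_ids)
    if unknown:
        return False, f"cites unretrieved sources: {sorted(unknown)}"
    return True, "ok"
```

This only verifies that citations exist and point at retrieved sources. Whether the cited passage actually supports the claim is a separate check that needs an eval or a classifier.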

L7 · Shared substrate (the moat)

OTel GenAI · per-decision audit · cost-per-outcome · workload identity · policy-as-code.

These five primitives sit underneath every layer above. Without them, each layer rolls its own observability, identity, audit and policy — the patchwork that makes regulator-grade audit impossible.
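As an illustration of signed per-decision audit evidence, the sketch below signs a canonical JSON encoding of each record with HMAC-SHA256 from the Python standard library. It is a minimal example under stated assumptions: the record fields and function names are hypothetical, and a production system might well prefer asymmetric signatures issued by a KMS, so that verifiers never hold the signing key.

```python
import hashlib
import hmac
import json


def _canonical(record):
    """Deterministic byte encoding so the same record always signs the same."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def sign_record(record, key):
    """Wrap an audit record with an HMAC-SHA256 signature over its contents."""
    signature = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {"record": record, "signature": signature}


def verify_record(signed, key):
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical(signed["record"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])
```

Signing at decision time is what makes later replay trustworthy: if any field of a stored record is altered, `verify_record` returns `False`.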

What this architecture doesn’t solve.

  • Use-case selection. The wrong use-case will fail even with this platform. Control 1 still matters.
  • Data quality. The platform retrieves and consumes data; it doesn’t curate it. Modern Data Platform is the companion architecture.
  • Agent reliability. Multi-step agents add a class of failure modes (loops, drift, tool misuse) that the gateway can’t catch. Plan agent observability and step-level evals separately.

Sequencing for a 90‑day build.

  1. Weeks 1-3: Model gateway + workload identity + OTel GenAI traces. One use-case flowing through.
  2. Weeks 4-6: Prompt registry + eval set + eval-gated deploy. Input + output guardrails for the first use-case.
  3. Weeks 7-9: Audit-evidence pipeline. Risk-tiered governance encoded. Second use-case onboards in days.
  4. Weeks 10-12: Routing + cost-per-outcome instrumentation. Shadow evals in prod. Third use-case onboards without changes to the platform.