Regulated GenAI Platform.
The platform you build before squads build their own bespoke version. Designed for the obligations of NIST AI RMF, EU AI Act (high-risk enforcement from 2 Aug 2026), ISO/IEC 42001 and APRA CPS 230 / CPS 234.
What this architecture solves.
The single most common pattern in 2026 enterprise GenAI: every squad builds its own bespoke gateway, picks its own evals tool, rolls its own audit logging, and discovers the EU AI Act obligations late. The platform below pre-builds the substrate so squads consume it safely instead of replicating it badly.
Run this against the GenAI Readiness diagnostic: an org with this architecture in place scores in the “Industrialising” band (60-80% maturity) by construction.
Layers, top to bottom.
Where AI shows up to users and other systems.
Web / mobile apps, internal copilots, agent runtimes, and MCP servers. The platform doesn’t dictate the surface; it dictates that any surface that wants to call a model goes through the gateway below.
The choke point for governance.
Every AI call goes through one logical gateway. The gateway authenticates the caller via workload identity (OIDC, not API keys), routes by use-case to the right model tier, enforces input guardrails, and emits the per-request trace before the model sees the prompt.
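The order of operations in the gateway can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Request` shape, the `ROUTES` table, the `BLOCKED_PHRASES` check and the in-memory `TRACES` list are all hypothetical stand-ins, and OIDC token verification is assumed to have happened upstream.

```python
from dataclasses import dataclass
from typing import Optional
import time
import uuid


@dataclass
class Request:
    # Identity claim taken from an already-verified OIDC token (verification not shown).
    workload_id: Optional[str]
    use_case: str
    prompt: str


# Hypothetical use-case -> model tier routing table.
ROUTES = {"support-summary": "workhorse", "contract-review": "premium"}

# Hypothetical input guardrail: a phrase blocklist standing in for real classifiers.
BLOCKED_PHRASES = ("ignore previous instructions",)

# Stand-in for a trace exporter.
TRACES = []


class Rejected(Exception):
    """Raised when the gateway refuses a request."""


def handle(req: Request, call_model) -> str:
    trace = {"trace_id": uuid.uuid4().hex, "ts": time.time(),
             "workload_id": req.workload_id, "use_case": req.use_case}
    try:
        if not req.workload_id:
            raise Rejected("unauthenticated")
        tier = ROUTES.get(req.use_case)
        if tier is None:
            raise Rejected("unknown use-case")
        trace["tier"] = tier
        if any(p in req.prompt.lower() for p in BLOCKED_PHRASES):
            raise Rejected("input guardrail")
        trace["decision"] = "allowed"
    except Rejected as exc:
        trace["decision"] = f"rejected: {exc}"
        raise
    finally:
        # The trace is recorded for allowed and rejected requests alike,
        # and always before the model is called.
        TRACES.append(trace)
    return call_model(tier, req.prompt)
```

Rejected requests still leave a trace, which is what makes the gateway useful as audit evidence rather than only as a proxy.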
Premium · Workhorse · Cheap-fast · Private.
Routing across four tiers, not one model. Premium for reasoning-heavy or long-context work; workhorse as the default; cheap-fast for high-volume classification; private (Llama / self-hosted) for sensitive or data-sovereign workloads. The Cost calculator shows the leverage from routing.
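A rough sketch of the tier decision and why it pays off. The `Task` fields, the long-context threshold and the relative prices in `PRICE` are invented for illustration; the rule that sensitivity overrides everything else is one possible design choice, not something the platform prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    sensitive: bool = False
    kind: str = "chat"          # "chat", "classification" or "reasoning"
    context_tokens: int = 0


LONG_CONTEXT_TOKENS = 100_000   # hypothetical threshold

# Made-up relative cost units per call, only to show the shape of the saving.
PRICE = {"premium": 100, "workhorse": 20, "cheap-fast": 2, "private": 10}


def pick_tier(task: Task) -> str:
    # Sensitive or data-sovereign work never leaves the private tier.
    if task.sensitive:
        return "private"
    if task.kind == "reasoning" or task.context_tokens > LONG_CONTEXT_TOKENS:
        return "premium"
    if task.kind == "classification":
        return "cheap-fast"
    return "workhorse"


def routed_cost(tasks) -> int:
    return sum(PRICE[pick_tier(t)] for t in tasks)


def single_model_cost(tasks) -> int:
    # Baseline: everything sent to the premium tier.
    return PRICE["premium"] * len(tasks)
```

With a mix dominated by classification, the routed total is a small fraction of the premium-only baseline, which is the leverage the routing layer exists to capture.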
What the model sees besides the user prompt.
Hybrid retrieval (semantic + keyword + metadata filters) with a re-ranker and query rewriter. Vector stores are mostly interchangeable at scale. Tools and MCP servers get scoped credentials, not bearer tokens. Retrieval quality is evaluated separately from generation quality.
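One common way to combine semantic and keyword results is reciprocal rank fusion (RRF). The sketch below assumes RRF as the fusion method, which the architecture does not mandate, and uses a hypothetical `region` metadata field as the filter. The re-ranker and query rewriter are left out.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(semantic_hits, keyword_hits, metadata, allowed_regions):
    """Apply the metadata filter to both ranked lists, then fuse them."""
    def keep(doc_id):
        return metadata[doc_id]["region"] in allowed_regions

    return rrf_fuse([
        [d for d in semantic_hits if keep(d)],
        [d for d in keyword_hits if keep(d)],
    ])
```

Because the function returns a ranked list of document ids, it can be scored against a labelled retrieval set on its own, independent of whatever the model later generates.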
The deploy gate.
Versioned prompt registry (semver, owners). Eval set per use-case, gated in CI. Rollout via feature flags + canary; auto-rollback on burn-rate (per the Error Budget calculator pattern). Shadow evals on production traffic.
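The two automated decisions in this layer, the eval gate and the burn-rate rollback, might look like the sketch below. The `PromptVersion` fields, the 0.90 floor and the 0.02 regression allowance are assumptions; the 14.4 threshold is borrowed from common SRE fast-burn alerting practice rather than from this platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str    # semver string, for example "1.4.0"
    owner: str
    template: str


def eval_gate(pass_rate: float, baseline: float,
              floor: float = 0.90, max_regression: float = 0.02) -> bool:
    """CI gate: the candidate must clear an absolute floor and must not
    regress more than max_regression against the current baseline."""
    return pass_rate >= floor and pass_rate >= baseline - max_regression


def burn_rate(bad: int, total: int, slo_target: float) -> float:
    """Error-budget burn rate: observed error rate divided by the allowed one."""
    if total == 0:
        return 0.0
    return (bad / total) / (1.0 - slo_target)


def should_rollback(bad: int, total: int,
                    slo_target: float = 0.99, threshold: float = 14.4) -> bool:
    """Canary check: roll back when the budget burns faster than the threshold."""
    return burn_rate(bad, total, slo_target) >= threshold
```

For example, 20 bad responses out of 100 canary requests against a 99% target burns budget at roughly 20 times the sustainable rate, so `should_rollback(20, 100)` returns `True`.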
Catches what gateway and model didn’t.
Policy-tuned output classifiers, jailbreak-success detection, content filters, citation enforcement. Tested adversarially against OWASP LLM Top 10 and MITRE ATLAS, not just configured.
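Citation enforcement is the easiest of these checks to illustrate. The sketch below assumes answers cite sources as bracketed numbers such as `[1]` and that every sentence must carry at least one citation; both conventions are hypothetical, and a production check would need a sturdier sentence splitter.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, retrieved_ids: set) -> list:
    """Return a list of violations; an empty list means the answer passes."""
    problems = []

    # Every cited id must refer to a document that was actually retrieved.
    cited = {int(n) for n in CITATION.findall(answer)}
    unknown = sorted(cited - retrieved_ids)
    if unknown:
        problems.append(f"cites unknown sources: {unknown}")

    # Every sentence must carry at least one citation.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if sentence and not CITATION.search(sentence):
            problems.append(f"uncited sentence: {sentence!r}")

    return problems
```

An adversarial test suite would then feed this function answers that cite fabricated ids or drop citations entirely, and assert that each one is caught.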
OTel GenAI · per-decision audit · cost-per-outcome · workload identity · policy-as-code.
These five primitives sit underneath every layer above. Without them, each layer rolls its own observability, identity, audit and policy — the patchwork that makes regulator-grade audit impossible.
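To make the primitives concrete, here is one possible shape for a per-decision audit record that ties them together. The `POLICY` table, the risk tier names and the record fields are hypothetical. The `gen_ai.*` keys follow the naming style of the OpenTelemetry GenAI semantic conventions, which are still evolving, so the exact attribute names should be checked against the current specification.

```python
import hashlib
import json

# Hypothetical policy-as-code: controls required for each risk tier.
POLICY = {
    "high": {"human_review", "output_guardrails", "eval_gate"},
    "limited": {"output_guardrails"},
}


def evaluate_policy(risk_tier: str, controls_applied) -> dict:
    missing = sorted(POLICY[risk_tier] - set(controls_applied))
    return {"allowed": not missing, "missing": missing}


def audit_record(workload_id, use_case, risk_tier, model, prompt,
                 input_tokens, output_tokens, cost_usd, outcomes,
                 controls_applied) -> dict:
    return {
        "workload_id": workload_id,          # workload identity
        "use_case": use_case,
        "risk_tier": risk_tier,
        "policy": evaluate_policy(risk_tier, controls_applied),
        # Store a hash rather than the raw prompt, so the record can be
        # retained without retaining the prompt text itself.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        # Cost per successful business outcome, not per token.
        "cost_per_outcome_usd": cost_usd / outcomes if outcomes else None,
    }


def to_json_line(record: dict) -> str:
    return json.dumps(record, sort_keys=True)
```

Because every layer emits the same record shape, an auditor can answer "who called which model, under which policy, at what cost" from one stream.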
What this architecture doesn’t solve.
- Use-case selection. The wrong use-case will fail even with this platform. Control 1 still matters.
- Data quality. The platform retrieves and consumes data; it doesn’t curate it. Modern Data Platform is the companion architecture.
- Agent reliability. Multi-step agents add a class of failure modes (loops, drift, tool misuse) that the gateway can’t catch. Plan agent observability and step-level evals separately.
Sequencing for a 90‑day build.
- Weeks 1-3: Model gateway + workload identity + OTel GenAI traces. One use-case flowing through.
- Weeks 4-6: Prompt registry + eval set + eval-gated deploy. Input + output guardrails for the first use-case.
- Weeks 7-9: Audit-evidence pipeline. Risk-tiered governance encoded. Second use-case onboards in days.
- Weeks 10-12: Routing + cost-per-outcome instrumentation. Shadow evals in prod. Third use-case onboards without changes to the platform.