Reference Architecture · Platform Engineering

Platform Engineering IDP.

Internal Developer Platform with paved paths, golden templates, service catalogue, scorecards, self-service. Aligned to the CNCF Platform Engineering Maturity Model, Team Topologies, and DORA elite-performer criteria.

DEVELOPER PORTAL · ONE PANE OF GLASS
  Backstage · Cortex · OpsLevel · Port
  Catalogue · docs · self-service · scorecards · ownership · observability links

PAVED PATH · MICROSERVICE
  Backstage template
  CI + SLSA L3 + IAM + OTel + SLOs
  0 → staging in <30 min

PAVED PATH · DATA JOB
  dbt + Airflow scaffold
  Lineage + contracts default
  Gold-zone owner = team

PAVED PATH · GENAI FEATURE
  Gateway + evals + guardrails
  Prompt registry · audit pipe
  Inherits the 9 controls

PAVED PATH · BATCH
  Scheduled jobs
  OTel · retry · DLQ
  SLO-bounded

SELF-SERVICE CAPABILITIES
  DBs · queues · secrets · DNS · TLS · namespaces · IAM roles · feature flags · sandbox AI keys
  Provisioned by Terraform/Pulumi behind a portal action. Audited. Policy-as-code enforced.

SCORECARDS · POSTURE · ADOPTION
  Per-service production-readiness · security posture · cost · SLO health · ownership freshness
  Auto-derived from real signals (no self-attestation). Visible to developer + leadership.

PLATFORM SUBSTRATE (THE PRODUCT THE TEAM RUNS)
  Kubernetes / managed runtime · Terraform modules · OTel · OIDC workload identity · OPA · cost feed · service catalogue
  Team topology: stream-aligned + enabling teams · platform-as-a-product · funded for outcomes (DORA + adoption)
  Versioned platform with auto-migration tooling for breaking changes — consuming teams don’t do the migration work
  Audited against: CNCF Platform Eng Maturity Model · Team Topologies · DORA Accelerate · SPACE / DX
Platform Engineering IDP · reference architecture v1.0

What this architecture solves.

The shape that turns a tool stack into a platform. Paved paths absorb the substrate (CI, security, identity, observability, cost) so a developer’s job is to write business logic, not to integrate ten tools. Scorecards make posture visible without self-attestation. The substrate runs as a versioned product with consumers, owners and SLOs — not as a tool stack with a portal on top.

The four paved paths.

P1 · Microservice paved path

Default for new services.

Backstage Software Template scaffolds: repo with CI configured for SLSA L3, OIDC for cloud access, OpenTelemetry instrumentation, SLO definitions, default dashboards, on-call rotation linked. Zero to staging in <30 minutes.

P2 · Data job paved path

For dbt models, Airflow DAGs, Spark jobs.

Scaffold includes lineage emission (OpenLineage), data-quality tests (Soda / Great Expectations), contract definitions, Gold-zone landing convention. The team that owns the model owns the data product.

P3 · GenAI feature paved path

The Regulated GenAI platform as a template.

Scaffold integrates with the gateway, prompt registry, eval-gated deploy and audit-evidence pipeline of the Regulated GenAI Platform. New GenAI features inherit the nine controls by construction.

P4 · Batch job paved path

Scheduled, idempotent, observable.

Cron + retry + DLQ + OTel + SLO-bounded. This is the path most platform teams ignore because it is unglamorous, and the one against which most legacy ops debt accumulates.
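The batch-path properties above (idempotent, retried, dead-lettered, bounded by an SLO) fit in one small runner. The following is an added example written for this page, not the platform's own scaffold: the records, the flaky work function and the in-memory idempotency store are constructed, and a real job would back that store with a database and emit OpenTelemetry spans where this one keeps counters.

```python
"""Batch job skeleton: idempotent, retried, dead-lettered, SLO-bounded.

Added example, not the page's code. The work function, the in-memory
idempotency store and the record set are constructed for illustration.
"""
from dataclasses import dataclass, field
import time


@dataclass
class BatchResult:
    processed: list[str] = field(default_factory=list)
    skipped_duplicate: list[str] = field(default_factory=list)
    dead_letter: dict[str, str] = field(default_factory=dict)  # key -> last error
    attempts: int = 0
    slo_breached: bool = False


def run_batch(records, work, processed_keys: set[str], *, max_attempts: int = 3,
              time_budget_s: float = 60.0, clock=time.monotonic,
              sleep=time.sleep) -> BatchResult:
    """Process each record once; retry with backoff; dead-letter what still fails."""
    res = BatchResult()
    start = clock()
    for key, payload in records:
        if clock() - start > time_budget_s:      # SLO bound: stop rather than overrun
            res.slo_breached = True
            break
        if key in processed_keys:                 # idempotency: a rerun is safe
            res.skipped_duplicate.append(key)
            continue
        for attempt in range(1, max_attempts + 1):
            res.attempts += 1
            try:
                work(payload)
                processed_keys.add(key)           # commit the key only after success
                res.processed.append(key)
                break
            except Exception as exc:              # a real job would classify the error
                if attempt == max_attempts:
                    res.dead_letter[key] = repr(exc)   # DLQ: park it, don't drop it
                else:
                    sleep(0.1 * 2 ** (attempt - 1))    # exponential backoff
    return res


def demo() -> BatchResult:
    """Constructed sample: 'b' already done, 'a' fails once, 'c' always fails."""
    calls = {"a": 0}

    def work(payload):
        if payload == "a":
            calls["a"] += 1
            if calls["a"] == 1:
                raise TimeoutError("transient")
        elif payload == "c":
            raise RuntimeError("boom")

    records = [("a", "a"), ("b", "b"), ("c", "c")]
    return run_batch(records, work, {"b"}, sleep=lambda s: None)


def demo_slo_breach() -> BatchResult:
    """Constructed clock: the second record starts after the time budget is spent."""
    ticks = iter([0.0, 0.0, 100.0])
    return run_batch([("x", "x"), ("y", "y")], lambda p: None, set(),
                     time_budget_s=60.0, clock=lambda: next(ticks), sleep=lambda s: None)
```

The idempotency key is committed only after the work succeeds, so a crash mid-record replays that record on the next run instead of silently losing it; the time budget check runs before each record so the job stops inside its SLO window and leaves the remainder for the next scheduled run.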

Self-service capabilities.

Every common provisioning need behind a portal action: DB instance, queue, secret, DNS record, TLS cert, namespace, IAM role, feature flag, sandbox AI key. Each provisioned by Terraform/Pulumi modules with policy-as-code embedded. Audit trail per action; no tickets.
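What a portal action behind one of these capabilities does, as an added example (not the page's or any vendor's code): evaluate the request against policy-as-code, write an audit record for allow and deny alike, and only on allow hand inputs to the Terraform module. The team catalogue, region list, quotas and Terraform variable names are constructed assumptions; the actor is assumed to come from the portal's OIDC session.

```python
"""Self-service database provisioning with policy-as-code and an audit record.

Added example, not the page's code. Catalogue, policy constants and the
Terraform variable names are constructed for illustration.
"""
from dataclasses import dataclass, asdict

# Constructed stand-in for the service catalogue: team -> environments it owns.
CATALOGUE = {"payments": {"dev", "staging", "prod"}, "search": {"dev", "staging"}}
ALLOWED_REGIONS = {"eu-west-1", "eu-central-1"}
MAX_STORAGE_GB = {"dev": 50, "staging": 200, "prod": 1000}


@dataclass(frozen=True)
class DbRequest:
    actor: str            # identity from the portal's OIDC session (assumed)
    team: str
    env: str
    region: str
    storage_gb: int
    encrypted_at_rest: bool
    publicly_accessible: bool


def evaluate_policy(req: DbRequest) -> list[str]:
    """Return every policy violation; an empty list means allow."""
    violations = []
    if req.team not in CATALOGUE:
        violations.append("unknown team: not in service catalogue")
    elif req.env not in CATALOGUE[req.team]:
        violations.append(f"team {req.team} may not provision in {req.env}")
    if req.region not in ALLOWED_REGIONS:
        violations.append(f"region {req.region} not permitted")
    if not req.encrypted_at_rest:
        violations.append("encryption at rest is mandatory")
    if req.publicly_accessible:
        violations.append("databases must not be publicly accessible")
    limit = MAX_STORAGE_GB.get(req.env)
    if limit is not None and req.storage_gb > limit:
        violations.append(f"storage {req.storage_gb}GB exceeds {req.env} quota of {limit}GB")
    return violations


def provision_database(req: DbRequest, ts: str) -> dict:
    """Portal action: decide, audit the decision, return module inputs on allow."""
    violations = evaluate_policy(req)
    audit = {
        "ts": ts, "actor": req.actor, "action": "provision_database",
        "request": asdict(req),
        "decision": "deny" if violations else "allow",
        "reasons": violations,
    }
    result = {"audit": audit, "terraform_vars": None}
    if not violations:
        # Inputs for the Terraform module; the key names are assumptions.
        result["terraform_vars"] = {
            "name": f"{req.team}-{req.env}-db",
            "region": req.region,
            "allocated_storage": req.storage_gb,
            "storage_encrypted": True,        # policy guarantees this is already True
            "publicly_accessible": False,
            "tags": {"owner": req.team, "env": req.env},
        }
    return result


def demo_allow() -> dict:
    """Constructed request that satisfies every rule."""
    req = DbRequest("alice@example.com", "payments", "staging", "eu-west-1", 100, True, False)
    return provision_database(req, "2025-06-30T12:00:00Z")


def demo_deny() -> dict:
    """Constructed request that breaks four rules at once."""
    req = DbRequest("bob@example.com", "search", "prod", "us-east-1", 50, False, True)
    return provision_database(req, "2025-06-30T12:01:00Z")
```

The audit record is produced before the allow branch so a denied request leaves the same trail as a granted one; that is what makes the "no tickets" model reviewable after the fact.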

Scorecards — auto-derived, not self-attested.

Per service: production-readiness (CI green? SBOM emitted? SLO defined? on-call wired?), security posture (signed image? KEV CVE count? MFA?), cost (monthly spend vs. budget? trend direction?), SLO health, ownership freshness (owner active in last 30 days?).

Scorecards must be hard to game. Self-attested fields decay; auto-derived fields don’t.
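A scorecard derived only from emitted signals, as an added example (not the page's or any catalogue product's code): every field is computed from a fact a system reports (CI result, SBOM presence, image signature verification, KEV-listed CVE count, owner activity) and nothing is a checkbox a team ticks. The service record and the signal values are constructed sample data.

```python
"""Auto-derived service scorecard with no self-attested fields.

Added example, not the page's code. The ServiceSignals record stands in
for what a platform pulls from CI, the registry, the scanner, the IdP and
the SLO store; the sample values are constructed.
"""
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ServiceSignals:
    name: str
    owner: str
    ci_last_run_green: bool
    sbom_present: bool
    slo_defined: bool
    oncall_rotation_linked: bool
    image_signature_verified: bool
    kev_cve_count: int                  # CVEs present that appear on the CISA KEV list
    mfa_enforced_for_owners: bool
    monthly_cost: float
    monthly_budget: float
    slo_error_budget_remaining: float   # fraction of the error budget left, 0.0..1.0
    owner_last_active: date


PRODUCTION_READY = ("ci_last_run_green", "sbom_present", "slo_defined", "oncall_rotation_linked")
SECURITY_POSTURE = ("image_signature_verified", "no_kev_cves", "mfa_enforced_for_owners")


def derive_scorecard(s: ServiceSignals, today: date) -> dict:
    """Compute pass/fail checks from signals, then roll them up per scorecard column."""
    checks = {
        "ci_last_run_green": s.ci_last_run_green,
        "sbom_present": s.sbom_present,
        "slo_defined": s.slo_defined,
        "oncall_rotation_linked": s.oncall_rotation_linked,
        "image_signature_verified": s.image_signature_verified,
        "no_kev_cves": s.kev_cve_count == 0,
        "mfa_enforced_for_owners": s.mfa_enforced_for_owners,
        "within_budget": s.monthly_cost <= s.monthly_budget,
        "slo_healthy": s.slo_error_budget_remaining > 0.0,
        "owner_fresh": (today - s.owner_last_active) <= timedelta(days=30),
    }

    def pct(keys):
        return round(100 * sum(checks[k] for k in keys) / len(keys))

    return {
        "service": s.name,
        "owner": s.owner,
        "production_readiness": pct(PRODUCTION_READY),
        "security_posture": pct(SECURITY_POSTURE),
        "cost_ok": checks["within_budget"],
        "slo_healthy": checks["slo_healthy"],
        "ownership_fresh": checks["owner_fresh"],
        "failing": sorted(k for k, v in checks.items() if not v),
    }


def demo(today: date = date(2025, 6, 30)) -> dict:
    """Constructed sample: one service with a few real gaps."""
    sample = ServiceSignals(
        name="payments-api", owner="team-payments",
        ci_last_run_green=True, sbom_present=True, slo_defined=True,
        oncall_rotation_linked=False, image_signature_verified=True,
        kev_cve_count=2, mfa_enforced_for_owners=True,
        monthly_cost=1200.0, monthly_budget=1000.0,
        slo_error_budget_remaining=0.4, owner_last_active=date(2025, 5, 16),
    )
    return derive_scorecard(sample, today)
```

Because the owner-freshness check reads activity from the identity provider rather than a "still owned" flag, a team that disbands shows up as stale within a month without anyone having to remember to update the catalogue.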

How the platform is funded and run.

  • Funded as a product with outcome metrics: lead time, change-fail rate, time-to-first-deploy, paved-path adoption, developer NPS. Not as a cost-centre.
  • Stream-aligned + enabling teams per Team Topologies. Central platform team is small; enabling teams move into streams temporarily to help them adopt the platform.
  • Versioned with auto-migration tooling. Breaking changes ship with codemods (jscodeshift, OpenRewrite, ast-grep) and a migration job. Consuming teams don’t do the migration work.
  • Sunset rituals quarterly. Subtract capabilities that aren’t adopted. Without sunset, every platform eventually becomes a tool stack again.

Sequencing for the first 90 days.

  1. Weeks 1-4: Backstage stood up with service catalogue. One paved path (microservice) shipped with one team.
  2. Weeks 5-8: Self-service for the top three needs (DB, queue, secrets). Scorecards for production-readiness.
  3. Weeks 9-12: Second and third paved paths added. Auto-migration tooling for the first breaking change. Developer NPS baseline.