Platform Engineering IDP.
Internal Developer Platform with paved paths, golden templates, service catalogue, scorecards, self-service. Aligned to the CNCF Platform Engineering Maturity Model, Team Topologies, and DORA elite-performer criteria.
What this architecture solves.
The shape that turns a tool stack into a platform. Paved paths absorb the substrate (CI, security, identity, observability, cost) so a developer’s job is to write business logic, not to integrate ten tools. Scorecards make posture visible without self-attestation. The substrate runs as a versioned product with consumers, owners and SLOs — not as a tool stack with a portal on top.
The four paved paths.
Default for new services.
Backstage Software Template scaffolds: repo with CI configured for SLSA L3, OIDC for cloud access, OpenTelemetry instrumentation, SLO definitions, default dashboards, on-call rotation linked. Zero to staging in <30 minutes.
For dbt models, Airflow DAGs, Spark jobs.
Scaffold includes lineage emission (OpenLineage), data-quality tests (Soda / Great Expectations), contract definitions, Gold-zone landing convention. The team that owns the model owns the data product.
The Regulated GenAI platform as a template.
Scaffold integrates with the gateway, prompt registry, eval-gated deploy and audit-evidence pipeline of the Regulated GenAI Platform. New GenAI features inherit the nine controls by construction.
Scheduled, idempotent, observable.
Cron + retry + DLQ + OTel + SLO-bounded. The path most platform teams ignore because it’s unsexy — and most legacy ops debt accumulates against.
Self-service capabilities.
Every common provisioning need behind a portal action: DB instance, queue, secret, DNS record, TLS cert, namespace, IAM role, feature flag, sandbox AI key. Each provisioned by Terraform/Pulumi modules with policy-as-code embedded. Audit trail per action; no tickets.
Scorecards — auto-derived, not self-attested.
Per service: production-readiness (CI green? SBOM emitted? SLO defined? on-call wired?), security posture (signed image? KEV CVE count? MFA?), cost (per-month vs budget? trending), SLO health, ownership freshness (owner active in last 30 days?).
Scorecards must be hard to game. Self-attested fields decay; auto-derived fields don’t.
How the platform is funded and run.
- Funded as a product with outcome metrics: lead time, change-fail rate, time-to-first-deploy, paved-path adoption, developer NPS. Not as a cost-centre.
- Stream-aligned + enabling teams per Team Topologies. Central platform team is small; enabling teams move into streams temporarily to help them adopt the platform.
- Versioned with auto-migration tooling. Breaking changes ship with codemods (jscodeshift, OpenRewrite, ast-grep) and a migration job. Consuming teams don’t do the migration work.
- Sunset rituals quarterly. Subtract capabilities that aren’t adopted. Without sunset, every platform eventually becomes a tool stack again.
Sequencing for the first 90 days.
- Weeks 1-4: Backstage stood up with service catalogue. One paved path (microservice) shipped with one team.
- Weeks 5-8: Self-service for the top three needs (DB, queue, secrets). Scorecards for production-readiness.
- Weeks 9-12: Second and third paved paths added. Auto-migration tooling for the first breaking change. Developer NPS baseline.