Modern Data Platform.
Lakehouse + governed mesh: ingestion, contracts, lineage, catalogue, semantic layer, feature store, audit retention. Aligned to DCAM, DAMA-DMBOK, ISO 8000, and the data-residency requirements of EU GDPR, APRA CPS 234 and HIPAA.
What this architecture solves.
The pattern that lets BI, ML and GenAI consume the same governed substrate without bespoke pipelines per consumer. Bronze/Silver/Gold lakehouse for storage discipline; domain-owned data products for accountability; semantic layer for one definition of business metrics; feature store + vector for parity between ML and AI workloads.
Why this shape.
The two design tensions that drive this architecture:
- Lakehouse vs warehouse. Lakehouse (Delta, Iceberg, Hudi) gives storage flexibility, format portability and ML-readiness. The traditional warehouse still wins on sub-second BI; semantic layer + materialised marts close the gap.
- Central platform vs domain mesh. Pure mesh struggles with cross-domain consistency; pure central platform becomes a bottleneck. The hybrid: central platform owns the substrate (catalogue, lineage, access, contracts); domain teams own their data products inside it.
Layers, top to bottom.
OLTP, SaaS, events, IoT, third-party.
Every source has a documented owner and a contract for what changes when. CDC (Debezium, Fivetran) is the dominant ingestion path for OLTP; event streams (Kafka) for system-of-record events.
The storage discipline.
Bronze is append-only raw (audit retention satisfies most regulators by itself). Silver is conformed and validated. Gold is consumable — analytics-ready, owned by the domain. Delta Lake, Iceberg and Hudi are the three open table formats; format choice matters less than ownership clarity.
Customer, payment, risk, clinical — each a product.
Following the data-mesh pattern. Each domain owns its products with documented SLAs (freshness, completeness), versioned contracts, and on-call ownership. The central platform provides the substrate; domains own the products.
One definition of business metrics.
The single most under-invested layer in 2026 enterprises. Without it, three teams report three different revenue numbers from the same warehouse. Cube, dbt Semantic Layer, LookML are options; pick one and stop letting BI tools define metrics.
The substrate ML and GenAI consume.
Feature store gives ML training/serving parity (offline/online). Vector store gives RAG its retrievable corpus. Both should consume from the Gold zone, not bespoke pipelines — this is what enables the Regulated GenAI Platform.
Catalogue · lineage · access · PII · retention · quality.
Catalogue (DataHub, OpenMetadata, Unity, Atlas) + lineage (OpenLineage) + access control + PII classification + retention/erasure + data-quality monitoring (Soda, Great Expectations). The pieces that make the rest auditable.