Enterprise data archival platform — $50M+ annual savings — Case Study

The challenge.

A Tier-1 ANZ bank had no enterprise-grade data archival solution. Every application team took a different approach: duplicated tooling, duplicated process, duplicated cost. Legacy systems could not be decommissioned because retention obligations had no centralised home. There was no standardised onboarding model and no consistent set of operational procedures.

The estate was multi-petabyte. The application count was 1,000+. The regulatory floor was strict. The opportunity — if a single platform could carry the load — was an order of magnitude beyond what any local solution could deliver.

The constraints.

Regulatory: Mandatory data retention; strict privacy + security mandates; audit + governance controls required at every layer.
Technical: Must ingest from on-premises systems and multiple cloud regions; support structured, semi-structured and unstructured data; cloud-agnostic and customisable.
Organisational: No unified approach across 1,000+ applications. Self-service, role-based retrieval was a non-negotiable. Enterprise foundation requirements (CI/CD, DR, AI enablement) had to be present from day one.

The approach.

Discovery phase — three months.

Deep assessment across all 1,000+ applications. Evaluation of cloud-native services vs custom platform via decisioning trees built on data patterns, compliance requirements and cost. The honest output of discovery: a custom-built solution on AWS, not an off-the-shelf platform.
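
The decisioning trees themselves are not reproduced here, but their shape can be sketched. The following is a minimal illustration, assuming a hypothetical per-application profile with one field for each of the three inputs named above (data pattern, compliance requirement, cost). The field names, branch order and rules are all invented for illustration and are not the bank's actual criteria.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AppProfile:
    """Hypothetical facts gathered for one application during discovery."""
    name: str
    data_pattern: str              # "structured", "semi-structured" or "unstructured"
    needs_custom_retention: bool   # compliance rule a managed service cannot express
    managed_cost_per_year: float   # estimated cost of a cloud-native managed option
    custom_cost_per_year: float    # estimated share of a custom platform's cost


def recommend(profile: AppProfile) -> str:
    """Walk a small decision tree: compliance first, then data pattern, then cost."""
    # Compliance gate (assumed): an unmet retention requirement ends the walk.
    if profile.needs_custom_retention:
        return "custom platform"
    # Data-pattern branch (assumed): unstructured data needs custom handling.
    if profile.data_pattern == "unstructured":
        return "custom platform"
    # Cost branch: otherwise the cheaper option wins.
    if profile.managed_cost_per_year <= profile.custom_cost_per_year:
        return "cloud-native service"
    return "custom platform"


def tally(profiles: list[AppProfile]) -> dict[str, int]:
    """Count recommendations across the assessed applications."""
    counts: dict[str, int] = {}
    for profile in profiles:
        choice = recommend(profile)
        counts[choice] = counts.get(choice, 0) + 1
    return counts
```

Running `tally` over every assessed application gives a portfolio-level view of which option dominates, which is the kind of evidence a build-versus-buy decision would rest on.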

Platform design & build.

Phased rollout:

Phase 1 — structured data workloads.
Phase 2 — semi-structured and unstructured data.
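
As a rough sketch of how the phasing could be applied during onboarding, the mapping below assigns an application to a phase from the data types it holds. The phase numbers follow the rollout above. The rule that a mixed application waits for its latest data type is an assumption made for illustration, as are the function and constant names.

```python
from __future__ import annotations

# Phase in which each data type is first supported (from the phased rollout).
PHASE_BY_DATA_TYPE = {
    "structured": 1,
    "semi-structured": 2,
    "unstructured": 2,
}


def rollout_phase(data_types: list[str]) -> int:
    """Return the earliest phase in which all of an application's data is supported."""
    if not data_types:
        raise ValueError("an application must declare at least one data type")
    unknown = set(data_types) - set(PHASE_BY_DATA_TYPE)
    if unknown:
        raise ValueError(f"unknown data types: {sorted(unknown)}")
    # Assumed rule: an application is fully onboarded only once its
    # latest-supported data type is available.
    return max(PHASE_BY_DATA_TYPE[t] for t in data_types)
```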

Automated lifecycle management (retention, expiration, archival, purge). Self-service retrieval via Active Directory groups. The retrieval experience was the platform’s adoption mechanism — if it didn’t work for application teams, the platform wouldn’t carry the load.
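
The lifecycle and retrieval rules can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: the record fields, the 30-day purge grace window, the state names and the group names used in the usage note are all hypothetical, and group membership is passed in as plain sets rather than looked up in Active Directory.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta

PURGE_GRACE_DAYS = 30  # assumed window between expiry and purge


@dataclass(frozen=True)
class ArchivedRecord:
    dataset: str
    archived_on: date
    retention_years: int


def retention_end(record: ArchivedRecord) -> date:
    """Date on which the retention obligation lapses."""
    target_year = record.archived_on.year + record.retention_years
    try:
        return record.archived_on.replace(year=target_year)
    except ValueError:
        # Archived on 29 Feb and the target year is not a leap year.
        return record.archived_on.replace(year=target_year, day=28)


def lifecycle_state(record: ArchivedRecord, today: date) -> str:
    """Classify a record as 'retained', 'expired' (awaiting purge) or 'purge'."""
    end = retention_end(record)
    if today < end:
        return "retained"
    if today < end + timedelta(days=PURGE_GRACE_DAYS):
        return "expired"
    return "purge"


def can_retrieve(user_groups: set[str], allowed_groups: set[str]) -> bool:
    """Role-based check: the user needs at least one group the dataset allows."""
    return not user_groups.isdisjoint(allowed_groups)
```

For example, `can_retrieve({"APP-PAYMENTS-READ"}, {"APP-PAYMENTS-READ", "AUDIT"})` is true, while a user with no matching group is refused. Keeping the check this simple is part of what makes retrieval self-service: access follows group membership the bank already manages.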

Enterprise enablement.

Complete documentation suite (design, SOPs, operations manuals). Security and compliance frameworks. Peer-review and QA processes. Built to be operated by the bank, not by the implementation team.

Technology stack

Serverless processing: AWS Lambda
Data movement: AWS DataSync
Storage: Amazon S3
Metadata & indexing: MongoDB Atlas
Orchestration: AWS Step Functions
Data lineage: Alation
Infrastructure as code: Terraform
CI/CD: Custom-built pipelines
Classification: AI-driven metadata analysis
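
To show how these pieces might fit together, here is a hedged sketch of a Step Functions state machine definition (Amazon States Language) built as a Python dictionary. The four steps, their order, the Lambda function names, the account ID and the region are all placeholders invented for illustration. The case study does not describe the real workflow.

```python
from __future__ import annotations

import json

# Placeholder ARN template: account ID, region and function names are invented.
LAMBDA_ARN = "arn:aws:lambda:ap-southeast-2:123456789012:function:{name}"


def task(function_name: str, next_state: str | None = None) -> dict:
    """Build a Task state that invokes a Lambda function, with a simple retry."""
    state: dict = {
        "Type": "Task",
        "Resource": LAMBDA_ARN.format(name=function_name),
        "Retry": [{
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 30,
            "MaxAttempts": 2,
            "BackoffRate": 2.0,
        }],
    }
    if next_state is None:
        state["End"] = True       # terminal state
    else:
        state["Next"] = next_state
    return state


def archival_workflow() -> dict:
    """Hypothetical pipeline: move data, classify it, index it, set retention."""
    return {
        "Comment": "Illustrative archival workflow (assumed steps)",
        "StartAt": "TransferToS3",
        "States": {
            "TransferToS3": task("start-datasync-transfer", "ClassifyData"),
            "ClassifyData": task("classify-metadata", "IndexMetadata"),
            "IndexMetadata": task("write-metadata-index", "ApplyRetentionPolicy"),
            "ApplyRetentionPolicy": task("apply-retention-policy"),
        },
    }


def state_order(definition: dict) -> list[str]:
    """Follow the Next links from StartAt to list states in execution order."""
    order = []
    name = definition["StartAt"]
    while name is not None:
        order.append(name)
        name = definition["States"][name].get("Next")
    return order


if __name__ == "__main__":
    print(json.dumps(archival_workflow(), indent=2))
```

In a Terraform-managed estate, the JSON this produces would typically be supplied as the definition of a state machine resource rather than deployed by hand.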

Outcomes.

$50M+ annual cost savings achieved
1,000+ applications onboarded to one platform
Multi-PB estate: structured, semi-structured and unstructured data supported
Legacy decommissioned: retention satisfied; systems retired

Financial: >$50M USD in annual cost savings; reduced infrastructure overhead across the estate.
Operational: Legacy applications became decommissionable; operational complexity dropped meaningfully.
Strategic: Enterprise-grade, cloud-native platform under full bank ownership with minimal ongoing maintenance. AI-enabled foundation for continuous archival. Accelerated the wider legacy-modernisation programme. Regulatory and security alignment achieved at platform level, not at application level.

Closing insight

Data archival stops being a constraint and becomes a strategic enabler the moment one platform carries the regulatory floor for the entire estate.

What I would do again on a similar engagement.

Discovery before design. Three months of deep assessment looked expensive on paper and saved years in build.
Phased on data type, not application. Phasing by data complexity (structured → unstructured) sequenced the engineering risk; phasing by application would have spread it.
Self-service retrieval as the adoption mechanism. The platform’s success was determined by how easy retrieval was — not by how good the archival logic was.
Custom on AWS, not an off-the-shelf enterprise platform. For this scale and regulatory profile, the trade-off favoured ownership + customisability over a vendor-managed product.
Document for the next team. The bank operates the platform now. That was the success criterion.

Originally published on Medium.
