The challenge.
A Tier-1 ANZ bank with no enterprise-grade data archival solution. Every application team had a different approach — duplicated tooling, duplicated process, duplicated cost. Legacy systems could not be decommissioned because retention obligations had no centralised home. No standardised onboarding model; no consistent operational procedures.
The estate was multi-petabyte. The application count was 1,000+. The regulatory floor was strict. The opportunity — if a single platform could carry the load — was an order of magnitude beyond what any local solution could deliver.
The constraints.
- Regulatory: Mandatory data retention; strict privacy + security mandates; audit + governance controls required at every layer.
- Technical: Must ingest from on-premises systems and multiple cloud regions; support structured, semi-structured and unstructured data; cloud-agnostic and customisable.
- Organisational: No unified approach across 1,000+ applications. Self-service, role-based retrieval was non-negotiable. Enterprise foundation requirements (CI/CD, DR, AI enablement) had to be present from day one.
The approach.
Discovery phase — three months.
Deep assessment across all 1,000+ applications. Evaluation of cloud-native services vs a custom platform via decision trees built on data patterns, compliance requirements and cost. The honest output of discovery: a custom-built solution on AWS, not an off-the-shelf platform.
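To make the decision-tree evaluation concrete, here is a minimal sketch of how one application profile might be walked through such a tree. The profile fields, the thresholds and the order of the branches are all assumptions made for illustration; the actual criteria used in discovery are not described in this case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadProfile:
    """Hypothetical summary of one application's archival needs."""
    data_types: frozenset          # e.g. {"structured", "unstructured"}
    needs_custom_retention: bool   # retention rules a managed service cannot express
    needs_multi_cloud: bool        # sources span more than one cloud provider
    managed_cost_ratio: float      # estimated managed-service cost / custom-build cost


def recommend(profile: WorkloadProfile) -> str:
    """Walk a small decision tree: compliance first, then data patterns, then cost."""
    # Assumed branch 1: compliance requirements override everything else.
    if profile.needs_custom_retention:
        return "custom"
    # Assumed branch 2: a cloud-agnostic requirement rules out single-cloud services.
    if profile.needs_multi_cloud:
        return "custom"
    # Assumed branch 3: unstructured data only justifies custom work if it is cheaper.
    if "unstructured" in profile.data_types and profile.managed_cost_ratio > 1.0:
        return "custom"
    return "cloud-native"


simple = WorkloadProfile(frozenset({"structured"}), False, False, 0.8)
regulated = WorkloadProfile(frozenset({"structured"}), True, False, 0.8)
print(recommend(simple), recommend(regulated))
```

Run across an inventory of applications, a function like this yields a tally per recommendation, which is the kind of evidence a build-versus-buy conclusion can rest on.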
Platform design & build.
Phased rollout:
- Phase 1 — structured data workloads.
- Phase 2 — semi-structured and unstructured data.
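The two phases above can be sketched as a grouping of datasets by data type rather than by owning application. The application names and the mapping table are hypothetical; the point is only that one application's data may be split across phases.

```python
from collections import defaultdict

# Hypothetical mapping from data type to rollout phase, mirroring the two phases above.
PHASE_BY_DATA_TYPE = {
    "structured": 1,
    "semi-structured": 2,
    "unstructured": 2,
}


def plan_rollout(datasets):
    """Group (application, data_type) pairs by phase, ignoring application boundaries."""
    plan = defaultdict(list)
    for application, data_type in datasets:
        plan[PHASE_BY_DATA_TYPE[data_type]].append((application, data_type))
    return dict(plan)


# Illustrative inventory: "payments" owns both structured and unstructured data.
inventory = [
    ("payments", "structured"),
    ("payments", "unstructured"),
    ("crm", "semi-structured"),
]
plan = plan_rollout(inventory)
```

Here the hypothetical "payments" application appears in both phases: its structured data is onboarded first, and its unstructured data waits until the platform supports that data type.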
Automated lifecycle management (retention, expiration, archival, purge). Self-service retrieval via Active Directory groups. The retrieval experience was the platform’s adoption mechanism — if it didn’t work for application teams, the platform wouldn’t carry the load.
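As an illustration of how lifecycle state and group-based retrieval might interact, the sketch below derives a record's stage from its dates and then checks a requester's directory groups. The field names, the grace period before purge and the rule that only retained records are retrievable are assumptions, not details taken from the platform.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ArchivedRecord:
    archived_on: date
    retention_days: int        # hypothetical: mandated retention period
    purge_grace_days: int      # hypothetical: delay between expiration and purge
    reader_groups: frozenset   # hypothetical: directory groups allowed to retrieve


def lifecycle_stage(record: ArchivedRecord, today: date) -> str:
    """Return 'retained', 'expired' or 'purged' for the given date."""
    expires_on = record.archived_on + timedelta(days=record.retention_days)
    purge_on = expires_on + timedelta(days=record.purge_grace_days)
    if today < expires_on:
        return "retained"
    if today < purge_on:
        return "expired"
    return "purged"


def can_retrieve(record: ArchivedRecord, user_groups, today: date) -> bool:
    """Allow retrieval only for retained records and members of a reader group."""
    if lifecycle_stage(record, today) != "retained":
        return False
    return bool(record.reader_groups & set(user_groups))


record = ArchivedRecord(date(2020, 1, 1), 365, 30, frozenset({"grp-payments-readers"}))
allowed = can_retrieve(record, ["grp-payments-readers"], date(2020, 6, 1))
```

In a real deployment the group list would come from the directory service at request time rather than being passed in by the caller.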
Enterprise enablement.
Complete documentation suite (design, SOPs, operations manuals). Security and compliance frameworks. Peer-review and QA processes. Built to be operated by the bank, not by the implementation team.
- Serverless processing: AWS Lambda
- Data movement: AWS DataSync
- Storage: Amazon S3
- Metadata & indexing: MongoDB Atlas
- Orchestration: AWS Step Functions
- Data lineage: Alation
- Infrastructure as code: Terraform
- CI/CD: Custom-built pipelines
- Classification: AI-driven metadata analysis
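With Amazon S3 as the storage layer, one way retention and expiration could be expressed is as an S3 lifecycle rule. The sketch below builds a rule in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The case study does not say whether S3 lifecycle rules were used at all, so the prefix, the day counts, the bucket name and the choice of the `GLACIER` storage class are purely illustrative assumptions.

```python
def build_lifecycle_rule(prefix: str, archive_after_days: int, retention_days: int) -> dict:
    """Build one S3 lifecycle rule: transition to colder storage, then expire."""
    if archive_after_days >= retention_days:
        raise ValueError("objects must transition before they expire")
    return {
        "ID": f"archive-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Assumed storage class; other classes could equally apply.
        "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": retention_days},
    }


# Hypothetical values: move to colder storage after 90 days, expire after 2555 days.
configuration = {"Rules": [build_lifecycle_rule("payments/", 90, 2555)]}

# With boto3 installed and credentials configured, this could be applied with:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-archive-bucket", LifecycleConfiguration=configuration)
```

Keeping the rule builder as a pure function means retention settings can be reviewed and tested without touching a live bucket.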
Outcomes.
Financial: >$50M USD in annual cost savings; reduced infrastructure overhead across the estate.
Operational: Legacy applications became decommissionable; operational complexity dropped meaningfully.
Strategic: Enterprise-grade, cloud-native platform under full bank ownership with minimal ongoing maintenance. AI-enabled foundation for continuous archival. Accelerated the wider legacy-modernisation programme. Regulatory and security alignment achieved at platform level, not at application level.
Data archival stops being a constraint and becomes a strategic enabler the moment one platform carries the regulatory floor for the entire estate.
What I would do again on a similar engagement.
- Discovery before design. Three months of deep assessment looked expensive on paper and saved years in build.
- Phased on data type, not application. Phasing by data complexity (structured → unstructured) sequenced the engineering risk; phasing by application would have spread it.
- Self-service retrieval as the adoption mechanism. The platform’s success was determined by how easy retrieval was — not by how good the archival logic was.
- Custom on AWS, not an off-the-shelf enterprise platform. For this scale and regulatory profile, the trade-off favoured ownership and customisability over a vendor-managed product.
- Document for the next team. The bank operates the platform now. That was the success criterion.