What to reach for — when.
Not a recommended-books page. Each entry is keyed to a moment in the work: when you’re standing up a platform team, defending an architecture choice, sizing GenAI risk, rebuilding an SRE programme, or briefing a board. The ones I’ve marked re-read repay a second pass; the rest are worth one careful read.
GenAI, defensibly.
The corpus that separates “we shipped a chatbot” from “we shipped a governable system.” Read in this order, with the regulatory texts last, so that they read as recognisable rather than abstract.
Building LLM-Powered Applications
The shape of the production stack — gateway, retrieval, evals, guardrails, observability — without vendor hype. Read first: it’s the cleanest map of what you’re actually building.
NIST AI Risk Management Framework + GenAI Profile
The Govern / Map / Measure / Manage scaffold is the most-cited GenAI risk vocabulary in the world. The GenAI Profile is the operational annex. If you read one government doc this year, read this.
ISO/IEC 42001 — AI Management Systems
Becoming a procurement floor for enterprise AI vendors in 2026. Read with the same eyes as ISO 27001 if you remember that landing — the org changes follow a familiar shape.
EU AI Act
Read Articles 6–15 (high-risk classification, requirements) and Annex III (high-risk use-cases). The rest is summary. The 2026-08 enforcement date is the calendar you build against.
OWASP Top 10 for LLM Applications
Read every entry. Then map your application to which of the 10 you defend against, which you tolerate, which you don’t cover at all. That’s your threat model.
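A minimal sketch of that mapping, offered as one possible shape rather than a prescribed method. The identifiers LLM01 to LLM10 are placeholders for the entries in whichever edition of the list you use, and the stances assigned below are hypothetical.

```python
from collections import Counter
from enum import Enum


class Stance(Enum):
    DEFEND = "defend"        # a control exists and is tested
    TOLERATE = "tolerate"    # risk accepted, with a named owner
    UNCOVERED = "uncovered"  # no control and no explicit acceptance


# Placeholder IDs; take the entry names from the edition of the list you use.
ENTRY_IDS = [f"LLM{n:02d}" for n in range(1, 11)]

# Hypothetical assessment for an example application.
assessment = {
    "LLM01": Stance.DEFEND,
    "LLM02": Stance.DEFEND,
    "LLM03": Stance.TOLERATE,
    "LLM05": Stance.DEFEND,
    "LLM06": Stance.TOLERATE,
}


def threat_model(assessed):
    """Return a stance for every entry; anything unassessed counts as uncovered."""
    return {entry: assessed.get(entry, Stance.UNCOVERED) for entry in ENTRY_IDS}


model = threat_model(assessment)
summary = Counter(stance.value for stance in model.values())
gaps = [entry for entry, stance in model.items() if stance is Stance.UNCOVERED]

print(dict(summary))
print("uncovered:", gaps)
```

Defaulting unassessed entries to uncovered is deliberate: the gaps you never looked at show up in the same table as the ones you chose to tolerate.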
MITRE ATLAS
The adversarial counterpart to OWASP. Read when designing red-team exercises for AI features — you need the attacker vocabulary.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
The original RAG paper. Worth the 30 minutes for the framing alone — most production RAG architectures are decorations on this.
The nine controls that make GenAI defensible
My own — offered as the operational distillation of the texts above, mapped to NIST AI RMF and ISO 42001 functions.
Platform engineering, as a product.
The five-text starter pack. Read Team Topologies before you publish the team charter, not after.
Team Topologies
Stream-aligned · enabling · platform · complicated-subsystem teams — the vocabulary is now industry-wide. Re-read the chapters on cognitive load and team APIs every 6 months.
Accelerate
The four key metrics + the capability model are the single best operational scoreboard for engineering organisations. Annual DORA reports build on this baseline.
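For orientation, the four key metrics (deployment frequency, lead time for changes, change failure rate, time to restore service) can be computed from a plain list of deployment records. The sketch below assumes a hypothetical `Deployment` record and uses medians; the field names and sample data are illustrative, not taken from the book.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deploys, window_days):
    """Compute the four metrics over a window; expects at least one deployment."""
    if not deploys:
        raise ValueError("need at least one deployment in the window")
    failures = [d for d in deploys if d.failed]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Hypothetical week of deployments.
t0 = datetime(2025, 1, 6, 9, 0)
day, hour = timedelta(days=1), timedelta(hours=1)
deploys = [
    Deployment(t0, t0 + 2 * hour),
    Deployment(t0 + day, t0 + day + 4 * hour),
    Deployment(t0 + 2 * day, t0 + 2 * day + 6 * hour,
               failed=True, restored_at=t0 + 2 * day + 7 * hour),
    Deployment(t0 + 3 * day, t0 + 3 * day + 8 * hour),
]

metrics = four_key_metrics(deploys, window_days=7)
for name, value in metrics.items():
    print(name, value)
```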
CNCF Platform Engineering Maturity Model
The four-level maturity model (Provisional · Operational · Scalable · Optimizing) is the most useful self-assessment yardstick available. Pair with my Platform Eng Emerging tier page.
DX Developer Experience Research
The strongest empirical link from developer-experience signal to delivery performance. Read the SPACE framework primer first.
PlatformCon talks (Humanitec)
Filter for case-study talks from regulated orgs (Mercedes, BMW, Capital One). The pattern repeats: paved paths, golden defaults, adoption-as-the-KPI.
Architecture as decision capture.
The trade-off is the artefact. These read fast and change how you write the decision down.
Architecture Decision Records (ADR) specification
The smallest practical artefact in architecture. Context · Decision · Status · Consequences. Adopt the template before the second major decision; the discipline compounds.
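A small sketch of the four-part template as a record you can render to text. The class name, numbering scheme and example decision are all hypothetical; the only thing taken from the entry above is the Context · Decision · Status · Consequences structure.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ADR:
    number: int
    title: str
    context: str
    decision: str
    consequences: str
    status: str = "Proposed"
    decided_on: date = field(default_factory=date.today)

    def render(self) -> str:
        """Render the record as plain text, one section per template heading."""
        sections = [
            ("Context", self.context),
            ("Decision", self.decision),
            ("Status", f"{self.status} ({self.decided_on.isoformat()})"),
            ("Consequences", self.consequences),
        ]
        header = f"ADR-{self.number:04d}: {self.title}"
        body = "\n\n".join(f"{name}\n{text}" for name, text in sections)
        return f"{header}\n\n{body}\n"


# Hypothetical example decision.
adr = ADR(
    number=7,
    title="Route all model calls through one gateway",
    context="Teams call model providers directly, so usage is not auditable.",
    decision="All model traffic goes through a shared gateway service.",
    consequences="One extra network hop; central logging and rate limits.",
    status="Accepted",
    decided_on=date(2025, 3, 1),
)
text = adr.render()
print(text)
```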
Fundamentals of Software Architecture
Best primer on architectural characteristics (the “-ilities”) and the trade-off framing. Useful as the shared vocabulary across an architecture function.
Software Architecture: The Hard Parts
Distributed-systems trade-offs at the granularity teams actually face. Read the chapters on data-ownership and granularity choices first.
TOGAF 10 (Standard, not the courseware)
Skim — don’t memorise. The capability/architecture-domain mental model is the part most enterprise architects use; the rest is reference.
The encoded enterprise architect
My own — the case for moving from PDF principles to policy-as-code, with the substrate-shift argument that drives the 4-Discipline Stack.
SRE, as discipline.
The Google books are free online and still the canonical text. Read the Workbook first — it’s the practical companion.
Site Reliability Workbook
Practical companion to the SRE book. The example error-budget policy (Appendix B) is the single most-cited operational template in industry.
Site Reliability Engineering
The original. Read Ch. 4 (SLOs), Ch. 5 (eliminating toil), Ch. 15 (postmortem culture) first. The rest is reference depth.
Implementing Service Level Objectives
The how-to for SLOs that actually drive behaviour. Read alongside the Error Budget calculator.
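The arithmetic behind an error budget is short enough to keep in your head. This is a sketch of the standard calculation, not the calculator referred to above; the function names are mine and the downtime figure is made up.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Allowed unavailability for an availability SLO over a window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return window * (1 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> float:
    """Fraction of the budget still unspent; negative means it is blown."""
    return 1 - downtime / error_budget(slo, window)


window = timedelta(days=30)
budget = error_budget(0.999, window)  # 0.1% of 30 days
remaining = budget_remaining(0.999, window, timedelta(minutes=10))

print("budget:", budget)
print(f"remaining: {remaining:.1%}")
```

A 99.9% target over 30 days leaves roughly 43 minutes of budget, which is why a single ten-minute incident consumes close to a quarter of it.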
PagerDuty State of Digital Operations
The data on alert volume, actionable-alert rate, on-call burnout. Use to defend the headcount and tooling investments for the next rotation.
FinOps, not finance-ops.
Read the FinOps Framework before you accept the next vendor pitch. The Framework is the lingua franca; the rest is plumbing.
FinOps Foundation Framework
Phases (Inform/Optimise/Operate), domains, capabilities, principles. Adopt the vocabulary; it cuts a quarter off any vendor or consulting engagement.
State of FinOps Report
Where your org sits on the maturity curve, against industry. Use to defend FinOps investment.
Cloud FinOps (2nd ed.)
The how-to for the Framework. Read the chapters on showback/chargeback and unit economics; skip the tool-specific chapters.
Cloud Commitment Optimiser
For the second move out of Aware tier — tune coverage and 1y/3y mix against your steady-state load.
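As a rough illustration of what tuning coverage means, here is a sketch of the two ratios usually watched together: coverage (how much usage the commitment absorbs) and utilisation (how much of the commitment is actually used). It is not the optimiser itself; the function name, the units and the usage series are assumptions, and it ignores the 1y/3y split and pricing.

```python
def coverage_and_utilisation(hourly_usage, commitment):
    """Return (coverage, utilisation) for a flat hourly commitment.

    hourly_usage: usage per hour, in the same unit as the commitment.
    commitment:   committed amount per hour.
    """
    covered = sum(min(usage, commitment) for usage in hourly_usage)
    total = sum(hourly_usage)
    purchased = commitment * len(hourly_usage)
    return covered / total, covered / purchased


# Hypothetical four hours of usage against a commitment of 10 units per hour.
usage = [10, 12, 8, 20]
coverage, utilisation = coverage_and_utilisation(usage, commitment=10)

print(f"coverage: {coverage:.0%}, utilisation: {utilisation:.0%}")
```

Raising the commitment pushes coverage up and utilisation down; the steady-state floor of your load is roughly where the two stop trading off cheaply.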
Supply chain, before the next CVE.
The standards have caught up; the regulators are next. Read these in order; the regulation is recognisable once you know the technical pattern.
SLSA v1.0
The 4-level provenance / build-integrity model. L2 is the practical bar for regulated workloads; L3+ is aspirational. Read the spec, not the marketing.
NIST SSDF (SP 800-218)
The basis for the US CISA Secure Software Attestation. If you sell to US federal, you complete this form — this is the technical content behind it.
CISA KEV Catalog
The actually-exploited subset of CVEs — the right priority list for patching. Wire as an alert source for your SBOM owner-loop.
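A sketch of that wiring, assuming you already have the catalogue loaded as JSON and a list of SBOM findings with owners. The sample below mimics the feed's `vulnerabilities` list with `cveID` fields; the CVE identifiers, component names and owners are placeholders, not real catalogue content.

```python
def kev_alerts(kev_feed, sbom_findings):
    """Return one alert per component that carries a known-exploited CVE."""
    exploited = {entry["cveID"] for entry in kev_feed["vulnerabilities"]}
    alerts = []
    for finding in sbom_findings:
        hits = sorted(exploited.intersection(finding["cves"]))
        if hits:
            alerts.append({
                "component": finding["component"],
                "owner": finding["owner"],
                "cves": hits,
            })
    return alerts


# Placeholder data in the assumed shapes.
kev_feed = {
    "vulnerabilities": [
        {"cveID": "CVE-0000-0001"},
        {"cveID": "CVE-0000-0002"},
    ]
}
sbom_findings = [
    {"component": "example-logging-lib", "owner": "team-payments",
     "cves": ["CVE-0000-0001", "CVE-0000-0099"]},
    {"component": "example-http-client", "owner": "team-platform",
     "cves": ["CVE-0000-0098"]},
]

alerts = kev_alerts(kev_feed, sbom_findings)
for alert in alerts:
    print(alert["owner"], alert["component"], alert["cves"])
```

Only the first component alerts here: its other CVE and the second component's CVE are not in the exploited set, so they stay in the normal patch queue.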
Software Supply Chain Security
Vendor-neutral coverage of SBOMs, signing, attestation, vulnerability management. Read the chapters on the operational loop, not the regulation summary.
DevSecOps SLSA L3+ Paved Path reference architecture
My own — an opinionated paved-path that implements SSDF/SLSA L2–L3 with concrete control points and tooling defaults.
The board brief.
The audience is not technical. The texts that earn you airtime read like they were written for them, because they were.
APRA CPS 230 + CPS 234
If you operate in AU financial services: 230 (operational risk) and 234 (information security) are now the central conversation. Read the prudential standard, not the consulting summary.
EU DORA
If you operate in EU financial services or are a critical ICT provider to one: the resilience-testing and concentration-risk obligations are now live. Read the RTS (regulatory technical standards) too.
State of Enterprise Tech in Regulated Industries 2026
My own — the 12-page briefing I’d give a board if asked to summarise where the regulated-industries enterprise stack stands going into 2026.
The ongoing signal sources.
A small, ruthless list. Long enough to keep current; short enough to actually read.
The Pragmatic Engineer
The reference newsletter for the senior-engineering perspective on industry events. Deep investigations on incidents and org changes.
Platform Engineering newsletter (PlatformCon)
Case studies + reference patterns. Filter for regulated-industry contributions.
The Changelog
Open-source ecosystem signal. Cherry-pick interviews on tools relevant to your stack.
All Things Distributed (Werner Vogels)
Long-arc thinking on distributed systems and cloud economics from a position with high signal. Not weekly; read when posted.
QCon & USENIX SREcon
For architecture: QCon case-study tracks. For SRE: SREcon talks. Skip the keynotes; the case-study tracks are the value.
KubeCon + CloudNativeCon
The ecosystem barometer for platform / cloud-native / supply-chain projects. Read the end-user case-study track first.
Letters — monthly
My own — one synthesis-letter a month from the work, the readings, and the field. Free, ad-free, no growth-hacks.
This is a working list, updated as it changes. If you’d add to or argue with a recommendation, write back; the next revision benefits.
contact@hellouchit.com