Practitioner reading list

What to reach for — when.

Not a recommended-books page. Each entry is keyed to a moment in the work: when you’re standing up a platform team, defending an architecture choice, sizing GenAI risk, rebuilding an SRE programme, or briefing a board. The ones I’ve marked re-read repay a second pass; the rest earn the first.

When — you’re standing up GenAI in a regulated context

GenAI, defensibly.

The corpus that separates “we shipped a chatbot” from “we shipped a governable system.” Read in this order; the regulatory texts last, so they read as recognisable, not abstract.

Book · re-read

Building LLM-Powered Applications

Valentina Alto · Packt · 2024

The shape of the production stack — gateway, retrieval, evals, guardrails, observability — without vendor hype. Read first: it’s the cleanest map of what you’re actually building.

Standard

NIST AI Risk Management Framework + GenAI Profile

NIST AI RMF 1.0 (NIST AI 100-1, 2023) · GenAI Profile (NIST AI 600-1, 2024)

The Govern / Map / Measure / Manage scaffold is among the most widely cited AI risk vocabularies. The GenAI Profile is the operational annex. If you read one government doc this year, read this.

Standard

ISO/IEC 42001 — AI Management Systems

ISO/IEC · 2023

Becoming a procurement floor for enterprise AI vendors in 2026. Read with the same eyes as ISO 27001 if you remember that landing — the org changes follow a familiar shape.

Regulation

EU AI Act

Regulation (EU) 2024/1689 · high-risk obligations from 2 Aug 2026

Read Articles 6–15 (high-risk classification, requirements) and Annex III (high-risk use-cases). Treat the rest as reference. The 2 Aug 2026 application date is the deadline you build against.

Standard

OWASP Top 10 for LLM Applications

OWASP · v1.1 (2023); a 2025 edition has since been published

Read every entry. Then map your application to which of the 10 you defend against, which you tolerate, which you don’t cover at all. That’s your threat model.
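A minimal sketch of that mapping exercise, assuming a three-way posture (defend, tolerate, uncovered). It uses only the LLM01 to LLM10 identifiers, since entry titles differ between versions of the list. The function name and the sample postures are hypothetical, invented for illustration.

```python
# The ten entry identifiers; titles are left out because they vary by list version.
OWASP_LLM_IDS = [f"LLM{n:02d}" for n in range(1, 11)]

VALID_POSTURES = ("defend", "tolerate", "uncovered")


def threat_model(postures: dict[str, str]) -> dict[str, list[str]]:
    """Group the ten entries by posture; anything not listed counts as uncovered."""
    grouped: dict[str, list[str]] = {p: [] for p in VALID_POSTURES}
    for entry_id in OWASP_LLM_IDS:
        posture = postures.get(entry_id, "uncovered")
        if posture not in VALID_POSTURES:
            raise ValueError(f"{entry_id}: unknown posture {posture!r}")
        grouped[posture].append(entry_id)
    return grouped


# Hypothetical application; these postures are illustrative, not a recommendation.
example = threat_model({
    "LLM01": "defend",
    "LLM02": "defend",
    "LLM06": "defend",
    "LLM09": "tolerate",
})

for posture, ids in example.items():
    print(f"{posture:>9}: {', '.join(ids) or '-'}")
```

Defaulting unlisted entries to uncovered is deliberate: a gap should show up in the output rather than disappear.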

Standard

MITRE ATLAS

MITRE Adversarial Threat Landscape for AI

The adversarial counterpart to OWASP. Read when designing red-team exercises for AI features — you need the attacker vocabulary.

Paper

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Lewis et al. · Facebook AI Research (now Meta AI) · 2020

The original RAG paper. Worth the 30 minutes for the framing alone — most production RAG architectures are decorations on this.

Essay

The nine controls that make GenAI defensible

Uchit Vyas · 2026

My own — offered as the operational distillation of the texts above, mapped to NIST AI RMF and ISO 42001 functions.

When — you’re standing up (or rescuing) a platform team

Platform engineering, as a product.

The five-text starter pack. Read Team Topologies before you publish the team charter, not after.

Book · re-read

Team Topologies

Matthew Skelton & Manuel Pais · IT Revolution · 2019

Stream-aligned · enabling · platform · complicated-subsystem teams — the vocabulary is now industry-wide. Re-read the chapters on cognitive load and team APIs every 6 months.

Book · re-read

Accelerate

Forsgren, Humble, Kim · IT Revolution · 2018 (DORA research basis)

The four key metrics + the capability model are the single best operational scoreboard for engineering organisations. Annual DORA reports build on this baseline.
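As a rough illustration of how the four key metrics fall out of deployment records, here is a minimal sketch. The `Deployment` record shape, the field names, and the choice of medians are assumptions made for this example; they are not definitions taken from the book or the DORA reports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # first commit in the change
    deployed_at: datetime                   # when the change reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def four_key_metrics(deploys: list[Deployment], window_days: int) -> dict[str, float]:
    """Summarise one observation window; medians keep outliers from dominating."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deploys if d.failed]
    restore = [_hours(d.restored_at - d.deployed_at) for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "lead_time_hours": median(_hours(d.deployed_at - d.committed_at) for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "time_to_restore_hours": median(restore) if restore else 0.0,
    }


# Illustrative data only.
t0 = datetime(2025, 1, 1, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=4)),
    Deployment(t0, t0 + timedelta(hours=8), failed=True,
               restored_at=t0 + timedelta(hours=10)),
    Deployment(t0, t0 + timedelta(hours=12)),
    Deployment(t0, t0 + timedelta(hours=24)),
]
metrics = four_key_metrics(deploys, window_days=2)
print(metrics)
```

In practice the records would come from your CI/CD and incident tooling; the point of the sketch is only that the scoreboard needs four timestamps and one flag per change.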

Reference

CNCF Platform Engineering Maturity Model

CNCF Platforms WG · latest revision

The 4-level maturity model (Provisional · Operational · Scalable · Optimizing) is the most useful self-assessment yardstick available. Pair with my Platform Eng Emerging tier page.

Research

DX Developer Experience Research

DX (developer-experience.com) · ongoing

The strongest empirical link from developer-experience signal to delivery performance. Read the SPACE framework primer first.

Talk

PlatformCon talks (Humanitec)

Annual conference · talks on YouTube free

Filter for case-study talks from regulated orgs (Mercedes, BMW, Capital One). The pattern repeats: paved paths, golden defaults, adoption-as-the-KPI.

When — you’re defending an architecture choice

Architecture as decision capture.

The trade-off is the artefact. These read fast and change how you write the decision down.

Reference

Architecture Decision Records (ADR) specification

Michael Nygard · original post 2011 · living spec

The smallest practical artefact in architecture. Context · Decision · Status · Consequences. Adopt the template before the second major decision; the discipline compounds.
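To show how small the artefact is, here is a minimal sketch that renders the four fields as plain text. The `ADR` class, the numbering format, and the example decision are hypothetical; only the field names come from the template above.

```python
from dataclasses import dataclass, field


@dataclass
class ADR:
    number: int
    title: str
    status: str      # e.g. "Proposed", "Accepted", "Superseded"
    context: str     # the forces at play when the decision was made
    decision: str    # what was chosen, stated in the active voice
    consequences: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the record as plain text, one section per field."""
        lines = [
            f"ADR {self.number:04d}: {self.title}",
            "",
            f"Status: {self.status}",
            "",
            "Context",
            self.context,
            "",
            "Decision",
            self.decision,
            "",
            "Consequences",
        ]
        lines += [f"- {item}" for item in self.consequences]
        return "\n".join(lines)


# Hypothetical record, for illustration only.
adr = ADR(
    number=7,
    title="Publish order events through a message queue",
    status="Accepted",
    context="Three downstream services poll the orders database directly.",
    decision="We will publish order events to a queue and remove the polling.",
    consequences=[
        "Consumers are decoupled from the orders schema.",
        "The platform team takes on queue operations.",
    ],
)
text = adr.render()
print(text)
```

Keeping the record as plain text in the repository, next to the code it governs, is what lets the discipline compound.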

Book

Fundamentals of Software Architecture

Mark Richards & Neal Ford · O’Reilly · 2020

Best primer on architectural characteristics (the “-ilities”) and the trade-off framing. Useful as the shared vocabulary across an architecture function.

Book · re-read

Software Architecture: The Hard Parts

Ford, Richards, Sadalage, Dehghani · O’Reilly · 2021

Distributed-systems trade-offs at the granularity teams actually face. Read the chapters on data-ownership and granularity choices first.

Framework

TOGAF 10 (Standard, not the courseware)

The Open Group · 10th ed.

Skim — don’t memorise. The capability/architecture-domain mental model is the part most enterprise architects use; the rest is reference.

Framework

BIAN & BIZBOK

Industry capability/business-architecture references

BIAN for banking architecture vocabulary; BIZBOK for general business-capability modelling. Use as a reference, not a religion.

Essay

The encoded enterprise architect

Uchit Vyas · 2026

My own — the case for moving from PDF principles to policy-as-code, with the substrate-shift argument that drives the 4-Discipline Stack.

When — you’re rebuilding an SRE programme

SRE, as discipline.

The Google books are free online and still the canonical text. Read the Workbook first — it’s the practical companion.

Book · re-read

The Site Reliability Workbook

Beyer, Murphy, Rensin, Kawahara, Thorne · Google/O’Reilly · free online

Practical companion to the SRE book. The example error-budget policy (Appendix B) is the single most-cited operational template in industry.

Book

Site Reliability Engineering

Beyer, Jones, Petoff, Murphy · Google/O’Reilly · free online

The original. Read Ch. 4 (SLOs), Ch. 5 (eliminating toil), Ch. 15 (postmortem culture) first. The rest is reference depth.

Book

Implementing Service Level Objectives

Alex Hidalgo · O’Reilly · 2020

The how-to for SLOs that actually drive behaviour. Read alongside the Error Budget calculator.
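The arithmetic behind an error budget is small enough to sketch. The function names below are hypothetical and the figures are illustrative; the only assumption is the usual definition of the budget as one minus the SLO target, applied to either time or events.

```python
def _check_target(slo_target: float) -> None:
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1, e.g. 0.999")


def allowed_downtime_minutes(slo_target: float, window_days: int) -> float:
    """Time-based budget: minutes of full outage the window can absorb."""
    _check_target(slo_target)
    return (1 - slo_target) * window_days * 24 * 60


def budget_status(slo_target: float, total_events: int, bad_events: int) -> dict[str, float]:
    """Event-based budget: share of the allowed bad events already spent."""
    _check_target(slo_target)
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1 - slo_target) * total_events
    consumed = bad_events / allowed_bad
    return {
        "allowed_bad_events": allowed_bad,
        "consumed": consumed,
        "remaining": 1 - consumed,
    }


# Illustrative figures: a 99.9% target over 30 days and one million requests.
downtime = allowed_downtime_minutes(0.999, window_days=30)
status = budget_status(0.999, total_events=1_000_000, bad_events=250)
print(f"allowed downtime: {downtime:.1f} min")
print(f"budget consumed:  {status['consumed']:.1%}")
```

A 99.9% target over 30 days leaves roughly 43 minutes of budget; 250 bad requests out of a million spends a quarter of the event budget. A negative `remaining` value means the budget is overspent, which is the signal an error-budget policy acts on.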

Research

PagerDuty State of Digital Operations

PagerDuty · annual

The data on alert volume, actionable-alert rate, on-call burnout. Use to defend the headcount and tooling investments for the next rotation.

When — you’re rationalising cloud spend

FinOps, not finance-ops.

Read the FinOps Framework before you accept the next vendor pitch. The Framework is the lingua franca; the rest is plumbing.

Reference

FinOps Foundation Framework

FinOps Foundation · living

Phases (Inform/Optimise/Operate), domains, capabilities, principles. Adopt the vocabulary; it cuts a quarter off any vendor or consulting engagement.

Research

State of FinOps Report

FinOps Foundation · annual

Where your org sits on the maturity curve, against industry. Use to defend FinOps investment.

Book

Cloud FinOps (2nd ed.)

J.R. Storment & Mike Fuller · O’Reilly · 2023

The how-to for the Framework. Read the chapters on showback/chargeback and unit economics; skip the tool-specific chapters.

Tool

Cloud Commitment Optimiser

Uchit Vyas · live calculator

For the second move out of Aware tier — tune coverage and 1y/3y mix against your steady-state load.
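A minimal sketch of the coverage and term-mix trade-off, for orientation only. It is not the calculator's logic: the function name, the parameters, and every discount rate below are placeholder assumptions; substitute your provider's actual commitment pricing.

```python
def blended_hourly_cost(
    steady_load: float,      # steady-state usage, in on-demand-equivalent units per hour
    on_demand_rate: float,   # price per unit-hour with no commitment
    coverage: float,         # fraction of the steady load under commitment (0 to 1)
    share_3y: float,         # fraction of the committed load on 3-year terms (0 to 1)
    discount_1y: float,      # placeholder discount for 1-year terms
    discount_3y: float,      # placeholder discount for 3-year terms
) -> float:
    """Hourly cost when part of a steady load is covered by 1y/3y commitments."""
    for name, value in (("coverage", coverage), ("share_3y", share_3y)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    committed = steady_load * coverage
    uncovered = steady_load - committed
    rate_1y = on_demand_rate * (1 - discount_1y)
    rate_3y = on_demand_rate * (1 - discount_3y)
    committed_cost = committed * (share_3y * rate_3y + (1 - share_3y) * rate_1y)
    return committed_cost + uncovered * on_demand_rate


# Placeholder numbers, not real pricing.
baseline = blended_hourly_cost(100, 1.0, coverage=0.0, share_3y=0.0,
                               discount_1y=0.3, discount_3y=0.5)
tuned = blended_hourly_cost(100, 1.0, coverage=0.8, share_3y=0.5,
                            discount_1y=0.3, discount_3y=0.5)
print(f"saving vs on-demand: {1 - tuned / baseline:.0%}")
```

The sketch assumes the load really is steady. Commitments above actual usage are paid for whether used or not, which is why coverage is tuned against the steady-state floor rather than the peak.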

When — you’re hardening the software supply chain

Supply chain, before the next CVE.

The standards have caught up; the regulators are next. Read these in order; the regulation is recognisable once you know the technical pattern.

Standard

SLSA v1.0

OpenSSF · Supply-chain Levels for Software Artifacts

The four Build-track levels (L0 to L3) for provenance and build integrity. L2 is the practical bar for regulated workloads; L3 is the aspirational target. Read the spec, not the marketing.

Standard

NIST SSDF (SP 800-218)

NIST · Secure Software Development Framework

The basis for CISA’s Secure Software Development Attestation Form. If you sell to US federal agencies, you complete that form; SSDF is the technical content behind it.

Reference

CISA KEV Catalog

CISA · updated continuously

The actually-exploited subset of CVEs — the right priority list for patching. Wire as an alert source for your SBOM owner-loop.
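A minimal sketch of that wiring, assuming the catalog has already been downloaded as JSON with a top-level `vulnerabilities` list whose items carry a `cveID` field (check this against the current feed schema). The `Finding` record, the owner field, and the CVE identifiers are placeholders invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    component: str  # SBOM component name and version
    owner: str      # team accountable for the component
    cve_id: str     # CVE reported against it by your scanner


def kev_alerts(kev_catalog: dict, findings: list[Finding]) -> list[Finding]:
    """Return the findings whose CVE is in the KEV catalog, ordered by owner."""
    kev_ids = {item["cveID"] for item in kev_catalog.get("vulnerabilities", [])}
    matches = [f for f in findings if f.cve_id in kev_ids]
    return sorted(matches, key=lambda f: (f.owner, f.component))


# Placeholder data: these are not real CVE identifiers.
catalog = {"vulnerabilities": [{"cveID": "CVE-0000-0001"}, {"cveID": "CVE-0000-0002"}]}
findings = [
    Finding("libexample 1.2.3", "payments", "CVE-0000-0001"),
    Finding("libother 4.5.6", "identity", "CVE-0000-0099"),
    Finding("libthird 7.8.9", "identity", "CVE-0000-0002"),
]

alerts = kev_alerts(catalog, findings)
for alert in alerts:
    print(f"[KEV] {alert.owner}: patch {alert.component} ({alert.cve_id})")
```

Routing by owner is the point: the SBOM says who runs the component, and the catalog says which of its findings are being exploited now.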

Book

Software Supply Chain Security

Cassie Crossley · O’Reilly · 2024

Vendor-neutral coverage of SBOMs, signing, attestation, vulnerability management. Read the chapters on the operational loop, not the regulation summary.

Reference

DevSecOps SLSA L3+ Paved Path reference architecture

Uchit Vyas · reference architecture

My own — an opinionated paved-path that implements SSDF/SLSA L2–L3 with concrete control points and tooling defaults.

When — you’re briefing a board on technology risk

The board brief.

The audience is not technical. The texts that earn you airtime read like they were written for them, because they were.

Regulation

APRA CPS 230 + CPS 234

Australian Prudential Regulation Authority

If you operate in AU financial services: 230 (operational risk) and 234 (information security) are now the central conversation. Read the prudential standard, not the consulting summary.

Regulation

EU DORA

Digital Operational Resilience Act · enforced 17 Jan 2025

If you operate in EU financial services or are a critical ICT provider to one: the resilience-testing and concentration-risk obligations are now live. Read the RTS (regulatory technical standards) too.

Report

State of Enterprise Tech in Regulated Industries 2026

Uchit Vyas · annual report

My own — the 12-page briefing I’d give a board if asked to summarise where the regulated-industries enterprise stack stands going into 2026.

Ongoing — weekly & monthly inputs

The ongoing signal sources.

A small, ruthless list. Long enough to keep current; short enough to actually read.

Newsletter

The Pragmatic Engineer

Gergely Orosz · weekly

The reference newsletter for the senior-engineering perspective on industry events. Deep investigations on incidents and org changes.

Newsletter

Platform Engineering newsletter (PlatformCon)

Humanitec · weekly

Case studies + reference patterns. Filter for regulated-industry contributions.

Podcast

The Changelog

Adam Stacoviak & Jerod Santo · weekly

Open-source ecosystem signal. Cherry-pick interviews on tools relevant to your stack.

Blog

All Things Distributed (Werner Vogels)

Werner Vogels · periodic posts

Long-arc thinking on distributed systems and cloud economics from Amazon’s CTO. Not weekly; read when posted.

Conference

QCon & USENIX SREcon

Annual conferences · talks free on YouTube within 6 months

For architecture: QCon case-study tracks. For SRE: SREcon talks. Skip the keynotes; the case-study tracks are the value.

Conference

KubeCon + CloudNativeCon

CNCF · biannual (NA, EU)

The ecosystem barometer for platform / cloud-native / supply-chain projects. Read the end-user case-study track first.

Letter

Letters — monthly

Uchit Vyas · monthly via Substack

My own — one synthesis-letter a month from the work, the readings, and the field. Free, ad-free, no growth-hacks.

Updated on the working list. If you’d add or argue with a recommendation, write back — the next revision benefits.

contact@hellouchit.com