Architecture teardown

Claude.ai — the streaming-first AI product.

Anthropic builds Claude.ai with the same engineering culture that publishes research papers. That culture leaks the architecture's shape through blog posts, the MCP spec, job ads, and the product itself: enough signal to draw a defensible map.

Product: Claude.ai (Anthropic)

Signal density: Dense (eng blog · MCP spec · job ads · live product)

Stack (inferred): Next.js · Cloudflare · Postgres · S3/R2 · custom AI gateway · multi-region inference

Pattern: Plane separation · streaming-first · async safety telemetry

The architecture — from public signals

[Architecture diagram. Legend: core component (the architectural bet) · standard service · async / inferred dataflow]

The five things you can see from outside.

Plane separation

Application plane (conversations · projects · files · billing) and AI plane (model · tools · safety) deploy independently and scale independently. The single most-copied pattern in 2026 enterprise GenAI — and the one most often badly imitated.

Streaming is non-blocking

Time to first token is under 100 ms from most regions, as observed from outside. This appears to be achieved by keeping inline safety minimal: a fast refusal layer sits in the hot path, while the deep classifier runs asynchronously on completed conversations.
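
A minimal sketch of that split, assuming a Python asyncio service. Every name here (`fast_refusal`, `fake_model_stream`, `deep_classify`, the blocklist) is hypothetical and stands in for components that cannot be seen from outside; it illustrates the pattern, not Anthropic's implementation.

```python
import asyncio
from typing import AsyncIterator

REFUSAL = "I can't help with that."
# Hypothetical blocklist standing in for a fast inline refusal layer.
BLOCKED_PHRASES = ("build a weapon",)


def fast_refusal(prompt: str) -> bool:
    """Cheap inline check that runs before any token is streamed."""
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


async def fake_model_stream(prompt: str) -> AsyncIterator[str]:
    """Stand-in for a streaming model call (hypothetical)."""
    for token in ("Hello", ", ", "world"):
        await asyncio.sleep(0)  # yield to the event loop, as network I/O would
        yield token


async def deep_classify(transcript: str, findings: list) -> None:
    """Slow classifier, run off the hot path on the completed exchange."""
    await asyncio.sleep(0.05)  # simulates an expensive classification call
    findings.append({"chars": len(transcript), "flagged": False})


async def stream_reply(prompt: str, findings: list, background: set) -> AsyncIterator[str]:
    if fast_refusal(prompt):
        yield REFUSAL
        return
    parts = []
    async for token in fake_model_stream(prompt):
        parts.append(token)
        yield token  # the user sees tokens before any deep classification runs
    # Schedule the deep classifier without awaiting it; keep a reference so
    # the task is not garbage-collected before it finishes.
    task = asyncio.create_task(deep_classify(prompt + "".join(parts), findings))
    background.add(task)


async def demo(prompt: str):
    findings, background = [], set()
    tokens = [t async for t in stream_reply(prompt, findings, background)]
    seen_at_stream_end = list(findings)  # still empty: classifier has not run
    await asyncio.gather(*background)    # a real service would drain this elsewhere
    return tokens, seen_at_stream_end, findings
```

Calling `asyncio.run(demo("hi"))` returns the streamed tokens, an empty findings list captured at the moment streaming ended, and the findings list after the background classifier has finished. The only inline cost is the substring check.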

Multi-model is routed

Sonnet · Opus · Haiku selected per-conversation, switchable mid-thread. The application doesn't “know” which model it's talking to. Model lineup changes without an application redeploy.
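
One way to get that property is to have the application store only a routing alias and let the AI plane resolve it. The sketch below assumes this design; `ModelRouter`, the tier names and the model identifiers are illustrative placeholders, not real API model names.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    conversation_id: str
    tier: str = "balanced"  # the application stores an alias, never a model id


class ModelRouter:
    """Lives in the AI plane; owns the alias-to-model mapping."""

    def __init__(self, routes: dict):
        self._routes = dict(routes)

    def resolve(self, conversation: Conversation) -> str:
        return self._routes[conversation.tier]

    def reload(self, routes: dict) -> None:
        """Swap the lineup from config; no application change required."""
        self._routes = dict(routes)


# Placeholder identifiers for illustration only.
router = ModelRouter({"fast": "haiku-v1", "balanced": "sonnet-v1", "deep": "opus-v1"})
chat = Conversation("c-1")

first = router.resolve(chat)   # default tier
chat.tier = "deep"             # user switches model mid-thread
second = router.resolve(chat)
router.reload({"fast": "haiku-v2", "balanced": "sonnet-v2", "deep": "opus-v2"})
third = router.resolve(chat)   # same conversation, new lineup
```

The conversation record never changes when the lineup does, which is what lets the model tier ship on its own schedule.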

MCP runtime is a service tier

Tools and connectors (Gmail · GitHub · Drive · custom) are runtime entities, not hard-coded plugins. New MCP servers come online via configuration. The spec is open, and Anthropic eats its own dog food.
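
To make "online via configuration" concrete, here is a small sketch of a registry that is rebuilt from a config document. It is a generic illustration: `ToolRegistry`, the JSON shape and the example.com URLs are assumptions, and it does not use or mirror the MCP SDK or wire protocol.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolServer:
    name: str
    url: str
    enabled: bool = True


class ToolRegistry:
    """Holds the set of tool servers the runtime may call."""

    def __init__(self):
        self._servers = {}

    def load(self, config_json: str) -> None:
        """Replace the registry contents from a JSON config document."""
        entries = json.loads(config_json)["servers"]
        self._servers = {entry["name"]: ToolServer(**entry) for entry in entries}

    def available(self) -> list:
        return sorted(name for name, server in self._servers.items() if server.enabled)


# Hypothetical config; the URLs are placeholders.
CONFIG_V1 = '{"servers": [{"name": "github", "url": "https://tools.example.com/github"}]}'
CONFIG_V2 = """{"servers": [
  {"name": "github", "url": "https://tools.example.com/github"},
  {"name": "drive", "url": "https://tools.example.com/drive"},
  {"name": "legacy", "url": "https://tools.example.com/legacy", "enabled": false}
]}"""

registry = ToolRegistry()
registry.load(CONFIG_V1)
before = registry.available()
registry.load(CONFIG_V2)  # a new connector appears without a code change
after = registry.available()
```

Adding the `drive` connector is a config edit followed by a reload; the registry code is untouched.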

Per-request audit by default

Audit isn't an enterprise upsell — it's the substrate. Plan tier controls retention and admin-API access; the recording happens for every conversation.
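
A sketch of how that split could look in code, assuming every request writes a record and the plan only decides how long it is kept and who may query it. The plan names, retention periods and `can_query_admin_api` are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per plan, in days.
RETENTION_DAYS = {"free": 30, "team": 180, "enterprise": 730}
# Hypothetical set of plans with admin-API access to audit data.
ADMIN_API_PLANS = {"enterprise"}


@dataclass(frozen=True)
class AuditRecord:
    conversation_id: str
    model_alias: str
    recorded_at: datetime
    expires_at: datetime


def record_request(conversation_id: str, model_alias: str, plan: str, now: datetime) -> AuditRecord:
    """Always produces a record; the plan only sets the expiry."""
    expires = now + timedelta(days=RETENTION_DAYS[plan])
    return AuditRecord(conversation_id, model_alias, now, expires)


def can_query_admin_api(plan: str) -> bool:
    return plan in ADMIN_API_PLANS


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
free_record = record_request("c-1", "balanced", "free", now)
enterprise_record = record_request("c-2", "deep", "enterprise", now)
```

There is no code path that skips recording; the plan only feeds the expiry calculation and the access check.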

What to steal. What to avoid copying.

Read these together — the same pattern can be right for one team and wrong for another.

Steal — the patterns that compound

Plane separation: if you have 2+ GenAI use-cases, model routing, tooling, and evals should live in a separate tier. Adding a use-case then becomes config, not architecture.
Async safety + inline refusal: don't put a 500ms classifier in your streaming hot path. Run refusal categories and PII scrubbing inline; run deep classification offline.
Context assembler beats “everything is RAG”: if your context fits in 100K tokens, stuff it. Reach for RAG only on overflow.
MCP / tool runtime as a service tier: not hard-coded plugins. The agent landscape is converging on this in 2026.
Per-decision audit by default: not an upsell; it's the substrate.
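
The context-assembler rule in the list above can be sketched as a single decision. This is an illustration under stated assumptions: the four-characters-per-token estimate is a rough heuristic, and the keyword-overlap ranking is a stand-in for a real retriever, not a recommendation.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def keyword_score(doc: str, query: str) -> int:
    """Naive relevance: how many document words appear in the query."""
    query_words = set(query.lower().split())
    return sum(1 for word in doc.lower().split() if word in query_words)


def assemble_context(docs: list, query: str, budget: int = 100_000):
    """Return (strategy, chosen_docs): stuff everything if it fits, else retrieve."""
    total = sum(estimate_tokens(doc) for doc in docs)
    if total <= budget:
        return "stuff", list(docs)
    # Overflow: rank by relevance and pack greedily within the budget.
    ranked = sorted(docs, key=lambda doc: keyword_score(doc, query), reverse=True)
    chosen, used = [], 0
    for doc in ranked:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            continue
        chosen.append(doc)
        used += cost
    return "retrieve", chosen


docs = [
    "alpha beta gamma delta",
    "refund policy for orders",
    "shipping times by region",
]
fits = assemble_context(docs, "refund policy")
overflow = assemble_context(docs, "refund policy", budget=6)
```

With the default budget all three documents are passed through untouched; with a budget of 6 estimated tokens only the refund document survives. The retrieval path exists, but it is the fallback rather than the default.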

Avoid copying — unless you're them

Building your own model: the point of plane separation is using someone else's. Sovereign-fine-tuned-everything is the most common waste of GenAI investment.
Sandboxed code execution as a first feature: Anthropic has the team to operate WebContainer-class sandboxes. Most teams shouldn't accept the operational risk.
Per-conversation trust & safety (T&S) classification pipeline: built for frontier-lab safety research. Most enterprise GenAI needs the nine-controls discipline, not a parallel classification stack.
Custom AI gateway: build only if you're a frontier lab. Otherwise buy.

Thesis

Claude.ai is what a streaming-first conversational AI product looks like when the team building it also builds the model. The rest of the architecture — plane separation, async safety, MCP runtime, per-decision audit — is replicable, and is the right pattern for any team shipping enterprise GenAI in 2026.

Methodology & sources

Public signals only: company engineering blog posts, conference talks, job ads, public GitHub, podcasts, and product behaviour observed from the outside. No NDA-covered information; no private conversations. The inferred architecture is my own analysis; it is not endorsed by, and I am not affiliated with, Anthropic or Claude.ai.

Primary sources:

Anthropic engineering blog · anthropic.com/news
Anthropic API + Claude.ai documentation · docs.anthropic.com
Model Context Protocol specification · modelcontextprotocol.io
Anthropic public job listings (infra · trust & safety · platform)
Direct observation of streaming behaviour, regional latency, artifact rendering, connector lifecycle
Conference talks and podcasts: AWS re:Invent · Lex Fridman podcast appearances by the Anthropic team

Found this useful? More coming.

One teardown per quarter. Tell me which architecture you'd want analysed next.
