Claude.ai — the streaming-first AI product.
Anthropic builds Claude.ai with the same engineering culture that publishes research papers. That culture lets the shape of the architecture leak out through blog posts, the MCP spec, job ads, and the product itself. Enough signal to draw a defensible map.
The architecture — from public signals
The five things you can see from outside.
Plane separation
Application plane (conversations · projects · files · billing) and AI plane (model · tools · safety) deploy independently and scale independently. The single most-copied pattern in 2026 enterprise GenAI — and the one most often badly imitated.
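A minimal sketch of what the separation can look like in code, assuming the application plane only ever sees a narrow interface. Every name here (`AIPlane`, `CompletionRequest`, `ConversationService`, `EchoAIPlane`) is hypothetical and chosen for illustration; nothing below is Anthropic's code.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CompletionRequest:
    conversation_id: str
    prompt: str


class AIPlane(Protocol):
    """The only surface the application plane is allowed to depend on."""

    def complete(self, request: CompletionRequest) -> str: ...


class EchoAIPlane:
    """Stand-in AI plane; a real one would own models, tools, and safety."""

    def complete(self, request: CompletionRequest) -> str:
        return f"echo: {request.prompt}"


class ConversationService:
    """Application plane: owns conversation state, knows nothing about models."""

    def __init__(self, ai_plane: AIPlane) -> None:
        self._ai = ai_plane
        self._history: dict[str, list[str]] = {}

    def send(self, conversation_id: str, prompt: str) -> str:
        reply = self._ai.complete(CompletionRequest(conversation_id, prompt))
        self._history.setdefault(conversation_id, []).extend([prompt, reply])
        return reply

    def history(self, conversation_id: str) -> list[str]:
        return list(self._history.get(conversation_id, []))
```

Because `ConversationService` depends only on the `AIPlane` protocol, either side can be redeployed or scaled without touching the other, which is the property the pattern is after.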
Streaming is non-blocking
First token arrives in under 100 ms from most regions. The likely mechanism is minimal inline safety: a fast refusal layer sits in the hot path, while the deep classifier runs asynchronously on completed conversations.
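A rough sketch of that split, assuming an asyncio service. The refusal list, the fake token stream, and the queue hand-off are all placeholders of mine, not observed internals: the point is only that the streaming path does a cheap check and never awaits the slow classifier.

```python
import asyncio
from typing import AsyncIterator

# Hypothetical refusal list; a real fast layer would be a tuned rule set or small model.
BLOCKED_PHRASES = ("make a bomb",)
REFUSAL = "I can't help with that."


def fast_refusal(text: str) -> bool:
    """Cheap inline check: plain substring match, no model call."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


async def fake_model_stream(prompt: str) -> AsyncIterator[str]:
    """Stand-in for the model's token stream."""
    for token in ("Hello", ", ", "world"):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield token


async def stream_reply(prompt: str, pending: asyncio.Queue) -> AsyncIterator[str]:
    """Hot path: refuse fast or stream; never wait on the deep classifier."""
    if fast_refusal(prompt):
        yield REFUSAL
        return
    transcript = [prompt]
    async for token in fake_model_stream(prompt):
        transcript.append(token)
        yield token
    # Hand the completed exchange to the offline classifier and return at once.
    pending.put_nowait(" ".join(transcript))


async def classify_pending(pending: asyncio.Queue, verdicts: list[str]) -> None:
    """Stand-in deep classifier: drains completed conversations off the hot path."""
    while not pending.empty():
        transcript = pending.get_nowait()
        await asyncio.sleep(0)  # placeholder for a slow model call
        verdicts.append("flag" if fast_refusal(transcript) else "ok")
```

In a real deployment the queue would be a durable log or message bus and the classifier a separate worker pool; the in-process queue here only keeps the sketch self-contained.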
Multi-model is routed
Sonnet · Opus · Haiku selected per-conversation, switchable mid-thread. The application doesn't “know” which model it's talking to. Model lineup changes without an application redeploy.
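One way to get that behaviour, sketched under my own assumptions: the application pins a tier per conversation, and a router in the AI plane resolves the tier to a concrete model from a reloadable table. `ModelRouter`, the tier names, and the model ids are hypothetical placeholders, not real identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRouter:
    # tier -> concrete model id; the ids used with this class are placeholders
    routes: dict[str, str]
    default_tier: str = "balanced"
    _pinned: dict[str, str] = field(default_factory=dict, init=False)

    def select(self, conversation_id: str, tier: str) -> None:
        """Pin a tier for one conversation; callable mid-thread."""
        if tier not in self.routes:
            raise KeyError(f"unknown tier: {tier}")
        self._pinned[conversation_id] = tier

    def resolve(self, conversation_id: str) -> str:
        """Return the concrete model id for the conversation's current tier."""
        tier = self._pinned.get(conversation_id, self.default_tier)
        return self.routes[tier]

    def reload(self, routes: dict[str, str]) -> None:
        """Swap the lineup from config; pinned tiers keep working."""
        self.routes = dict(routes)
```

The application stores only the tier, so a lineup change is a call to `reload` with new config rather than an application redeploy.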
MCP runtime is a service tier
Tools and connectors (Gmail · GitHub · Drive · custom) are runtime entities, not hard-coded plugins. New MCP servers come online via configuration. Open spec; Anthropic eats its own dog food.
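To make "via configuration" concrete, here is a small sketch of a registry that is rebuilt from a config document at runtime. The JSON shape, the `ToolRegistry` class, and the `example.invalid` URLs are my assumptions for illustration; this is not the MCP wire protocol or any real client's config format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolServer:
    name: str
    url: str
    enabled: bool = True


class ToolRegistry:
    """Holds the set of tool servers the runtime may call right now."""

    def __init__(self) -> None:
        self._servers: dict[str, ToolServer] = {}

    def load(self, config_text: str) -> None:
        """Replace the registry contents from a JSON config document."""
        entries = json.loads(config_text)["servers"]
        self._servers = {e["name"]: ToolServer(**e) for e in entries}

    def available(self) -> list[str]:
        """Names of enabled servers, sorted for stable output."""
        return sorted(n for n, s in self._servers.items() if s.enabled)


# Hypothetical config: adding a connector is a new entry, not new code.
CONFIG_V1 = json.dumps({"servers": [
    {"name": "github", "url": "https://github.example.invalid/mcp"},
    {"name": "gmail", "url": "https://gmail.example.invalid/mcp", "enabled": False},
]})
```

Loading a second config with an extra entry brings a new server online without changing `ToolRegistry` itself, which is the difference between a runtime entity and a hard-coded plugin.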
Per-request audit by default
Audit isn't an enterprise upsell — it's the substrate. Plan tier controls retention and admin-API access; the recording happens for every conversation.
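A sketch of that split between recording and access, assuming a simple in-memory log. The plan names, retention windows, and the `AuditLog` API are invented for illustration; the only thing taken from the observation above is the shape: writes are unconditional, and plan tier gates retention and export.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical plan table: retention window and admin-API access per tier.
PLANS = {
    "free": {"retention_days": 30, "admin_api": False},
    "enterprise": {"retention_days": 365, "admin_api": True},
}


@dataclass(frozen=True)
class AuditRecord:
    conversation_id: str
    plan: str
    recorded_at: datetime


class AuditLog:
    def __init__(self) -> None:
        self._records: list[AuditRecord] = []

    def record(self, conversation_id: str, plan: str, now: datetime) -> None:
        """Always record; there is no plan check on the write path."""
        self._records.append(AuditRecord(conversation_id, plan, now))

    def purge_expired(self, now: datetime) -> int:
        """Drop records older than their plan's retention window."""
        def expired(r: AuditRecord) -> bool:
            window = timedelta(days=PLANS[r.plan]["retention_days"])
            return now - r.recorded_at > window

        before = len(self._records)
        self._records = [r for r in self._records if not expired(r)]
        return before - len(self._records)

    def export(self, plan: str) -> list[AuditRecord]:
        """Admin-API read path: only plans with access may export."""
        if not PLANS[plan]["admin_api"]:
            raise PermissionError(f"plan {plan!r} has no admin-API access")
        return [r for r in self._records if r.plan == plan]
```

Keeping the plan check out of `record` is the design choice that matters: upgrading a customer changes what they can read and for how long, not whether the data exists.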
What to steal. What to avoid copying.
Steal — the patterns that compound
- Plane separation — if you have 2+ GenAI use-cases, model routing + tooling + evals live in a separate tier. Adding a use-case becomes config, not architecture.
- Async safety + inline refusal — don't put a 500ms classifier in your streaming hot path. Refusal categories + PII scrub inline; deep classification offline.
- Context assembler beats “everything is RAG” — if your context fits in 100K tokens, stuff it. Reach for RAG only on overflow.
- MCP / tool runtime as a service tier — not hard-coded plugins. The agent landscape converges on this in 2026.
- Per-decision audit by default — not an upsell; it's the substrate.
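The context-assembler rule from the list above can be sketched in a few lines. The four-characters-per-token estimate and the keyword-overlap ranking are crude stand-ins I chose to keep the example self-contained; a real system would use the model's tokenizer and a proper retriever.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


def assemble_context(
    documents: list[str], query: str, budget: int = 100_000
) -> tuple[str, list[str]]:
    """Stuff everything if it fits the budget; retrieve only on overflow."""
    total = sum(estimate_tokens(d) for d in documents)
    if total <= budget:
        return "stuff", list(documents)

    # Overflow path: naive keyword overlap as a stand-in for real retrieval.
    terms = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(terms & set(d.lower().split())),
        reverse=True,
    )
    chosen: list[str] = []
    used = 0
    for doc in ranked:
        cost = estimate_tokens(doc)
        if used + cost <= budget:
            chosen.append(doc)
            used += cost
    return "retrieve", chosen
```

With the default budget, two short documents come back under the `"stuff"` strategy untouched; shrinking `budget` forces the `"retrieve"` branch, which keeps only the best-matching documents that fit.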
Avoid copying — unless you're them
- Building your own model — the point of plane separation is using someone else's. Sovereign-fine-tuned-everything is the most common waste of GenAI investment.
- Sandboxed code execution as a first feature — Anthropic has the team to operate WebContainer-class sandboxes. Most teams shouldn't accept the operational risk.
- Per-conversation T&S classification pipeline — built for frontier-lab safety research. Most enterprise GenAI needs the nine-controls discipline, not a parallel classification stack.
- Custom AI gateway — build only if you're a frontier lab. Otherwise buy.
What this teardown can't tell you
- The actual GPU economics — inference batching, cache hits, prefill/decode separation.
- How the safety telemetry feeds back into model training (existence is public; cadence + gating isn't).
- Extended-thinking trace storage — how it's deduplicated, expired, or summarised.
- Per-region inference pool sizing + failover topology.
Claude.ai is what a streaming-first conversational AI product looks like when the team building it also builds the model. That part you can't copy. The rest of the architecture (plane separation, async safety, MCP runtime, per-decision audit) is replicable, and is the right pattern for any team shipping enterprise GenAI in 2026.
Methodology & sources
Public signals only. Company engineering blog posts, conference talks, job ads, public GitHub, podcasts, product behaviour from the outside. No NDA-covered information; no private conversations. The inferred architecture is my own analysis; it is neither endorsed by nor affiliated with Claude.ai (Anthropic).
Primary sources:
- Anthropic engineering blog · anthropic.com/news
- Anthropic API + Claude.ai documentation · docs.anthropic.com
- Model Context Protocol specification · modelcontextprotocol.io
- Anthropic public job listings (infra · trust & safety · platform)
- Direct observation of streaming behaviour, regional latency, artifact rendering, connector lifecycle.
- Talks and podcasts: AWS re:Invent · Lex Fridman podcast appearances by the Anthropic team.
Found this useful? More coming.
One teardown per quarter. Tell me which architecture you'd want analysed next.