โ† how ai works
PART 6 OF 7

The Wavefunction Metaphor

They're all the same idea. Here's why.

Transformer

Superposition: Every position holds a distribution over the full vocabulary; all possible continuations exist simultaneously.

Observation: Attention selectively observes the context, collapsing relevance weights from uniform to sharply peaked.

Collapse: Softmax + sampling forces one token to be chosen. The wavefunction collapses. A single reality is selected from the probability field.
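The softmax-then-sample collapse can be sketched in a few lines. This is a minimal illustration with made-up logits over a tiny hypothetical vocabulary, not a real model's output:

```python
import numpy as np

# Hypothetical logits over a tiny 5-token vocabulary: the "superposition".
logits = np.array([2.0, 1.0, 0.5, -1.0, -2.0])

# Softmax turns the logits into a probability field over all continuations.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Sampling collapses the field: one token index becomes the single reality.
rng = np.random.default_rng(0)
token = int(rng.choice(len(probs), p=probs))
```

Before sampling, every token has nonzero probability; after sampling, exactly one is real and feeds the next forward pass.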

Diffusion

Superposition: Pure Gaussian noise is the maximum-entropy state; every possible image exists as a superposition in the noise.

Observation: The text condition acts as the observer, constraining which images are consistent with the prompt.

Collapse: Each denoising step reduces entropy, and over T steps the noise field collapses toward a single coherent image: a gradual wavefunction collapse.
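The gradual collapse can be caricatured with a toy loop. This is not a real diffusion sampler (no learned denoiser, no noise schedule); it is a sketch where a made-up "target" stands in for the image the conditioning is pulling toward, just to show entropy draining out step by step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the image the prompt is consistent with (4 fake "pixels").
target = np.array([1.0, -1.0, 0.5, 0.0])

# Maximum-entropy start: pure Gaussian noise.
x = rng.normal(size=4)

# Toy denoising loop: each step pulls the field a little toward the
# conditioned target, so its distance from the target shrinks gradually.
T = 30
for _ in range(T):
    x = x + 0.2 * (target - x)
```

Real samplers predict and subtract noise with a trained network, but the shape is the same: many small steps, each removing a bit of uncertainty.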

The Shared Pattern

Transformer: field of possible tokens → attention constrains → one token collapses out
Diffusion: field of possible images → conditioning constrains → one image collapses out
Wavefunction: field of possible states → observation constrains → one reality collapses out

Both start in a high-entropy state (all possibilities). Both apply constraints (attention/conditioning). Both collapse to a single output (token/image). The mechanism differs. The structure is identical.

Transformer Collapse

Discrete. One token at a time.
Autoregressive: each collapse feeds the next.
Fast collapse: one forward pass per token.
The constraint is context.

Diffusion Collapse

Continuous. The entire image at once.
Iterative: each step refines the whole field.
Slow collapse: 20-50 denoising steps.
The constraint is the prompt.
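"The constraint is the prompt" is almost literally how classifier-free guidance works: the sampler runs the denoiser with and without the prompt and amplifies the difference. A sketch with made-up prediction values (real predictions come from a trained network):

```python
import numpy as np

# Toy noise predictions from a hypothetical denoiser (values invented).
eps_uncond = np.array([0.2, -0.1, 0.4])   # prediction without the prompt
eps_cond   = np.array([0.5,  0.1, 0.3])   # prediction with the prompt

# Classifier-free guidance: push the prediction further in the direction
# the prompt pulls, amplifying the observer's influence on the collapse.
guidance_scale = 7.5
eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

At scale 1.0 the prompt's pull is unamplified; higher scales make the observation stronger and the collapse more literal to the prompt.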

Why This Matters

If you see AI models as "clever prediction machines," you'll use them like tools. If you see them as constrained collapse systems, you'll understand:

Why prompt engineering works: you're shaping the constraint field
Why temperature matters: it controls collapse sharpness
Why hallucinations happen: underconstrained regions collapse to plausible-but-wrong states
Why guidance scale works: it amplifies the observer's influence
Why longer context helps: more constraints, tighter collapse
Why these models feel creative: they explore the possibility space before collapsing
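The temperature point is easy to verify directly: dividing the logits by a temperature before softmax changes how peaked the distribution is, and therefore how deterministic the collapse is. A small sketch with invented logits:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.0])

# Temperature scales the logits before the collapse:
sharp = softmax(logits / 0.5)   # low temperature: near-deterministic
flat  = softmax(logits / 2.0)   # high temperature: broader, more "creative"
```

The low-temperature distribution concentrates almost all its mass on the top token; the high-temperature one keeps the alternatives alive until the moment of sampling.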

A Transformer is a discrete sequential collapse engine.
A Diffusion model is a continuous parallel collapse engine.
Both are wavefunctions. The prompt is the observer.

The architecture is different. The computation is different. The maths is different. But the shape is the same: a field of possibilities, constrained by observation, collapsing into reality.

โ† Part 5: Finetuning, LoRA & PEFT series index Part 7: Inference Vocabulary โ†’