Yes, the actual LLM returns a probability distribution, which gets sampled to pr...

mr_toad · 2026-03-01T12:51:59 1772369519

It’s often very difficult (intractable) to come up with a probability distribution of an estimator, even when the probability distribution of the data is known.

Basically, you’d need a lot more computing power to come up with a distribution of the output of an LLM than to come up with a single answer.
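To make the cost gap concrete, here is a toy sketch (not a real LLM). `next_token_probs` is a made-up stand-in for a forward pass, and the three-token vocabulary is an assumption for illustration. Sampling one output takes one pass per token, while the exact distribution over all outputs of a given length requires scoring every possible sequence, which grows as vocabulary size to the power of the length.

```python
import itertools
import math
import random

VOCAB = ["a", "b", "c"]  # toy vocabulary, purely illustrative


def next_token_probs(prefix):
    """Hypothetical stand-in for an LLM forward pass.

    Returns a probability distribution over VOCAB given the prefix.
    """
    last = VOCAB.index(prefix[-1]) if prefix else 0
    logits = [((i + last + len(prefix)) % 3) * 0.5 for i in range(len(VOCAB))]
    z = sum(math.exp(l) for l in logits)
    return [math.exp(l) / z for l in logits]


def sample_sequence(length, rng):
    """Cheap: one 'forward pass' per generated token."""
    seq = []
    for _ in range(length):
        probs = next_token_probs(seq)
        seq.append(rng.choices(VOCAB, weights=probs, k=1)[0])
    return seq


def exact_output_distribution(length):
    """Expensive: scores all len(VOCAB) ** length sequences via the chain rule."""
    dist = {}
    for seq in itertools.product(VOCAB, repeat=length):
        p = 1.0
        for t in range(length):
            p *= next_token_probs(list(seq[:t]))[VOCAB.index(seq[t])]
        dist["".join(seq)] = p
    return dist


one_answer = sample_sequence(4, random.Random(0))  # 4 forward passes
full_dist = exact_output_distribution(4)           # 3 ** 4 = 81 sequences scored
```

With a realistic vocabulary (tens of thousands of tokens) and outputs hundreds of tokens long, the enumeration is hopeless, which is why you only ever see samples.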

podnami · 2026-03-01T09:41:13 1772358073

What happens before the probability distribution? I’m assuming that, say, alignment or other factors would influence it?

DavidSJ · 2026-03-01T09:47:00 1772358420

In microgpt, there's no alignment. It's all pretraining (learning to predict the next token). But for production systems, models go through post-training, often with some sort of reinforcement learning which modifies the model so that it produces a different probability distribution over output tokens.

But the model "shape" and the computation graph itself don't change as a result of post-training. All that changes is the weights in the matrices.
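A minimal sketch of that point, assuming a single weight matrix followed by a softmax as the whole "model". `nudge_toward` is a hypothetical stand-in for post-training: it is just one gradient step that raises the log-probability of a chosen token, not actual reinforcement learning. The forward function and the matrix shape stay the same; only the numbers in the matrix change, and with them the output distribution.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]


def forward(W, x):
    """Same computation graph before and after: logits = W @ x, then softmax."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return softmax(logits)


def shape(W):
    return (len(W), len(W[0]))


def nudge_toward(W, x, target, lr=0.5):
    """Toy stand-in for post-training: one gradient ascent step on log p(target)."""
    probs = forward(W, x)
    new_W = []
    for i, row in enumerate(W):
        # d log p(target) / d logit_i = 1[i == target] - p_i
        grad_logit = (1.0 if i == target else 0.0) - probs[i]
        new_W.append([w + lr * grad_logit * xi for w, xi in zip(row, x)])
    return new_W


W_pre = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]  # "pretrained" weights (made up)
x = [1.0, 2.0]                                   # made-up input features

W_post = nudge_toward(W_pre, x, target=2)

p_pre = forward(W_pre, x)    # distribution before the update
p_post = forward(W_post, x)  # same function, different weights, different distribution
```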