DeepMind partnered with a group at Harvard to observe dopamine neuron behavior in mice. They set the mice on a task and rewarded them based on the roll of dice, measuring the firing patterns of their dopamine neurons throughout. They found that every neuron released different amounts of dopamine, meaning they had all predicted different outcomes. While some were too "optimistic," predicting higher rewards than actually received, others were more "pessimistic," lowballing the reality. When the researchers mapped out the distribution of those predictions, it closely followed the distribution of the actual rewards. This data offers compelling evidence that the brain indeed uses distributional reward predictions to strengthen its learning algorithm.