Question: after throwing a die a large number of times and discovering that the average of the outcomes is $4$, what probability distribution should one assign to the statements "the next roll will be $i$" for $i = 1, 2, \dots, 6$?
E.T. Jaynes in Chapter 9 of the book "Probability Theory: The Logic of Science" derives the following:
If we start from a state of prior ignorance, $I_0$, under which the individual rolls are independent and all six outcomes are equally likely ($1/6$ each), then we just need to find the $(p_1, p_2, \dots, p_6)$ that maximises the entropy $H(p_1, p_2, \dots, p_6) := -\sum_{i=1}^6 p_i \log p_i$ subject to $\sum_{i=1}^6 p_i = 1$ and $\sum_{i=1}^6 i\,p_i = 4$.
Utilising Lagrange multipliers one can easily derive the Boltzmann-type distribution for such a die (the derivation is sketched after the numbers below). My posterior distribution, found numerically, is the following:
$$(p_1, p_2, p_3, p_4, p_5, p_6) \approx (0.10,0.12,0.15,0.17,0.21,0.25).$$
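For completeness, here is the Lagrange-multiplier step (my reconstruction of the standard argument, not a quote from Jaynes). Form the Lagrangian

$$\mathcal{L} = -\sum_{i=1}^6 p_i \log p_i - \lambda_0 \Big( \sum_{i=1}^6 p_i - 1 \Big) - \lambda \Big( \sum_{i=1}^6 i\, p_i - 4 \Big);$$

setting $\partial \mathcal{L} / \partial p_i = -\log p_i - 1 - \lambda_0 - \lambda i = 0$ gives

$$p_i = \frac{e^{-\lambda i}}{Z(\lambda)}, \qquad Z(\lambda) = \sum_{j=1}^6 e^{-\lambda j},$$

with $\lambda$ fixed by the mean constraint (here $\lambda < 0$, since the unconstrained maximum-entropy distribution has mean $3.5 < 4$). As a sanity check, the numbers above do satisfy the constraint: $0.10 \cdot 1 + 0.12 \cdot 2 + 0.15 \cdot 3 + 0.17 \cdot 4 + 0.21 \cdot 5 + 0.25 \cdot 6 \approx 4.02 \approx 4$.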
Moreover, E.T. Jaynes argues that such a posterior distribution is the only answer consistent with the prior knowledge $I_0$, the data $D = \{\text{the mean is } 4\}$, and Cox's theorem. However, I have a few questions about this posterior:
1) Qualitative: does it really do what common sense dictates it should? Why doesn't the posterior put more mass on $4$?
2) Why is the mode of the posterior $6$ rather than $4$? Under what loss function should I guess $4$?
3) Why does the MLE approach fail to give the mode of the posterior distribution, despite the following quote?
> A maximum likelihood estimator coincides with the most probable Bayesian estimator given a uniform prior distribution on the parameters. (Wiki: MLE)
P.S. The Haskell code I used to find the answer:

```haskell
-- Candidate values of lambda: -0.001, 0.001, -0.002, 0.002, ...
l = concat $ map (\a -> [(-a), a]) l'
  where
    l' = map (/ 1000) [1 ..]

-- Mean of a distribution over the outcomes 1..6
findExp xs = sum $ zipWith (*) [1 ..] xs

-- Shannon entropy
entropy xs = sum $ map (\a -> -a * log a) xs

-- Boltzmann distribution: p_i proportional to exp (-lam * i), normalised
probs lam = map (/ sum probs') probs'
  where
    probs' = map (\a -> exp (-lam * a)) [1 .. 6]

expectation = 4.0

-- Accept lam once the resulting mean is within 0.05 of the target
condition lam = abs (findExp (probs lam) - expectation) <= 0.05

main = print dist >> print (findExp dist) >> print (entropy dist)
  where
    lambda = head (dropWhile (not . condition) l)
    dist = probs lambda
```
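As an aside, the linear scan over `l` stops at the first $\lambda$ within the rather coarse $0.05$ tolerance. A bisection search converges essentially exactly; here is a minimal sketch, assuming the solution lies in $[-1, 1]$ (it does, since $\lambda = 0$ gives mean $3.5$ and the target mean $4$ needs a negative $\lambda$), and reusing `probs` and `findExp` from above:

```haskell
-- Sketch: bisection for lambda. The mean findExp (probs lam) is strictly
-- decreasing in lam, so a sign change on the bracket guarantees a unique root.
solveLambda :: Double
solveLambda = go (-1) 1
  where
    go lo hi
      | hi - lo < 1e-12           = mid        -- bracket small enough: done
      | findExp (probs mid) > 4.0 = go mid hi  -- mean too high: increase lambda
      | otherwise                 = go lo mid  -- mean too low: decrease lambda
      where
        mid = (lo + hi) / 2
```

Evaluating `probs solveLambda` should reproduce the distribution above to far more digits than the scan.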