I understand how mutual information is calculated, and what it is addressing: how much the distribution of one variable changes conditional on the value of another variable. But I don't really understand what the output values of a mutual information calculation actually mean in an absolute sense. I know that 0 means the variables are independent, and I know I can use these values in relative comparisons for feature selection without going any deeper than that, but I'd still like to understand what the absolute values mean. For example (using Python with this MI implementation):
$$X \sim U(0,1),\quad n = 10000\\ Y = X + U(0, 0.5)\\ MI(X, Y) \approx 0.92$$
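For reference, here is a minimal sketch of how that number can be reproduced, assuming scikit-learn's `mutual_info_regression` as the estimator (a k-nearest-neighbour based estimator that reports MI in nats); the implementation linked above may of course differ in its exact output:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 10000

x = rng.uniform(0, 1, n)        # X ~ U(0, 1)
y = x + rng.uniform(0, 0.5, n)  # Y = X + U(0, 0.5)

# mutual_info_regression expects a 2D feature matrix and a 1D target,
# and returns one MI estimate (in nats) per feature column.
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)
print(mi[0])  # roughly 0.9 nats for this setup
```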
What does it mean that $MI(X, Y) \approx 0.92$ nats? Is this value actually related to the maximum compressibility of the data, or of the relationship between the variables? Or is it something else entirely?