I am trying to understand the bias-variance trade-off in the context of non-parametric entropy estimation.
Specifically, using a histogram approach to estimate the entropy from a sample, we have:
$$\hat{H} = - \sum^{B}_{i=1}\hat{p}_iv_i\log(\hat{p}_i) $$
(for a generic partition into $B$ bins, where $\hat{p}_i$ is the estimated probability density in bin $i$ and $v_i$ is the volume of bin $i$).
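To make my setup concrete, this is roughly how I am computing $\hat{H}$ in practice (a minimal 1-D sketch; the function name `hist_entropy` and the use of `numpy.histogram` are just my own choices, not anything canonical):

```python
import numpy as np

def hist_entropy(x, B):
    """Histogram estimate of the (differential) entropy of a 1-D sample x, using B equal-width bins."""
    counts, edges = np.histogram(x, bins=B)
    v = np.diff(edges)                   # bin "volumes" (widths in 1-D)
    p_hat = counts / (counts.sum() * v)  # density estimate in each bin
    nonempty = counts > 0                # empty bins contribute nothing (0 * log 0 := 0)
    return -np.sum(p_hat[nonempty] * v[nonempty] * np.log(p_hat[nonempty]))
```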
I understand that bias is defined as $E[\hat{H}] - H$, where $H$ is the 'true' entropy, but in the general case one doesn't know the true population distribution (hence the non-parametric estimation). So I don't understand how the bias is calculated in general for this estimator, nor how it changes with the number of bins $B$.
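The only way I can see to pin the bias down concretely is to pick a distribution whose entropy is known in closed form and simulate. A rough sketch of that, assuming the `hist_entropy` helper above and a standard normal with true entropy $\tfrac{1}{2}\log(2\pi e)$:

```python
rng = np.random.default_rng(0)
n, reps = 1_000, 500
H_true = 0.5 * np.log(2 * np.pi * np.e)   # closed-form entropy of N(0, 1)

for B in (5, 20, 100, 500):
    estimates = [hist_entropy(rng.standard_normal(n), B) for _ in range(reps)]
    bias = np.mean(estimates) - H_true    # E[H_hat] approximated by the Monte Carlo mean
    print(f"B={B:4d}  bias ~ {bias:+.3f}")
```

But this only works because I know the true distribution, which is exactly what I don't have in the non-parametric setting.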
Secondly, the variance is given by $E[(\hat{H} - E[\hat{H}])^2]$, which avoids the above problem, but how does the expectation value $E[\hat{H}]$ even differ from the estimator $\hat{H}$?
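My current (possibly wrong) picture is that $E[\hat{H}]$ is an average over hypothetical repeated samples from the same population, while $\hat{H}$ is the single number I get from the one sample in front of me. Continuing the simulation sketch above (reusing `rng`, `n`, `reps` and `hist_entropy`), that would look like:

```python
B = 100
# each repetition is a fresh sample from the same population, so H_hat varies;
# the spread of these values across repetitions is the variance of the estimator
estimates = np.array([hist_entropy(rng.standard_normal(n), B) for _ in range(reps)])
print("H_hat from one sample:  ", estimates[0])
print("Monte Carlo E[H_hat]:   ", estimates.mean())
print("Monte Carlo Var[H_hat]: ", estimates.var())
```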
Finally, does it even make sense to consider variance outside the context of 'training' the histogram estimator? If there is just one sample and the goal is to get an estimate close to the true value, over-fitting doesn't feel like a concern, and one should simply aim for the minimum-bias parameterisation.
I think I am missing some really basic context here as these simple concepts are not making sense to me.