I might not have fully understood these concepts, and I am confused about how the standard error is calculated. Here are my understandings and confusions; please let me know where I went wrong.
EDIT: I was talking about the Hessian matrix output by R's optim.
The standard error of a parameter $\theta$ is the standard deviation of its estimator, $\mathrm{var}(\hat\theta)^{1/2}$. I've read that one should calculate it from the expected information matrix, $\mathrm{E}[I]^{-1/2}$, which is $\mathrm{E}[-H]^{-1/2}$. I assume that to get the expected Hessian matrix I need to run my maximum-likelihood program multiple times to get multiple Hessian matrices. But why can't we just calculate the SD directly as $\mathrm{sd}(\hat\theta)$, given that we already have a handful of estimates $\hat\theta$? Are the results going to be different?
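To make concrete what I mean, here is a minimal sketch of the kind of thing I am doing (made-up data, fitting a normal distribution by minimizing the negative log-likelihood; the log-sd parameterization is just my choice to keep the scale positive):

```r
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)   # made-up data for illustration

## negative log-likelihood of a normal, parameterized as (mean, log-sd)
negloglik <- function(par) -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))

fit <- optim(c(0, 0), negloglik, hessian = TRUE)

## Since optim minimized the *negative* log-likelihood, fit$hessian is the
## observed information, so the standard errors come from the square roots
## of the diagonal of its inverse:
se <- sqrt(diag(solve(fit$hessian)))
fit$par   # estimates (mean, log-sd)
se        # Hessian-based standard errors
```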
Same question for calculating the confidence interval of a parameter. For example, for a 95% CI, the standard way seems to be $\hat\theta \pm 1.96\cdot \mathrm{E}[-H]^{-1/2}$. Is that different from just running many iterations to get a lot of estimates $\hat\theta$ and finding the interval where 95% of them fall? Is one more accurate given the same number of realizations?
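Here is roughly how I picture the two approaches, continuing from the sketch above (the bootstrap-style resampling in the second part is just my guess at what "running it many times" would look like in practice):

```r
## (1) Wald-type interval for the mean, from the Hessian-based standard error:
ci_wald <- fit$par[1] + c(-1.96, 1.96) * se[1]

## (2) Refit on many resampled data sets and take the empirical 2.5% and
##     97.5% quantiles of the estimates:
boot_est <- replicate(1000, {
  xb  <- sample(x, replace = TRUE)   # resample the data
  nll <- function(par) -sum(dnorm(xb, mean = par[1], sd = exp(par[2]), log = TRUE))
  optim(c(0, 0), nll)$par[1]
})
ci_emp <- quantile(boot_est, c(0.025, 0.975))

ci_wald
ci_emp
```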