The analytic gradients of log multivariate normal distribution wrt mean and covariance matrix can be found at StackExchange post and The gradient of the log-likelihood of normal distributions.
I tried to use numerical differentiation (e.g., forward difference) to obtain the gradients by the following MATLAB code:
% analytic solution
mu = [1 1];
Cov = eye(2);
Cov(1,2) = 0.0; Cov(2,1) = Cov(1,2);
x = [1 2];
CovInv = inv(Cov);
x_m = x-mu;
mu_a = x_m*CovInv; % gradient of mean
cov_a = -0.5*(CovInv - CovInv*x_m'*x_m*CovInv); % gradient of covariance
% numerical solution (e.g., forward difference)
h = sqrt(eps);
prob = @(x,mu,Cov) exp(-0.5*(x-mu)/Cov*(x-mu)')/sqrt((2*pi)^2*det(Cov));
mu_n = nan(1,2); % for mean
p = log(prob(x,mu,Cov));
mu2 = mu; mu2(1) = mu2(1) + h; p2 = log(prob(x,mu2,Cov));
mu_n(1) = (p2 - p)/h;
mu2 = mu; mu2(2) = mu2(2) + h; p2 = log(prob(x,mu2,Cov));
mu_n(2) = (p2 - p)/h;
cov_n = nan(2,2);
Cov2 = Cov; Cov2(1,1) = Cov2(1,1) + h; p2 = log(prob(x,mu,Cov2));
cov_n(1,1) = (p2 - p)/h;
Cov2 = Cov; Cov2(1,2) = Cov2(1,2) + h; Cov2(2,1) = Cov2(2,1) + h; p2 = log(prob(x,mu,Cov2));
cov_n(1,2) = (p2 - p)/h; cov_n(2,1) = cov_n(1,2);
Cov2 = Cov; Cov2(2,2) = Cov2(2,2) + h; p2 = log(prob(x,mu,Cov2));
cov_n(2,2) = (p2 - p)/h;
mu_n
is very close to mu_a
, as well as for cov_n
and cov_a
.
However, changing the covariance from 0
, Cov(1,2)=0.5; Cov(2,1)=Cov(1,2);
for example, the cov_n(1,2)
is always nearly twice larger than cov_a(1,2)
.
>> cov_n(1,2)/cov_a(1,2)
ans = 2.0000
It also happens when means are unequal, mu = [-1 0];
.
What's wrong in this case?
If I still want to use numerical difference, what can I do?
Continue above question: About the Hessian matrix, how can I make sure the the numerical differentiation gives correct answer?