I have encountered a problem with Bayesian linear regression models, which I describe below. I hope that someone can give me a better understanding of Bayesian models or suggest a possible fix.
Let's say that we have two different functions (for simplicity, a quadratic and a linear one):
$f_1(x) = a_1 + a_2x + a_3x^2$
$f_2(x) = b_1 + b_2x$
Now, we want to infer the parameters ($b_1, b_2$) of the second function from data generated by the first one via Bayesian linear regression. A small noise term is known and fixed for this example. This is my result (my MATLAB code is attached below):
As can be seen, the posterior distributions of both parameters are very narrow and confident. However, the predictive distribution does not fit the data points very well. I would have assumed that the parameter uncertainty would be much larger in order to compensate for the model mismatch (namely the missing quadratic term). Does anyone know why my intuition does not match the results, and is there a possible fix/workaround? Thank you.
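For reference, the conjugate Gaussian update that the attached code implements is, if I have the standard derivation right,

$S = \left(S_0^{-1} + \tfrac{1}{\sigma_n^2}Z^\top Z\right)^{-1}, \qquad m = S\left(S_0^{-1}m_0 + \tfrac{1}{\sigma_n^2}Z^\top y\right)$

with predictive variance $\sigma_n^2 + \phi(x_*)^\top S\,\phi(x_*)$ at a test input $x_*$, where $Z$ is the design matrix of the linear model, $m_0 = 0$, $S_0 = I$, and $\sigma_n^2$ is the fixed noise variance. Note that $S$ depends only on the inputs and the noise variance, not on the targets $y$.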
% - 2022-02-04
%% - code
% - cleanup
clear;
close all;
clc;
% - fix random number generator
seed = 29057;
rng(seed);
% - first function (ground truth, unknown)
fcn1.p = [1, 2, 3]';
fcn1.basefcn = @(x) [ones(size(x, 1), 1), x, x.^2];
fcn1.np = length(fcn1.p);
% - second function
fcn2.basefcn = @(x) [ones(size(x, 1), 1), x];
fcn2.np = 2;
% - data points
sn2 = 1;
nd = 100;
X = sort(randn(nd, 1));
y = fcn1.basefcn(X)*fcn1.p + sqrt(sn2)*randn(nd, 1);
% - prior
M0 = zeros(fcn2.np, 1);
S0 = eye(fcn2.np);
iS = inv(S0);
% - posterior
Z = fcn2.basefcn(X);
S = inv(iS + Z'*Z/sn2);
M = S*(iS*M0 + Z'*y/sn2); %#ok<*MINV>
% - grid
grid0.n = 1000;
grid0.x = linspace(min(X), max(X), grid0.n)';
grid0.y = fcn1.basefcn(grid0.x)*fcn1.p;
grid0.ym = fcn2.basefcn(grid0.x)*M;
grid0.ys = sqrt(sn2 + sum((fcn2.basefcn(grid0.x)*S).*fcn2.basefcn(grid0.x), 2)); % diagonal of the predictive covariance, without forming the full n-by-n matrix
grid1.n = 1000;
grid1.min = min(M0(1)-5*sqrt(S0(1, 1)), M(1)-5*sqrt(S(1, 1)));
grid1.max = max(M0(1)+5*sqrt(S0(1, 1)), M(1)+5*sqrt(S(1, 1)));
grid1.p1 = linspace(grid1.min, grid1.max, grid1.n)';
grid1.prior = normpdf(grid1.p1, M0(1), sqrt(S0(1, 1)));
grid1.posterior = normpdf(grid1.p1, M(1), sqrt(S(1, 1)));
grid2.n = 1000;
grid2.min = min(M0(2)-5*sqrt(S0(2, 2)), M(2)-5*sqrt(S(2, 2)));
grid2.max = max(M0(2)+5*sqrt(S0(2, 2)), M(2)+5*sqrt(S(2, 2)));
grid2.p2 = linspace(grid2.min, grid2.max, grid2.n)';
grid2.prior = normpdf(grid2.p2, M0(2), sqrt(S0(2, 2)));
grid2.posterior = normpdf(grid2.p2, M(2), sqrt(S(2, 2)));
% - plot
figure(1);
subplot(2, 2, 1:2);
plot(grid0.x, grid0.y, 'g-');
hold on;
grid on;
plot(X, y, 'ko');
plot(grid0.x, grid0.ym + 2*grid0.ys, 'r-');
plot(grid0.x, grid0.ym - 2*grid0.ys, 'r-');
title('Predictive distribution');
xlabel('x');
ylabel('y');
legend('Ground truth', 'Data points', 'Predictive distribution (\mu\pm2\sigma)', 'Location', 'northwest');
subplot(2, 2, 3);
scatter(fcn1.p(1), 0, 'g', 'filled');
hold on;
grid on;
plot(grid1.p1, grid1.prior/max(grid1.prior), 'b-');
plot(grid1.p1, grid1.posterior/max(grid1.posterior), 'r-');
title('First parameter');
xlabel('b_1');
ylabel('p(b_1)');
legend('Ground truth', 'Prior', 'Posterior', 'Location', 'northwest');
subplot(2, 2, 4);
scatter(fcn1.p(2), 0, 'g', 'filled');
hold on;
grid on;
plot(grid2.p2, grid2.prior/max(grid2.prior), 'b-');
plot(grid2.p2, grid2.posterior/max(grid2.posterior), 'r-');
title('Second parameter');
xlabel('b_2');
ylabel('p(b_2)');
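In case it helps reproduction, here is an equivalent sketch of the posterior computation in Python/NumPy (a hypothetical port of the script above; NumPy's generator differs from MATLAB's `rng`, so exact numbers will not match, but the qualitative result — a very narrow posterior — is the same):

```python
import numpy as np

rng = np.random.default_rng(29057)

a = np.array([1.0, 2.0, 3.0])   # ground-truth quadratic coefficients
sn2 = 1.0                        # known, fixed noise variance
nd = 100

# data from the quadratic function
X = np.sort(rng.standard_normal(nd))
Phi1 = np.column_stack([np.ones(nd), X, X**2])
y = Phi1 @ a + np.sqrt(sn2) * rng.standard_normal(nd)

# linear model with prior N(M0, S0)
Z = np.column_stack([np.ones(nd), X])
M0 = np.zeros(2)
S0 = np.eye(2)
iS0 = np.linalg.inv(S0)

# conjugate posterior update
S = np.linalg.inv(iS0 + Z.T @ Z / sn2)   # posterior covariance
M = S @ (iS0 @ M0 + Z.T @ y / sn2)       # posterior mean

print("posterior mean:", M)
print("posterior stds:", np.sqrt(np.diag(S)))
```

With 100 points the posterior standard deviations come out around 0.1, i.e. far narrower than the prior's 1.0, even though the linear model cannot represent the data.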
Edit: A follow-up post can be found here.