I have been working on this problem for a while, and thanks to @StopClosingQuestionsFast's suggestions and some literature research, I have managed to come up with a solution that seems to work.
So, let's restate the problem:
$\tilde{y} = y + n_y$
$\tilde{s}_1 = s_1 + n_1$
$\tilde{s}_2 = s_2 + n_2$
$y = \beta_1s_1 + \beta_2s_2 + \beta_3f(\beta_4, \beta_5)$
I am trying to fit the observation $\tilde{y} \in \mathbb{R}^{n\times1}$, which carries noise $n_y\sim\mathcal{N}(0, C_y)$, with the model $y$, which depends on the parameter vector $\beta\in \mathbb{R}^{5\times1}$. The model also depends on two vectors $s_1$ and $s_2$ that are not known exactly; only the observations $\tilde{s}_1$ and $\tilde{s}_2$ with perturbations $n_1\sim\mathcal{N}(0, C_1)$ and $n_2\sim\mathcal{N}(0, C_2)$ are available. The goal is to estimate $\beta$; however, since $s_1$ and $s_2$ are not known, they must be estimated as well.
To solve the problem we need to maximize the likelihood, which can be written as follows:
$\begin{align*}
p(\tilde{y}, \tilde{s}_1, \tilde{s}_2 | \beta, s_1, s_2) = &\frac{1}{\sqrt{(2\pi)^n\det{C_y}}}\exp{(
-\frac{1}{2} (\tilde{y}-y)^\intercal C_y^{-1} (\tilde{y}-y)
)} \\ \times &
\frac{1}{\sqrt{(2\pi)^n\det{C_1}}}\exp{(
-\frac{1}{2} (\tilde{s}_1-s_1)^\intercal C_1^{-1} (\tilde{s}_1-s_1)
)} \\ \times &
\frac{1}{\sqrt{(2\pi)^n\det{C_2}}}\exp{(
-\frac{1}{2} (\tilde{s}_2-s_2)^\intercal C_2^{-1} (\tilde{s}_2-s_2)
)}
\end{align*}$
Substituting the expression for $y$, the negative log-likelihood can be written as
$\begin{align*}
-\log{(p(\tilde{y}, \tilde{s}_1, \tilde{s}_2 | \beta, s_1, s_2))} = &
\frac{1}{2}(\tilde{y}-\beta_1s_1 - \beta_2s_2 - \beta_3f(\beta_4, \beta_5))^\intercal C_y^{-1} (\tilde{y}-\beta_1s_1 - \beta_2s_2 - \beta_3f(\beta_4, \beta_5))
\\ + & \frac{1}{2}(\tilde{s}_1-s_1)^\intercal C_1^{-1} (\tilde{s}_1-s_1)
\\ + & \frac{1}{2}(\tilde{s}_2-s_2)^\intercal C_2^{-1} (\tilde{s}_2-s_2)
\\ + & const
\end{align*}$
To find the optimal values of $\beta$, $s_1$, and $s_2$, one needs to minimize the negative log-likelihood (and thus maximize the likelihood). In general, since $f$ depends non-linearly on $\beta_4$ and $\beta_5$, the optimization problem is non-linear and has to be treated numerically with appropriate algorithms. However, the problem can be substantially simplified if we first express $s_1$ and $s_2$ as functions of $\beta$ and rewrite the negative log-likelihood as a function of $\beta$ alone. This reduces the numerical overhead associated with minimizing $-\log{(p)}$.
To find the optimal $s_1$ and $s_2$ for a given $\beta$, we take the following partial derivatives and set them to zero:
$\frac{\partial}{\partial s_1}(-\log{(p)}) = -\beta_1C_y^{-1}(\tilde{y}-\beta_1s_1 - \beta_2s_2-\beta_3f) - C_1^{-1}(\tilde{s}_1 - s_1)=0$
$\frac{\partial}{\partial s_2}(-\log{(p)}) = -\beta_2C_y^{-1}(\tilde{y}-\beta_1s_1 - \beta_2s_2-\beta_3f) - C_2^{-1}(\tilde{s}_2 - s_2)=0$
I have omitted the dependence of $f$ on $\beta_4$ and $\beta_5$ for brevity. These equations can be rewritten as follows:
$(\beta_1^2C_y^{-1} + C_1^{-1})s_1 + \beta_1\beta_2C_y^{-1}s_2=\beta_1C_y^{-1}(\tilde{y}-\beta_3f) +C_1^{-1} \tilde{s}_1$
$\beta_1\beta_2C_y^{-1}s_1 + (\beta_2^2C_y^{-1} + C_2^{-1})s_2 = \beta_2C_y^{-1}(\tilde{y}-\beta_3f) +C_2^{-1} \tilde{s}_2 $
Now, by stacking the vectors, we can solve this system for $s_1$ and $s_2$:
$\begin{pmatrix}
s_1 \\
s_2 \\
\end{pmatrix}=\begin{pmatrix}
\beta_1^2C_y^{-1} + C_1^{-1} & \beta_1\beta_2C_y^{-1} \\
\beta_1\beta_2C_y^{-1} & \beta_2^2C_y^{-1} + C_2^{-1} \\
\end{pmatrix}^{-1}\begin{pmatrix}
\beta_1C_y^{-1} & C_1^{-1} & 0_{n\times n} \\
\beta_2C_y^{-1} & 0_{n\times n} & C_2^{-1} \\
\end{pmatrix}\begin{pmatrix}
\tilde{y} - \beta_3f \\
\tilde{s}_1 \\
\tilde{s}_2
\end{pmatrix}$
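For completeness, here is a minimal NumPy sketch of this block solve at a fixed $\beta$ (the function and variable names are illustrative and not part of the class below); using np.linalg.solve avoids forming the explicit inverse written above:

import numpy as np

def solve_s(beta, f, y_t, s1_t, s2_t, Cy_inv, C1_inv, C2_inv):
    """Solve the stacked linear system for s1, s2 at fixed beta.

    Cy_inv, C1_inv, C2_inv are the (n x n) inverse covariance matrices,
    y_t, s1_t, s2_t the observed vectors, f the evaluated non-linear term.
    """
    b1, b2, b3 = beta[:3]
    n = y_t.size
    # left-hand side block matrix of the normal equations derived above
    A = np.block([[b1**2 * Cy_inv + C1_inv, b1 * b2 * Cy_inv],
                  [b1 * b2 * Cy_inv,        b2**2 * Cy_inv + C2_inv]])
    # right-hand side: the second block matrix applied to the stacked data
    rhs = np.concatenate([b1 * Cy_inv @ (y_t - b3 * f) + C1_inv @ s1_t,
                          b2 * Cy_inv @ (y_t - b3 * f) + C2_inv @ s2_t])
    s = np.linalg.solve(A, rhs)   # no explicit matrix inverse needed
    return s[:n], s[n:]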
The solution above can be used to rewrite $-\log{(p)}$ as a function of $\beta$ alone, which can then be minimized with standard numerical methods. To get the initial values for the optimization, the linear portion can be solved with generalized least squares, i.e. assuming that $\tilde{s}_1$ and $\tilde{s}_2$ are noise-free. This, of course, works only if you have a reasonable estimate of the non-linear parameters. In my field I can usually say what they are within reason, so this is not a big issue.
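Concretely, with the non-linear parameters fixed at their starting values $\beta_4^{(0)}$ and $\beta_5^{(0)}$, the initial guess for the linear parameters is the standard generalized least squares estimate (this is what gls_lin computes in the code below):
$\begin{pmatrix}\beta_1 \\ \beta_2 \\ \beta_3\end{pmatrix} = (X^\intercal C_y^{-1}X)^{-1}X^\intercal C_y^{-1}\tilde{y}, \qquad X = \begin{pmatrix}\tilde{s}_1 & \tilde{s}_2 & f(\beta_4^{(0)}, \beta_5^{(0)})\end{pmatrix}$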
To compute the covariance matrix of the estimated $\beta$, one needs the inverse of the Hessian of the negative log-likelihood with respect to $\beta$, $s_1$, and $s_2$.
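In practice I approximate the Hessian from the Jacobian of the whitened residual vector returned by the optimizer (the Gauss-Newton approximation), which is what the tls method below does. Writing the whitened residuals as $r(\theta)$ with $\theta = (\beta, s_1, s_2)$,
$-\log{(p)} = \frac{1}{2}r(\theta)^\intercal r(\theta) + const, \qquad H \approx J^\intercal J, \qquad J = \frac{\partial r}{\partial \theta}, \qquad \mathrm{Cov}(\hat{\theta}) \approx (J^\intercal J)^{-1},$
and $\mathrm{Cov}(\hat{\beta})$ is the top-left $5\times5$ block of $\mathrm{Cov}(\hat{\theta})$.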
Below is a sample class written in Python that implements the fitting. The actual minimization of the negative log-likelihood uses the Levenberg-Marquardt method, as it seemed to be the most efficient in my testing.
In the example below I assume that the non-linear parameters $\beta_4$ and $\beta_5$ are approximately known (as they usually are in the problems I am dealing with) and use them as starting values; the goal is then to estimate the parameters together with their uncertainties.
import numpy as np
from matplotlib import pyplot as plt
from scipy import optimize
#%%
# non-linear partial TLS class
class NLTLS:
def __init__(self, x, y_t, s1_t, s2_t, f_func,
Cy, C1, C2,
beta_nl_0, m=5):
self.x = x
self.y_t = y_t
self.s1_t = s1_t
self.s2_t = s2_t
self.f_func = f_func
self.Cy = Cy
self.C1 = C1
self.C2 = C2
self.Cy_inv = np.linalg.pinv(Cy)
self.C1_inv = np.linalg.pinv(C1)
self.C2_inv = np.linalg.pinv(C2)
self.Ly = np.linalg.cholesky(self.Cy_inv)
self.L1 = np.linalg.cholesky(self.C1_inv)
self.L2 = np.linalg.cholesky(self.C2_inv)
self.beta_nl_0 = beta_nl_0
self.n = x.size
self.m = m
def get_y(self, beta, s1, s2):
b1, b2, b3, b4, b5 = beta
return b1*s1 + b2*s2 + b3*self.f_func([b4, b5], self.x)
def get_s(self, beta):
b1, b2, b3, b4, b5 = beta
f = self.f_func([b4, b5], self.x)
a11 = b1**2 * self.Cy_inv + self.C1_inv
a12 = b1 * b2 * self.Cy_inv
a22 = b2**2 * self.Cy_inv + self.C2_inv
A = np.linalg.pinv(np.block([[a11, a12],
[a12, a22]]))
b11 = b1 * self.Cy_inv
b12 = self.C1_inv
b13 = np.zeros((self.n,self.n))
b21 = b2 * self.Cy_inv
b23 = self.C2_inv
B = np.block([[b11, b12, b13],
[b21, b13, b23]])
z = np.block([self.y_t - b3*f, self.s1_t, self.s2_t])
s = A @ B @ z
        return s[:self.n], s[self.n:]
def chisq(self, beta, s1, s2):
b1, b2, b3, b4, b5 = beta
ry = self.y_t - b1*s1 - b2*s2 - b3*self.f_func([b4, b5], self.x)
r1 = self.s1_t - s1
r2 = self.s2_t - s2
return (ry @ self.Cy_inv @ ry +
r1 @ self.C1_inv @ r1 +
r2 @ self.C2_inv @ r2)
def resid_ols(self, beta):
b1, b2, b3, b4, b5 = beta
ry = self.y_t - b1*self.s1_t - b2*self.s2_t - b3*self.f_func([b4, b5], self.x)
return self.Ly.T @ ry
    def resid_tls_beta(self, beta): # residuals only as a function of beta
b1, b2, b3, b4, b5 = beta
s1, s2 = self.get_s(beta)
ry = self.y_t - b1*s1 - b2*s2 - b3*self.f_func([b4, b5], self.x)
r1 = self.s1_t - s1
r2 = self.s2_t - s2
return np.hstack((self.Ly.T @ ry,
self.L1.T @ r1,
self.L2.T @ r2))
def resid_tls(self, p):
b1, b2, b3, b4, b5 = p[:self.m]
s1 = p[self.m : self.n+self.m]
s2 = p[self.n+self.m:]
ry = self.y_t - b1*s1 - b2*s2 - b3*self.f_func([b4, b5], self.x)
r1 = self.s1_t - s1
r2 = self.s2_t - s2
return np.hstack((self.Ly.T @ ry,
self.L1.T @ r1,
self.L2.T @ r2))
def gls_lin(self, q):
b4, b5 = q
X = np.hstack((self.s1_t[:, None],
self.s2_t[:, None],
self.f_func([b4, b5], self.x)[:, None]))
b1,b2,b3 = np.linalg.pinv(X.T @ self.Cy_inv @ X) @ X.T @ self.Cy_inv @ self.y_t
return [b1,b2,b3,b4,b5]
def gls(self):
self.beta_ols_lin = self.gls_lin(self.beta_nl_0)
result = optimize.least_squares(self.resid_ols,
self.beta_ols_lin,
method='lm')
self.beta_ols = result['x']
self.y_ols = self.get_y(self.beta_ols, self.s1_t, self.s2_t)
J_w = result['jac']
H = J_w.T @ J_w
self.C_beta_ols = np.linalg.pinv(H)
# calculate y_ols uncertainty
J = np.linalg.pinv(self.Ly.T) @ J_w
self.Cy_ols = (J @ np.linalg.pinv(H) @ J.T)
def tls(self):
# get the starting guess for linear portion of beta
self.beta_ols_lin = self.gls_lin(self.beta_nl_0)
# optimize the values
result_beta = optimize.least_squares(self.resid_tls_beta,
self.beta_ols_lin,
method='lm')
# assign the results
self.beta_tls = result_beta['x']
self.s1_tls, self.s2_tls = self.get_s(self.beta_tls)
self.y_tls = self.get_y(self.beta_tls, self.s1_tls, self.s2_tls)
# estimate uncertainties for beta_tls - done separately for efficiency
q_all = np.hstack((self.beta_tls, self.s1_tls, self.s2_tls))
result_all = optimize.least_squares(self.resid_tls, q_all, method='lm')
J_w = result_all['jac']
C_all = np.linalg.pinv(J_w.T @ J_w)
self.C_beta_tls = C_all[: self.m, : self.m]
self.C1_tls = C_all[self.m : self.m+self.n,
self.m : self.m+self.n]
self.C2_tls = C_all[self.m+self.n :,
self.m+self.n :]
# calculate y_tls and its uncertainty
Ly12_inv = np.hstack((np.linalg.pinv(self.Ly.T),
np.linalg.pinv(self.L1.T),
np.linalg.pinv(self.L2.T)))
J = Ly12_inv @ J_w
self.Cy_tls = J @ C_all @ J.T
def pretty_print(self, beta_true):
dof = self.n-self.m
print('Printing TLS results')
print('1. MSE')
print('TLS\tGLS')
print('%0.3f' %(self.chisq(self.beta_tls, self.s1_tls, self.s2_tls)/dof),
'\t%0.3f' %(self.chisq(self.beta_ols, self.s1_t, self.s2_t)/dof))
print('')
print('2. Best fit parameters w/ 1-sigma errors')
        print('\ttrue\t TLS\t\t\t GLS')
for i in range(self.m):
print('b'+str(i+1), '\t%0.3f' %beta_true[i], '\t',
'%0.3f' %self.beta_tls[i], '+/-',
'%0.3f' %np.sqrt(self.C_beta_tls[i, i]), '\t',
'%0.3f' %self.beta_ols[i], '+/-',
'%0.3f' %np.sqrt(self.C_beta_ols[i, i]))
def pretty_plot(self, y_true, s1_true, s2_true, plot_ols=True):
plt.figure()
plt.clf()
plt.subplot(311)
plt.plot(self.x, y_true, 'b-', label='true')
plt.plot(self.x, self.y_t, 'k.-', label='data')
plt.plot(self.x, self.y_tls, 'r-', label='TLS')
if plot_ols:
plt.plot(self.x, self.y_ols, 'r:', label='OLS')
plt.legend()
plt.xlabel('x')
plt.ylabel('y')
plt.subplot(312)
plt.plot(self.x, s1_true, 'b-', label='true')
plt.plot(self.x, self.s1_t, 'k.-', label='data')
plt.plot(self.x, self.s1_tls, 'r-', label='TLS')
plt.legend()
plt.xlabel('x')
plt.ylabel('s1')
plt.subplot(313)
plt.plot(self.x, s2_true, 'b-', label='true')
plt.plot(self.x, self.s2_t, 'k.-', label='data')
plt.plot(self.x, self.s2_tls, 'r-', label='TLS')
plt.legend()
plt.xlabel('x')
plt.ylabel('s2')
if __name__ == '__main__':
# Generate true x, s1, s2, f, y, beta
b1, b2, b3 = -3, 4.5, 1
b4, b5 = 2, 0.5
beta_true = [b1, b2, b3, b4, b5]
def f_func(beta, x):
return np.sin(beta[0]*x + beta[1])
n = 201 # number of points in x
x = np.linspace(0, 10, n)
s1_true = np.exp( - ((x-2)**2) / (np.sqrt(2)*1)**2)
s2_true = np.exp( - ((x-8)**2) / (np.sqrt(2)*0.5)**2)
f_true = f_func([b4, b5], x)
y_true = b1*s1_true + b2*s2_true + b3*f_true
# Generate covariance matrices for the data
np.random.seed(1)
m = 10000
N_y = np.random.randn(n, m)*1 + (x[:, None]-2)**2 * np.random.randn(1, m)
Cy_data = np.cov(N_y)/m
N_1 = np.random.randn(n, m)*1 + (x[:, None]-2)**3 * np.random.randn(1, m)/10
C1_data = np.cov(N_1)/m
N_2 = np.random.randn(n, m)*1 + (x[:, None]-8)**3 * np.random.randn(1, m)/10
C2_data = np.cov(N_2)/m
# generate data
def generate_data(a_list, C_list):
d_list = []
for a, C in zip(a_list, C_list):
L = np.linalg.cholesky(np.linalg.pinv(C))
noise = np.linalg.pinv(L.T) @ np.random.randn(a.size)
d = a + noise
d_list.append(d)
return d_list
y_data, s1_data, s2_data = generate_data([y_true, s1_true, s2_true],
[Cy_data, C1_data, C2_data])
# Solving the problem
nltls = NLTLS(x, y_data, s1_data, s2_data, f_func,
Cy_data, C1_data, C2_data,
[b4, b5])
nltls.gls()
nltls.tls()
nltls.pretty_print(beta_true)
nltls.pretty_plot(y_true, s1_true, s2_true)
To quickly check whether the parameters obtained from the fitting (and their estimated covariances) behave as expected under a normal distribution, I ran the fitting on 100 different datasets and checked whether the results fall within the standard confidence intervals corresponding to 1, 2, and 3 sigma (a bit clumsy, but it works):
# Check if predicted CI are correct
def is_within_123_sigma(z_true, z_est, Cz):
return np.abs(z_est - z_true) < np.array([1,2,3])[:, None]*np.sqrt(np.diag(Cz))
def test_ci(k=1):
is_within_ols = np.zeros((k, 3, 5), bool)
is_within_tls = np.zeros((k, 3, 5), bool)
for i in range(k):
print('Progress: ', i+1, '/', k)
y_data, s1_data, s2_data = generate_data([y_true, s1_true, s2_true], [Cy_data, C1_data, C2_data])
nltls = NLTLS(x, y_data, s1_data, s2_data, f_func,
Cy_data, C1_data, C2_data,
[b4, b5])
nltls.gls()
nltls.tls()
is_within_ols[i, :, :] = is_within_123_sigma(beta_true, nltls.beta_ols, nltls.C_beta_ols)
is_within_tls[i, :, :] = is_within_123_sigma(beta_true, nltls.beta_tls, nltls.C_beta_tls)
return is_within_ols, is_within_tls
h=100
is_within_ols, is_within_tls = test_ci(k=h)
def print_ci_results(method_str, is_within):
    print('% of trials where the true beta is within x-sigma for ' + method_str)
print('par\t1-sigma\t2-sigma\t3-sigma')
for i in range(5):
        bla = ['%.1f' % j for j in np.sum(is_within[:, :, i], axis=0) / h * 100]
print('b'+str(i+1), end='\t')
for j in bla: print(j, end='\t')
print('')
print_ci_results('GLS', is_within_ols)
print_ci_results('TLS', is_within_tls)
Below is the resulting table showing the percentage of trials in which the true value fell within the corresponding confidence interval. The coverage looks reasonably close to the nominal 68.3%, 95.4%, and 99.7%, especially at the 95% level:
par 1-sigma 2-sigma 3-sigma
b1 66.0 96.0 100.0
b2 78.0 96.0 99.0
b3 68.0 95.0 100.0
b4 70.0 98.0 99.0
b5 67.0 96.0 99.0
I have implemented a more general class that handles an arbitrary number of inputs $s$, multiple $y$'s, and a more complex covariance structure for $y$. I will share a git repository soon, once I clean up the code a bit.
I hope this helps anyone who encounters similar problems in the future.
Additional reading:
- A good paper with detailed derivations for partial linear TLS:
https://link.springer.com/article/10.1007/s00190-012-0552-9
- Good paper dealing with linear TLS with correlated errors:
https://arc.aiaa.org/doi/abs/10.2514/6.2019-1931
- Papers by Alireza Amiri-Simkooei and his collaborators were really helpful