
I use MATLAB a lot to fit functions to data sets, and I then need to determine some kind of error estimate for the fitted parameters.

Say I have a data set of $N$ points $(x_i,y_i),\ i=1,\ldots,N$. I need to fit a function $f(x,P)$, where $P$ is the parameter vector

$$P = \left[\begin{array}{c} a_1\\ a_2\\ \vdots\\ a_k\end{array}\right]$$

where $k$ is the number of parameters to be fitted. I use a root-mean-square criterion to find estimates for the parameters $a_1,\ldots,a_k$. I build an error function:

$$\varepsilon(P,x,y)= \frac{1}{N}\left(\sum\limits_{i=1}^N \big(f(x_i,P)-y_i\big)^2\right)^{\frac{1}{2}}$$

and then use the MATLAB function fminsearch to find the minimum of this function by optimizing $P$. (I know there are better MATLAB functions, but bear with me.)
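For concreteness, here is a minimal MATLAB sketch of this step; the model, the data and the starting guess below are made-up placeholders, not my actual problem:

    % Placeholder model and data, just to make the sketch runnable
    f = @(x, P) P(1) * exp(-x ./ P(2));
    x = (0:0.1:5)';
    y = 2 * exp(-x / 1.5) + 0.05 * randn(size(x));
    N = numel(x);

    % Error function as defined above
    epsfun = @(P) (1/N) * sqrt(sum((f(x, P) - y).^2));

    % Minimize over P, starting from an initial guess P0
    P0   = [1; 1];
    Pfit = fminsearch(epsfun, P0);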

To get the error estimates I calculate the Jacobian matrix $J_f$:

$$J_f = \left[\begin{array}{cccc}\frac{\partial f}{\partial a_1}(P',x_1) & \frac{\partial f}{\partial a_2}(P',x_1) & \cdots & \frac{\partial f}{\partial a_k}(P',x_1)\\ \vdots & \vdots & & \vdots \\ \frac{\partial f}{\partial a_1}(P',x_N) & \frac{\partial f}{\partial a_2}(P',x_N) & \cdots & \frac{\partial f}{\partial a_k}(P',x_N)\end{array}\right]$$

an $N\times k$ matrix, where $P'$ denotes the fitted parameter vector (the values that minimize the error function $\varepsilon$). I also calculate the variance of the residuals:

$$ \sigma_r = \frac{1}{N-k}\sum\limits_{i=1}^N r_i^2$$

with

$$ r_i = y_i - f(x_i,P'), $$

the residuals. I can get the covariance matrix now:

$$ \operatorname{cov}(P) = \sigma_r(J_f^TJ_f)^{-1} $$

where the superscript $T$ denotes the transpose and the superscript $-1$ the matrix inverse. To get the error estimates I then take the square roots of the diagonal elements of the covariance matrix:

$$s_P = \left[ \begin{array}{c} s_{a_1} \\ s_{a_2} \\ \vdots \end{array}\right] = \sqrt{\operatorname{diag}(\operatorname{cov}(P))}$$
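Continuing the sketch above, this could be computed as follows (again just a sketch: the Jacobian is approximated by central finite differences at the fitted parameters):

    k = numel(Pfit);                 % number of fitted parameters

    % Finite-difference N-by-k Jacobian of f w.r.t. the parameters at Pfit
    J = zeros(N, k);
    for j = 1:k
        dP      = zeros(k, 1);
        dP(j)   = 1e-6 * max(abs(Pfit(j)), 1);
        J(:, j) = (f(x, Pfit + dP) - f(x, Pfit - dP)) ./ (2 * dP(j));
    end

    % Residual variance with N - k degrees of freedom
    r      = y - f(x, Pfit);
    sigmar = sum(r.^2) / (N - k);

    % Covariance matrix of the parameters and their standard errors
    covP = sigmar * inv(J' * J);
    sP   = sqrt(diag(covP));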

The Question

Given that the above method is at least roughly correct (I assume it is): is there a similar approach for the case where not one function $f$ is fitted, but multiple functions $f$ and $f'$ are fitted simultaneously?

I'll explain what I mean by this. Let's say I have two data sets $(x_i,y_i)$ and $(x'_i,y'_i)$ which are described respectively by the functions $f(x,P)$ and $f'(x',P)$, where $P$ is the parameter vector. While $f$ and $f'$ are different functions, $P$ should be the same for both fits. For example, $f$ might be:

$$f(x,P)=a_1\left(1-\exp(-x/a_2)\right)\exp(x/a_3)$$

while $f'$ is:

$$f'(x',P)=a_4\left(1+\exp(x'/a_2)\right)\exp(x'/a_4)$$

I would therefore fit both data sets simultaneously with the combined error function:

$$\varepsilon(P,x,y,x',y') = \frac{1}{N}\left(\sum\limits_{i=1}^N \big(f(x_i,P)-y_i\big)^2\right)^{\frac{1}{2}} + \frac{1}{N'}\left(\sum\limits_{i=1}^{N'} \big(f'(x'_i,P)-y'_i\big)^2\right)^{\frac{1}{2}}$$

and use fminsearch again. This gives me the best estimate of $P$ using all the data available (I actually have 4 data sets and 2 functions). How would I estimate the error/certainty bounds on the parameters in this case? (Preferably using the Jacobian again.)
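For what it's worth, here is a sketch of how I would set up the simultaneous fit in MATLAB, using the two example functions above and made-up placeholder data for the two data sets:

    % Example models from above, sharing the same parameter vector P
    f1 = @(x, P) P(1) * (1 - exp(-x ./ P(2))) .* exp(x ./ P(3));
    f2 = @(x, P) P(4) * (1 + exp(x ./ P(2))) .* exp(x ./ P(4));

    % Placeholder data standing in for the two measured data sets
    x1 = linspace(0, 2, 50)';
    y1 = f1(x1, [2; 0.5; 4; 3]) + 0.02 * randn(size(x1));
    x2 = linspace(0, 2, 40)';
    y2 = f2(x2, [2; 0.5; 4; 3]) + 0.02 * randn(size(x2));

    % Combined error function: sum of the two RMS terms
    epsboth = @(P) (1/numel(x1)) * sqrt(sum((f1(x1, P) - y1).^2)) + ...
                   (1/numel(x2)) * sqrt(sum((f2(x2, P) - y2).^2));

    % Simultaneous fit of all parameters to both data sets
    Pfit = fminsearch(epsboth, [1; 1; 3; 2]);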

My English isn't top-notch, nor is my knowledge of statistics, so I may be mixing up nomenclature a bit. Forgive me for this.

romeovs

1 Answer


The easiest thing for you to do would be to string the data sets together (rbind() in R jargon, append in Stata jargon), so that you have variables $x_1,y_1$ coming from the first data set, $x_2,y_2$ coming from the second data set, and an identifier $I$ that takes the value 0 for the first data set and the value 1 for the second data set. Construct your response variable $r=y_1 (1-I) + y_2 I$, which corresponds to $y_1$ or $y_2$ depending on which data set a particular record came from. Likewise, construct the expanded fit function $f=f(x_1,P)(1-I) + f'(x_2,P) I$. Then treat the whole thing as if it were a single data set.

Example: the first data set is

    x     y
    2.3   5.6
    1.7   4.5
    0.8   2.2

The second data set is

     x     y
     11.0  25.4
     8.3   21.2
     7.5   19.3

The combined data set is

    x1     y1    x2   y2    I   r 
    2.3   5.6     0    0    0   5.6
    1.7   4.5     0    0    0   4.5
    0.8   2.2     0    0    0   2.2
    0     0    11.0  25.4   1  25.4
    0     0     8.3  21.2   1  21.2
    0     0     7.5  19.3   1  19.3

I would expect your standard errors and confidence intervals to be wrong with this method though. You assume implicitly that the variance of the error terms is constant (a homoskedastic nonlinear regression model), and that is a strong assumption to make. You would be better off with a so-called sandwich formula, for which you need the second derivatives, as well. You can probably find all the theory in Ron Gallant's book or in Bates and White's book, but I doubt you'd be able to handle these references. Bruce Hansen's lecture notes (Ch. 6 in particular) might be helpful, too.
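For what it's worth, a first-order (Gauss–Newton) approximation of such a sandwich estimator can be written using the Jacobian alone, ignoring the second-derivative terms that the full formula requires. A sketch, reusing J and r from the standard-error computation in the question:

    % Heteroskedasticity-robust "sandwich" covariance, first-order approximation:
    % bread = (J'J)^{-1},  meat = J' * diag(r_i^2) * J
    bread = inv(J' * J);
    meat  = J' * diag(r.^2) * J;

    covP_robust = bread * meat * bread;
    sP_robust   = sqrt(diag(covP_robust));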

StasK
  • Well, it is a given fact that each data set on its own would be homoskedastic. Stringing them together ruins this, of course. I'm going to take a look at these sandwich formulas you propose. – romeovs Dec 28 '11 at 09:29
  • Can't find any good sources on the nonlinear sandwich formula (only simple linear regression). Is there some kind of ready-made "follow these steps" method? – romeovs Dec 28 '11 at 19:30
  • Hm. I wouldn't take anything about my data for granted unless I have simulated them :). I added another reference, which is written in a somewhat lighter tone (although is still fully rigorous). – StasK Jan 03 '12 at 14:51
  • Yeah well, it's just a report I have to make for college. It's always "ok" to assume this for the data we get (it says so in the assignment). In later life I will be more cautious! – romeovs Jan 03 '12 at 15:27
  • I see. I wonder what subject it is in? Statisticians would just run a linear regression and be happy with it. Nonlinear regressions appear sometimes in finance, sometimes in chemical or physical applications. Cannot think of many other fields, actually. – StasK Jan 03 '12 at 16:28
  • Physics actually: measurement of rotational fluorescence decay from polarized sub-nanosecond measurements. Since the fit models are $\exp(-a/x)(1-b\exp(-c/x))$ and $\exp(-a/x)(1+2b\exp(-c/x))$, I don't think I can do linear regression. – romeovs Jan 03 '12 at 20:27
  • I see. Well, I would expect that as you are reaching the asymptote, variability in your data around the regression line decreases: if at time zero you have values around 1 with s.d. of say 0.1, then at time $3(a+c)^{-1}$ you will have values around 0.03 with s.d. of 0.002, say, and that's a pretty serious heteroskedasticity. But (1) if it is difficult for you to implement properly, (2) this is not a part of your assignment, and (3) physicists don't bother with that anyway -- you don't have to address it. As far as (3) is concerned, though, expect that the results will not be reproducible. – StasK Jan 04 '12 at 14:59
  • Yeah, well. Third-bachelor experimental physics is mainly about the experiment. I just try to get the data processing at least partly correct (if that doesn't take me too far, as it would in this case). After my masters, though, I'd like to do some extra studying to get my mathematical skills a bit better (the physics I'm interested in needs correct maths, not physics maths, plus maths is cool!). I'll also look into some more serious statistical courses. – romeovs Jan 04 '12 at 16:25
  • Good luck with your studies! I had this kind of stuff in a graduate nonlinear econometrics class with aforementioned Ron Gallant, rather than in a statistics class, though. – StasK Jan 07 '12 at 16:39
  • Yes, it seems good statistics is always found in the softer sciences' curricula. I've found one in Belgium (where I live) that isn't, though: a Master-after-Master in Statistics. – romeovs Jan 07 '12 at 17:38
  • Most stat students I know would weep over Ron Gallant's class, believe me. I looked at a more serious one he offered two or three years after I took his nonlinear econometrics, and I was scared to death to take it :). This is a guy who taught measure theory to undergraduate economics students because he felt they were not happy with his rigour. A very curious figure! – StasK Jan 09 '12 at 23:05