
I'm fitting a third-degree polynomial to a diode measurement in which amplification was measured against voltage. The behavior is strongly exponential. I used the lm() fit in R and then inverted the fit to obtain the voltage corresponding to a given amplification value.

Is there a way to determine the uncertainty or error (in voltage)?

Additional: Voltage is the explanatory variable, but I need to perform an inverse regression because the amplification is the given quantity. I use a polynomial because other models did not work well, and in principle it fits quite well. However, the data are transformed to a double-logarithmic scale and only a limited region is taken into account (several data points which are rather close to each other). Some background is available.
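For illustration, the setup is roughly of the following kind (a minimal sketch with made-up numbers, not the actual data):

```r
## Minimal sketch of the setup described above (synthetic data, not the real measurement)
set.seed(1)
V <- seq(100, 300, length.out = 10)                 # applied voltage
A <- exp(0.02 * V) * (1 + rnorm(10, sd = 0.05))     # roughly exponential amplification

## Third-degree polynomial fit on the double-logarithmic scale
fit <- lm(log(A) ~ poly(log(V), 3))

## Inverse use of the fit: the voltage at which a given amplification A0 is reached
A0 <- 50
f  <- function(v) predict(fit, newdata = data.frame(V = v)) - log(A0)
uniroot(f, interval = range(V))$root
```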

Ben
  • Because your tags conflict with the statement of the question, could you please clarify what you are asking? Is voltage the response or is it the explanatory variable in your regression? It would also be nice to understand why you are using a cubic to model "very exponential behavior:" that seems destined to be a poor model. – whuber Jan 20 '19 at 15:43
  • Uncertainty or error... as in standard error of the linear model coefficients? – user2974951 Jan 21 '19 at 09:01
  • [This search of our site](https://stats.stackexchange.com/search?q=calibration+regression+-logistic+-model+score%3A1) turns up some relevant threads, especially https://stats.stackexchange.com/questions/206531 and https://stats.stackexchange.com/questions/169979 (which provides some references). – whuber Jan 21 '19 at 15:41
  • To be honest, I'm not sure how to refer to the error. I can only say what my intention is: I need to know a certain voltage, and I would like to know how far off I am :) But yes, I think it is the error of the coefficients, finally? – Ben Jan 23 '19 at 07:07

1 Answer


A model of the form $$A=\frac{1}{1-\left(\frac{V}{\theta_0} \right)^{\theta_1}}+\theta_2 \tag 1$$ has the inverse function $$V=\theta_0\left(1-\frac{1}{A-\theta_2} \right)^{1/\theta_1} \tag 2$$

The problem is to evaluate $\theta_0$, $\theta_1$ and $\theta_2$, given the data $$(A_1,V_1)\:,\:(A_2,V_2)\:,\:\ldots\:,\:(A_k,V_k)\:,\:\ldots\:,\:(A_n,V_n).$$

This is a problem of non-linear regression. The usual methods can be found in the literature. They require iterative numerical computation, starting from guessed values of the sought parameters.

These methods are not always reliable if the initial values of the parameters are too far from the true (unknown) values. That is a major difficulty in practice.
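For reference, both equations transcribe directly into R functions (a sketch; the parameter values used in the check are arbitrary):

```r
## Direct transcription of Eqs. (1) and (2); theta = c(theta0, theta1, theta2)
A.of.V <- function(V, theta) 1 / (1 - (V / theta[1])^theta[2]) + theta[3]
V.of.A <- function(A, theta) theta[1] * (1 - 1 / (A - theta[3]))^(1 / theta[2])

## Consistency check with arbitrary values: the inverse undoes the forward map
th <- c(200, 3, 1)
all.equal(V.of.A(A.of.V(190, th), th), 190)     # TRUE
```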

A NON-ITERATIVE METHOD, thanks to linearization:

The advantage is that no initial guesses for the parameters are needed. The drawback is that replacing the function by another one, even a very close one, introduces small but systematic deviations.

From Eq. $(2)$: $$\ln(V)=\ln(\theta_0)+\frac{1}{\theta_1}\ln\left(1-\frac{1}{A}\frac{1}{\left(1-\frac{\theta_2}{A}\right)} \right)$$

If $|A|$ is large, the logarithmic term can be expanded into an asymptotic series: $$\ln\left(1-\frac{1}{A}\frac{1}{\left(1-\frac{\theta_2}{A}\right)} \right) \simeq -\frac{1}{A} -\left(\theta_2+\frac12\right)\frac{1}{A^2}+O\left(\frac{1}{A^3} \right)$$ $$\ln(V)\simeq\ln(\theta_0) -\frac{1}{\theta_1}\frac{1}{A} -\frac{1}{\theta_1}\left(\theta_2+\frac12\right)\frac{1}{A^2}$$

This equation is of the form: $$Y=a+bX+cX^2 \quad\text{with}\quad \begin{cases} Y=\ln(V) \\ X=\frac{1}{A} \\ a=\ln(\theta_0) \\ b=-\frac{1}{\theta_1} \\ c=-\frac{1}{\theta_1}\left(\theta_2+\frac12\right) \end{cases}$$

PROCESS:

First, transform the data $(A_k,V_k)$ into $(X_k,Y_k)$ with $\begin{cases}X_k=\frac{1}{A_k} \\ Y_k=\ln(V_k) \end{cases}$ .

Second, compute $a,b,c$ by a linear regression according to the equation $Y=a+bX+cX^2$.

The result is : $\quad\begin{cases} \theta_0\simeq e^a \\ \theta_1\simeq -\frac{1}{b} \\ \theta_2\simeq \frac{c}{b}-\frac12 \end{cases}$
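In R, this process is a single lm() call (a sketch on synthetic data generated from Eq. $(1)$ with made-up parameter values; with real data, only the vectors `A` and `V` change):

```r
## Synthetic data from Eq. (1) with made-up parameters, for illustration only
theta <- c(200, 3, 1)                               # theta0, theta1, theta2
V <- seq(180, 198, length.out = 8)
A <- 1 / (1 - (V / theta[1])^theta[2]) + theta[3]

## Steps 1 and 2: transform the data, then a linear regression in X and X^2
X <- 1 / A
Y <- log(V)
fit <- lm(Y ~ X + I(X^2))
cf  <- coef(fit)                                    # a, b, c

## Recover the parameters from a, b, c
theta0.hat <- exp(cf[1])
theta1.hat <- -1 / cf[2]
theta2.hat <- cf[3] / cf[2] - 1/2
c(theta0.hat, theta1.hat, theta2.hat)               # approximately 200, 3, 1
```

The recovered values are only approximate, because the asymptotic expansion truncates the exact model.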

EXAMPLE of numerical computation:

The data come from a graphical scan of the figure published by Ben in "Selection of data range changes coefficients too much in lmer (inverse regression)". They are therefore less accurate than if they had been published in text form.

Screen copy:

[Figure: the scanned data points and the fitted curve]

The red crosses represent the data.

The blue line represents Eq.$(2)$ with the computed values of the parameters.

As expected, there is a residual deviation due to the approximation by a truncated series.

If one wants more accurate estimates of the parameters, a non-linear regression is necessary. One can start the iterative process from the above values, which are close to the correct ones; this makes the process more reliable.
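In R, such a refinement might look like this (a sketch; synthetic noisy data stand in for the measurements, and the `start` values play the role of the estimates obtained above):

```r
## Non-linear fit of Eq. (2), started near the linearization estimates
set.seed(1)
theta <- c(200, 3, 1)                           # made-up "true" parameters
V.true <- seq(180, 198, length.out = 8)
A <- 1 / (1 - (V.true / theta[1])^theta[2]) + theta[3]
V <- V.true + rnorm(8, sd = 0.2)                # small measurement noise

fit.nls <- nls(V ~ t0 * (1 - 1 / (A - t2))^(1 / t1),
               start = list(t0 = 190, t1 = 2.5, t2 = 0.8))
summary(fit.nls)$coefficients                   # estimates with standard errors
```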

ADDITION after comments

Cubic polynomial regression works very well with the log-log variables:

[Figure: cubic polynomial fit on the log-log scale]

Even a quadratic polynomial regression is sufficient, as shown in the next figure:

[Figure: quadratic polynomial fit on the log-log scale]

But the polynomial regression does not give estimates of $\theta_0,\theta_1,\theta_2$.
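Concerning the error in $V$ asked about in the question: on the log-log scale, lm() can produce a prediction interval that one can exponentiate back (a sketch on synthetic data; note that it quantifies only the scatter around the polynomial fit and treats $A$ as error-free):

```r
## Cubic fit on the log-log scale and a back-transformed prediction interval for V
set.seed(1)
theta <- c(200, 3, 1)                           # made-up parameters, as before
V.true <- seq(180, 198, length.out = 8)
A <- 1 / (1 - (V.true / theta[1])^theta[2]) + theta[3]
V <- V.true + rnorm(8, sd = 0.2)

fit <- lm(log(V) ~ poly(log(A), 3))
A0 <- 20                                        # the given amplification
exp(predict(fit, newdata = data.frame(A = A0), interval = "prediction"))
```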

JJacquelin
  • Thank you very much, I'm quite impressed! It's interesting to see how such a question can be solved. I don't want to diminish that, but in the end I'm applying a third-degree polynomial on a double-logarithmic scale. Does this strongly change the approach? As I said, I really appreciate your answer, and it will help me understand how to tackle such a topic. Just need some time to understand :) – Ben Jan 23 '19 at 07:12
  • I cannot understand why your "polynomial of a third degree at a double logarithmical scale" fails. For me it works very well. See the ADDITION to my answer above. – JJacquelin Jan 23 '19 at 09:36
  • Ah, I didn't notice it. Thank you! It also works for me, but only when the data range is limited. But this is sufficient. Can you tell me how I can calculate any possible deviation/error? Is that possible in general? I mean, it would require knowing the real values, but maybe the errors can be estimated? – Ben Jan 23 '19 at 09:41
  • Not directly for $(A,V)$ but for $(\ln(A),\ln(V))$. See https://en.wikipedia.org/wiki/Polynomial_regression – JJacquelin Jan 23 '19 at 09:53
  • I don't think you have framed the statistical model correctly, **and therefore have obtained an invalid answer,** because "the voltage is given" means $V$ has no error: the error is in $A.$ That's not a regression model. The standard solutions perform (possibly) nonlinear regression of $A$ against $V$ and base estimates on those results. – whuber Jan 23 '19 at 14:38
  • @whuber I don't think you have fully understood the goal of my answer: the part concerning the calculation of approximate values for $\theta_0,\theta_1,\theta_2$. The goal is to get sufficiently accurate values in order, if necessary, to start the iterative process from them instead of from guessed values. This is pointed out in my answer. Of course, if the result without non-linear regression is already sufficient, there is no need for further calculation. If not, a non-linear regression is necessary; starting from the approximates already obtained, the iterative non-linear regression is less likely to fail. – JJacquelin Jan 23 '19 at 14:59
  • By failing to incorporate an error term, you obtain the wrong model altogether. That makes it difficult to justify your approach. If I'm wrong, you can convince me of that by showing how your method can produce a valid confidence region for the parameter estimates. – whuber Jan 23 '19 at 15:03
  • The question of a confidence region comes with the non-linear regression, not before. Don't confuse the preliminary calculation leading to good initial values of the parameters (to start the iterative process, if the OP wants something better) with the non-linear regression itself. I am afraid we are in a dialogue of the deaf. It doesn't matter, as long as my contribution is helpful to the OP. From a theoretical viewpoint I agree with your remark: the non-linear regression must be carried out with the right model. From a practical viewpoint, I change nothing in my first answer. – JJacquelin Jan 23 '19 at 15:30