Least Squares Fits to Experimental Data

Question

My attempt at making sense of the problem:
The problem provides us with the sum of squares (SS; I believe that's the $18.1$). We can use the SS along with the number of samples $(21)$ to get the standard error. Then somehow use this information to find the probability of having a worse fit?
After more research I believe the value given as $18.1$ is the mean square deviation (or sample variance) and not the sum of squares. If this is true, then an F-test certainly makes sense for part (b).

Could someone guide me in the right direction and show me how to attempt this question?
Any help would be appreciated.

Since this is for the purposes of study, please add the [self-study tag](http://stats.stackexchange.com/tags/self-study/info) and read its tag-wiki info (at the link, though you may not need to change your question very much, since you're already doing some of the things it asks for). — Glen_b, Apr 30 '14 at 23:06
(a) is ambiguous/misleadingly phrased at best, since what it (I hope!) really seeks is a conditional probability, not an unconditional one. What would the distribution of $\sum d_i^2/\sigma_i^2$ be *if the given model were the correct one*? (b) you might be able to cast it in the form of an F-test, perhaps, though given that the $\sigma_i$ are all known you may actually be able to deal with it in terms of a chi-square test. — Glen_b, Apr 30 '14 at 23:13
No need to apologize; I can see you're new, you won't know until someone tells you. There's also a little discussion of homework questions (in the broadest interpretation of the word homework) in the [help on asking questions](http://stats.stackexchange.com/help/on-topic) (near the middle of the page), which you might take a look at when convenient. Oh, and welcome to the site. — Glen_b, Apr 30 '14 at 23:17
@Glen_b Thanks for the help, Glen. I still don't quite get it. For part (a), are you asking what the distribution of the sample variance would be if the given (quadratic) model were the correct one? If so, would it be normally distributed? For part (b), I believe we were advised to use the chi-square test, but I am not sure how to derive all of the $\sigma_i$ from the data given. — , Apr 30 '14 at 23:29
$\sum d_i^2/\sigma_i^2$ isn't quite a variance; that $\sigma_i$ makes it something else, and a variance would be scaled for sample size. No, that's not normal. Simpler question: what would be the assumed distribution of $d_i/\sigma_i$? On the other thing - presumably the $\sigma_i$ come from the observational error calculations. — Glen_b, Apr 30 '14 at 23:33
@Glen_b If I am understanding correctly, and the deviation is the difference between the observed value and the mean of the sample, then the assumed distribution of $d_i/\sigma_i$ would be a straight line passing through 0 (in the middle). — , Apr 30 '14 at 23:57
Um... no, I mean the probability distribution of the random variables that the observed $d_i/\sigma_i$ would represent observations on. Sorry, I have not been sufficiently precise, and so less than clear. I'm trying to get you (hopefully) to see what the distribution of $\sum d_i^2/\sigma_i^2$ could be (at least approximately) by focusing on a single component. — Glen_b, May 01 '14 at 01:02
@Glen_b Well, if the quadratic fit were correct, wouldn't the distribution of $d_i/\sigma_i$ be parabolic? Sorry, I still don't know if I am understanding the question. — , May 01 '14 at 01:58
Given the model, the distribution of the error term is not related to the model. The model describes the relationship between y and one or more x's. The distribution of the $d_i$s is not related to $x$ in the model; it's the part that's orthogonal to the x-space. Do you know what a probability distribution is? — Glen_b, May 01 '14 at 02:01
I do know what a probability distribution is, but I usually use the mean and standard deviation to figure it out. I am not sure how to apply that to this problem. — , May 01 '14 at 02:21
Mean and standard deviation tell you about location and scale, but not the distributional shape. To work out the probability of a value of $\sum d_i^2/\sigma_i^2$ at least as large as the one observed (the question you posted), you need the distribution of $\sum d_i^2/\sigma_i^2$ when the model is correct. To work out (or at least motivate the answer for) the distribution of $\sum d_i^2/\sigma_i^2$, it would help if you knew something about the distribution of $d_i/\sigma_i$ in the same circumstances. — Glen_b, May 01 '14 at 02:39
I cannot think of a way to find the shape of the distribution given no data. All I can think is that if the model is correct, then the deviations would be small. How would I go about finding the distribution of $d_i/\sigma_i$ with just the given information? — , May 01 '14 at 03:07
Via assumptions (checkable ones, generally). You mentioned an F test. That (based on the statistic having an F distribution) is not true just for *any* distribution on the $d_i$'s ... so where does it come from? Encouraged by the implied assumption, and the form of the statistic in the question, I mentioned a chi-square. If you don't know how the F arises, or the chi-square, it at least tells me what I need to teach you; if you do know, it tells me what I don't need to teach you. — Glen_b, May 01 '14 at 03:59
... ctd The question you were given implies you have some particular pieces of knowledge, and your discussion in the question suggested more. As we've gone on it sounds like you don't know all you need to know (which makes me wonder why the person setting the question would assume that you *do* know it - they clearly think this is part of what you've already covered at some point -- but why do they think so?). It sounds like you have some bits and pieces of what's required, but it's hard now to judge what you *do* know. — Glen_b, May 01 '14 at 04:05
We have never done anything like this in class, but we have learned the chi-square test and mentioned the F test (not in detail). Most of the questions on the guide went pretty much hand in hand with what was covered in lecture, except for this one. I was looking at the F test in particular because the random variable used in the distribution is similar to the information given by the problem. That said, you are correct: I am confused by this problem, which is why I need help. — , May 01 '14 at 04:52
Specifically, the derivation of a chi-square relies on the assumption that $D_i\sim N(0,\sigma_i^2)$, that is that $D_i/\sigma_i$ is standard normal (here $D_i$ is the random variable, and $d_i$ its observed value). A sum of squares of independent standard normals would be $\chi^2$, but you would lose a degree of freedom for each of the estimated parameters. F distributions arise when you take ratios of reduced chi-squares times some unknown $\sigma^2$ (you'd use that if you don't know the standard deviation of $D$'s, only their relative sizes, because taking the ratio cancels out the unknown) — Glen_b, May 01 '14 at 04:58
If that's any help I can write it up as an answer, but I still don't know what you don't know. — Glen_b, May 01 '14 at 05:01
I believe I understand what you are saying. Write it up as an answer so I can give you points for helping me. I will spend some time reviewing the answer and will ask if there is anything in particular I don't understand. — , May 01 '14 at 13:04

score 1 · Accepted Answer · answered May 01 '14 at 15:52

As mentioned in comments, I believe the intent with (a) is that you do a chi-square test.

Let $D_i$ be the random variable, and $d_i$ its observed value.

The derivation of a chi-square relies on the assumption that $D_i\sim N(0,\sigma_i^2)$, that is that $D_i/\sigma_i$ is standard normal. A sum of squares of independent standard normals would be $\chi^2$, but you would lose a degree of freedom for each of the estimated parameters.
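A minimal sketch of the calculation for (a) in Python, assuming that $18.1$ is the value of $\sum_i d_i^2/\sigma_i^2$ (the question leaves this unclear) and that the quadratic has three fitted parameters, giving $21-3=18$ degrees of freedom. The helper name `chi2_sf_even_df` is made up for illustration; it relies on a closed form that only holds for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with even df.

    Closed form: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # the k = 0 term
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


n_points = 21    # number of data points, from the question
n_params = 3     # ASSUMPTION: quadratic fit with three estimated coefficients
chi2_obs = 18.1  # ASSUMPTION: 18.1 is the sum of d_i^2 / sigma_i^2

df = n_points - n_params
p_worse = chi2_sf_even_df(chi2_obs, df)
print(f"df = {df}, P(chi-square >= {chi2_obs}) = {p_worse:.3f}")
```

Under those assumptions `p_worse` comes out at about 0.45, which is what you would expect when the statistic is close to its degrees of freedom.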

For part (b), $F$ distributions arise when you take ratios of reduced chi-squares times some common but unknown $\sigma^2$. You'd use that if you don't know the standard deviation of $D$'s, only the relative sizes of them (i.e. if $\sigma_i=c_i\sigma$ for known $c_i$). This would be used because taking the ratio cancels out the unknown $\sigma^2$.
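As a sketch of how the ratio removes the unknown scale, suppose $\sigma_i=c_i\sigma$ with known $c_i$, and write $S=\sum_i d_i^2/c_i^2$ for each fitted model. The symbol $S$, the sample size $n$, and the parameter counts (2 for the linear fit, 3 for the quadratic) are notation and assumptions introduced here for illustration. If the linear model is adequate, then

$$F=\frac{(S_\text{linear}-S_\text{quadratic})/1}{S_\text{quadratic}/(n-3)}\sim F_{1,\,n-3}$$

The numerator is $\sigma^2$ times a reduced $\chi^2_1$ and the denominator is $\sigma^2$ times a reduced $\chi^2_{n-3}$, so the $\sigma^2$ cancels in the ratio.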

I think in your case the $\sigma_i$ are completely known, in which case I believe you work similarly to an F-test (as you said), except that you know the variance of the scaled residuals should be 1. Take the difference in the sum of squares of scaled residuals ($\text{SSE}=\sum_i d_i^2/\sigma_i^2$) between the two models, $\text{SSE}_\text{linear}-\text{SSE}_\text{quadratic}$; if the linear model is adequate, this difference should have a $\chi^2_1$ distribution.
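A rough Python sketch of that comparison, assuming the $\sigma_i$ are known. The value used for the linear fit is a made-up placeholder, since it is not given here, and the quadratic value assumes $18.1$ is the scaled sum of squares. The function name `chi2_sf_1df` is also just for illustration; it uses the fact that a $\chi^2_1$ variable is the square of a standard normal.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability P(X >= x) for a chi-square with 1 df.

    If Z is standard normal, P(Z^2 >= x) = erfc(sqrt(x / 2)).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


sse_quadratic = 18.1  # ASSUMPTION: 18.1 is sum of d_i^2 / sigma_i^2 for the quadratic
sse_linear = 25.0     # HYPOTHETICAL placeholder, not a value from the question

# Drop in scaled SSE from adding the quadratic term (one extra parameter).
delta = sse_linear - sse_quadratic
p_value = chi2_sf_1df(delta)
print(f"difference = {delta:.1f}, P(chi-square_1 >= difference) = {p_value:.4f}")
```

A small probability here would suggest the quadratic term is picking up more than noise; with your actual linear-fit value substituted, the same two lines give the test for (b).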

(You could do an $F$ test, still, however, and if you weren't confident that $d_i/\sigma_i$ had a variance of 1, you probably should. But then the answer for (a) may have a problem.)

Well, no, that's not right at all. (i) Note that a p-value is a conditional probability (see the first sentence [here](http://en.wikipedia.org/wiki/P-value)), and (ii) You should take care [not to over-interpret a p-value](http://stats.stackexchange.com/questions/94974/is-the-exact-value-of-a-p-value-meaningless). In particular, since you just tried to compare two p-values, check [this out](http://www.stat.columbia.edu/~gelman/research/published/signif4.pdf) — Glen_b, May 07 '14 at 02:27
