
I have a least square fitting like this:

fit = lsfit(log10(M), log10(RS), wt)

This function lists statistics and p-values for the coefficients under the null hypothesis that each is zero, but I want to change the null hypothesis for the coefficient from 0 to 0.5. So when I look at the t-statistic or F-statistic listed by this function, how can I get the p-value for the coefficient being different from 0.5?

Fred

2 Answers


Your case is very simple: just subtract 0.5 from the estimated coefficient of log10(M), divide the result by its standard error to get your t-statistic, and then calculate a p-value from the t-distribution.

In other words:

fit = lsfit(log10(M), log10(RS), wt)
fsum = ls.print(fit)
# Grab SE and DF from ls.print, its output is a bit messy
se = fsum$coef.table[[1]][2,2]
df = as.integer(fsum$summary[5])
# Our hypothesis
htest = fit$coefficients[2] - 0.5
tstat = htest / se
pval = 2 * pt(abs(tstat), df=df, lower.tail = FALSE)
pval

Is there any particular reason why you are using `lsfit` instead of `lm`? That would simplify things in the general case. With `lm` and the car package's `linearHypothesis`:

lmfit = lm(log10(RS)~log10(M))
library(car)
linearHypothesis(lmfit, "log10(M) = 0.5")

These types of post-estimation tests are called a variety of things, including general linear hypotheses, linear contrasts, and regression Wald tests, depending on the field, book, and software. The theory, and how these tests are constructed manually from the intermediate steps of estimation, can be found in most graduate introductory books as well as in numerous PDFs of course notes via Google.
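For concreteness, here is a minimal sketch of both routes side by side (assuming `M`, `RS`, and `wt` exist as in the question, and adding the weights to `lm` so it matches the weighted `lsfit` call). The manually constructed t-test and `linearHypothesis` report the same p-value; for a single one-degree-of-freedom hypothesis the F statistic is just the square of the t statistic.

lmfit = lm(log10(RS) ~ log10(M), weights = wt)
# Manual Wald/t construction: (estimate - hypothesized value) / standard error
est = coef(lmfit)[2]
se = summary(lmfit)$coefficients[2, 2]
tstat = (est - 0.5) / se
2 * pt(abs(tstat), df = lmfit$df.residual, lower.tail = FALSE)  # two-sided p-value
# The same test via car; the reported F equals tstat^2
library(car)
linearHypothesis(lmfit, "log10(M) = 0.5")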

Affine
  • Thanks. The only reason that I am using `lsfit` is that this is a line in the source code of the function `rsFit` in the `fArma` package, and I am not sure if I can change it. Can I? – Fred Apr 20 '15 at 21:35
  • @Fred Since this is a very specific application that doesn't have any need for an alternative hypothesis that would be more difficult to obtain from the `lsfit` output, I would not change it. Looking at the code for `rsFit`, it's just passing through the results of `ls.print`, so all the information you need (the coefficient and its standard error) is still available after calling `rsFit`; you'll just need to dig it out of the return value. – Affine Apr 20 '15 at 21:58
  • Now...whether a hypothesis test is appropriate for this application is another question. My hunch is that this doesn't meet the assumptions for the hypothesis test, and so results may be iffy, but I don't have the background to fully answer that. – Affine Apr 20 '15 at 22:03

For any linear hypothesis (i.e., testing whether a coefficient in a linear model is different from a particular value), one method is to subtract the hypothesized value multiplied by the corresponding $x$ value from each $y$ value. So for your example:

fit <- lsfit(log10(M), log10(RS) - 0.5*log10(M), wt)

You still test a null hypothesis of 0, but you change where the 0 point is.
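As a small sketch of what that looks like in practice (again assuming `M`, `RS`, and `wt` from the question): the slope of the shifted-response fit is the original slope minus 0.5, so the ordinary "coefficient = 0" t-test that `ls.print` reports for that slope is exactly the test of the original slope against 0.5.

fit0 <- lsfit(log10(M), log10(RS) - 0.5*log10(M), wt)
out <- ls.print(fit0, print.it = FALSE)  # keep the tables without printing them
out$coef.table[[1]]  # the slope row's t-statistic and p-value test beta1 = 0.5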

To figure out why this works, there are a few ways to think about it. I'll explain two.

To me, the most intuitive explanation is the model comparison approach. Your model (ignoring the log transformations... or assuming they have already been done and saved as new variables) is: $$y=\beta_0+\beta_1x_1+\epsilon$$ To test the null hypothesis $H_0: \beta_1=0$, you compare the full model to the following model: $$y=\beta_0+0\cdot x_1+\epsilon$$ or equivalently $$y=\beta_0+\epsilon$$ The hypothesis you want to test is $H_1:\beta_1=.5$. To test this, you compare the full model above to the following model: $$y=\beta_0+.5x_1+\epsilon$$ or equivalently $$y-.5x_1=\beta_0+\epsilon$$ The difference between the sums of squares of these models is your explained sum of squares. Your error statistic is the residual of the full model. While I find this the most intuitive way of thinking about it, it doesn't directly map onto your R function, because you are not explicitly creating two models and comparing them (though what R is doing is equivalent to that). You can do this by creating two lm() models and comparing them using anova(), but you seem to be restricted to lsfit().
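In code, the model comparison described above looks roughly like this (a sketch assuming the log-transformed variables and weights from the question). The restricted model fixes the slope at $.5$ by moving $.5x_1$ into an offset, and anova() then gives the F-test of that restriction:

y <- log10(RS)
x <- log10(M)
full <- lm(y ~ x, weights = wt)  # slope estimated freely
restricted <- lm(y ~ 1, offset = 0.5*x, weights = wt)  # slope fixed at 0.5
anova(restricted, full)  # F-test of H0: beta1 = 0.5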

You can also think of it in terms of calculating a customized coefficient. For the full model (the first model written above), least squares regression (or ANOVA) will compute the value of the coefficient $\beta_1$ that best fits the data. Let's say that is $0.3$. Now you want to test whether $0.3$ is different from $0.5$, which is to say test $H_1: \beta_1=.5$. Let's adjust your model just a bit by adding $.5$ to the parameter (in lay terms, let's "spot" the coefficient $.5$ points). $$y=\beta_0+(\beta_{1,H1}+.5)x_1+\epsilon$$ I renamed $\beta_1$ as $\beta_{1,H1}$ to distinguish it from $\beta_1$ in your full model. If you were to fit this model, the coefficients would still be chosen so that your estimates best fit your observed data. Since you already know that $\beta_1$ from your full model best fits the data, $$(\beta_{1,H1}+.5)=\beta_1$$ Since $\beta_1=.3$, then $\beta_{1,H1}= -.2$. This makes sense, because your estimated $\beta_1$ from the full model is $.2$ less than your hypothesized value. Your goal is to figure out whether being $.2$ less than your hypothesis is significant. To finish it off, you work through the math of the model again. $$y=\beta_0+(\beta_{1,H1}+.5)x_1+\epsilon$$ $$y=\beta_0+\beta_{1,H1}x_1+.5x_1+\epsilon$$

$$y-.5x_1=\beta_0+\beta_{1,H1}x_1+\epsilon$$ This is the model you give to lsfit() or to a single lm() call. While it looks different from the model comparison approach, it is not. It's just that how you test the coefficients is slightly different (but you get the exact same results).
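As a sketch of that single-model version (same assumed variables as above), the shifted-response formula and lm()'s offset= argument are two ways of writing the same re-parameterized model; in both, the reported test of the slope against 0 is the test of the original slope against 0.5.

m1 <- lm(I(log10(RS) - 0.5*log10(M)) ~ log10(M), weights = wt)
summary(m1)$coefficients  # slope estimate is (original slope - 0.5)
m2 <- lm(log10(RS) ~ log10(M), offset = 0.5*log10(M), weights = wt)
summary(m2)$coefficients  # identical slope, standard error, t, and p-value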

le_andrew
  • Can you help me with how to make this a one-tailed (only positive) t-test in `lsfit`? – Fred Apr 21 '15 at 18:12
  • I'm not sure how you are getting your t-statistic and p-value. `t.test()` can output one-tailed tests directly. However, it's a simple, logical procedure. Assuming the distribution of your test statistic is symmetrical (like the t-distribution), you just follow a simple two-step procedure that you can do in your head. 1. Is the value in the predicted direction (i.e., I thought group 1 would be higher than group 2, and it is)? 2. If yes, divide your p-value by 2. If no, p > .5 and not significant. You can figure out the actual p-value, but there's really no point if you know it's greater than .5. (A short sketch of this calculation follows these comments.) – le_andrew Apr 21 '15 at 18:34
  • Thanks again for your help. Just to be clear, do you mean the maximum p-value for a one-sided test is half of the p-value of the two-sided t-test? – Fred Apr 21 '15 at 20:43
  • No, the *exact* p-value of a one-sided test is 1/2 the p-value of the 2-sided test, not the maximum it could be (assuming your t-statistic is in the tail you predicted it would be in). I think the reason most programs don't make it easy to do 1-sided tests is that it is so easy to get the p-value for a 1-sided test from a 2-sided test. – le_andrew Apr 21 '15 at 23:51
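A hedged sketch of the one-sided arithmetic from these comments, reusing the `tstat` and `df` computed in the code of the first answer: for the predicted positive direction, $H_1: \beta_1 > .5$, take the upper tail of the t-distribution directly instead of doubling it.

p_one_sided <- pt(tstat, df = df, lower.tail = FALSE)  # half the two-sided p-value when tstat > 0
p_one_sided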