I have a dataset in which y is the dependent variable and x is the independent variable. My analysis is mainly concerned with the following hypotheses:

  1. Expecting x=0 to imply y=0
  2. Expecting a significant relationship between x and y

To achieve this, I am trying to find the transformations of x and y that give the best linear model in R. The final model I arrived at is $\sqrt{y}$ against $\ln(x)$. When I fit the model in R, I obtain the following coefficients:

  Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.319615   0.028743   11.12 2.93e-10 ***
x           0.150139   0.009959   15.08 9.76e-13 ***
---

Questions:

  1. I am trying to interpret the intercept term. Since the p-value is far below the 5% significance level, can I say that the intercept is significantly different from 0? However, this model is undefined for x=0, so I'm not sure whether this interpretation is valid. Would it be OK to refit the linear model using only the smaller values of x? < Solved >

  2. Following on from the question above, the problem with this model is that I can't test hypothesis 1. I would be very thankful if anyone could provide some help.

user106113
  • What I meant was click the [ASK QUESTION](http://stats.stackexchange.com/questions/ask) at the top, & ask a totally new question, not edit this one. It will also help to say what x & y are (eg, blood pressure, stock prices, etc), why you need a model of them, & why the model should show y=0 when x=0. – gung - Reinstate Monica Sep 19 '14 at 02:30
  • If you can say -- what does $y$ represent? – Glen_b Sep 19 '14 at 09:17
  • @Glen_b y is a ratio of two numbers, each of which is a count of objects. – user106113 Sep 19 '14 at 21:33
  • Question continues in a new question thread. http://stats.stackexchange.com/questions/116106/performing-a-linear-regression-on-small-dataset-and-trouble-with-modeling-small – user106113 Sep 19 '14 at 21:47
  • The denominator of $y$ can occasionally be smaller than the numerator? It seems to behave like there's an upper boundary near 1 (as you'd see with a count divided by a total count - like "proportion of people with brown eyes"), but at least one of the observations exceeds 1. Trying to understand why it's nearly limited to 1 but not quite. – Glen_b Sep 19 '14 at 23:19
  • @Glen_b It is one of the limitations of the data collection process, which was not done by me: the total count is reported by the person, but the numerator is reported by the organization. You can just take it as 1. – user106113 Sep 20 '14 at 22:52
  • @user106113 It would be nice to be able to do so, since that would allow treating $y$ properly - as a binomial count out of a total. However, there's several issues to worry about. – Glen_b Sep 21 '14 at 00:24

1 Answer

The intercept term does not refer to when x=0, since your x is actually ln(x). Instead, the intercept refers to when ln(x)=0, which occurs when the old x=1. At that point (in the new space), $\hat y$ (i.e., $\widehat{\sqrt{y}}$) differs significantly from 0.
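
As a rough worked illustration, plugging in the coefficients reported in the question (rounded, and assuming the row labelled `x` in the R output is the slope on $\ln(x)$), the fitted line is

$$\widehat{\sqrt{y}} = 0.3196 + 0.1501\,\ln(x).$$

At $x = 1$ we have $\ln(1) = 0$, so the fitted value is just the intercept:

$$\widehat{\sqrt{y}}\,\Big|_{x=1} = 0.3196, \qquad 0.3196^2 \approx 0.102 \text{ (naively squared back to the scale of } y\text{)}.$$

The fitted line reaches zero where

$$\ln(x) = -\frac{0.3196}{0.1501} \approx -2.13, \qquad x \approx e^{-2.13} \approx 0.119,$$

and as $x \to 0^{+}$, $\ln(x) \to -\infty$, so $\widehat{\sqrt{y}} \to -\infty$. In other words, the significance test on the intercept is a statement about $x = 1$, and this model has no fitted value at $x = 0$ to test.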

It may help you to read this excellent CV thread: Interpretation of log transformed predictor.

gung - Reinstate Monica
  • Thanks for the clarification, I totally missed it. If this is the case, how should I test whether at x=0, y is not significantly different from 0? – user106113 Sep 19 '14 at 00:11
  • Is the point of your analysis to determine if y=0 when x=0? If so, using transformations (eg ln & sqrt) may not be the way to go. Is x=0 within, or close to, the range of x's you have in your dataset? – gung - Reinstate Monica Sep 19 '14 at 00:13
  • It is one of the reality checks of the model: when x=0, y should be close to zero. This transformation seems to be the only one that satisfies the model assumptions on the residuals. As for the second question, my x ranges from 0.2 to 200. – user106113 Sep 19 '14 at 00:17
  • What's wrong with the residuals? Are they non-normal? Is the variance not constant? Is there a curvilinear relationship that isn't being picked up? Bear in mind that when x=0, $\ln(x) = -\infty$. – gung - Reinstate Monica Sep 19 '14 at 00:22
  • When I try to fit sqrt(x) or even the cube root of x, the residuals have a downward parabolic trend, which implies that they are not independent. – user106113 Sep 19 '14 at 00:26
  • The shape of the residual distribution has nothing to do w/ independence. Why not just regress y on x? What's wrong w/ that? – gung - Reinstate Monica Sep 19 '14 at 00:28
  • Because there is an obvious curve, something that looks like y = sqrt(x), in the original model. Also, regressing y on x ends up with non-normal residuals. – user106113 Sep 19 '14 at 00:33
  • Why not try regressing y on x & x^2? Non-normal residuals usually aren't that big a deal. How far from normal are they & how much data do you have? – gung - Reinstate Monica Sep 19 '14 at 00:58
  • When I perform a Shapiro-Wilk test on the residuals for y on x, I get a p-value of 0.004. Besides, referring to the previous comment, doesn't the downward parabolic trend in the residuals vs fitted plot imply non-independence? – user106113 Sep 19 '14 at 01:11
  • No, nothing about the shape of the residual distribution has to do w/ independence. Re S-W test, you may want to read [this](http://stats.stackexchange.com/q/2492/7290). What is your N? – gung - Reinstate Monica Sep 19 '14 at 01:16
  • OK, I am confused. What I learnt at uni is that the shape of the residual distribution is related to independence. If this is not true, how do we test independence? – user106113 Sep 19 '14 at 01:21
  • That would be best as a new question, not buried in comments. What is your N? – gung - Reinstate Monica Sep 19 '14 at 01:26
  • I'm not sure what you mean by N. I got the following when I ran the Shapiro-Wilk test in R: "W = 0.8622, p-value = 0.004572". I'm not sure if this is also one of the reasons, but my dataset has only 23 points. – user106113 Sep 19 '14 at 01:28
  • Hmmm, 23 points is pretty tough. It may be hard to determine if the value of y at x=0 is 0. Can you paste your data into your question above? – gung - Reinstate Monica Sep 19 '14 at 01:31
  • I have added the data in the question. Would be grateful for any form of opinions. – user106113 Sep 19 '14 at 01:51
  • Hmm, not really sure & I don't have a lot of time to explore right now. Why not ask as a new question? Say a little bit about what the data are & what your goals are, etc. then post the data, etc. – gung - Reinstate Monica Sep 19 '14 at 02:01
  • Sorry to take up your time. Thank you so much for the help – user106113 Sep 19 '14 at 02:08
  • No problem, sorry I wasn't more help. – gung - Reinstate Monica Sep 19 '14 at 02:08
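
The suggestion in the comments above (regress y on x and x^2 on the original scale, then look at the intercept) can be sketched roughly as follows. This is only a minimal illustration in plain Python, not the asker's analysis: the data are synthetic stand-ins (23 points with x between 0.2 and 200), and the helper names `invert` and `fit_quadratic` are made up for this example. In R the equivalent fit would be `lm(y ~ x + I(x^2))`, with the intercept row of `summary()` giving the same t statistic.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_quadratic(xs, ys):
    """OLS fit of y = b0 + b1*x + b2*x^2.

    Returns (coefficients, standard errors, residual degrees of freedom).
    """
    design = [[1.0, x, x * x] for x in xs]
    k = 3
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    xtx_inv = invert(xtx)
    coefs = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in design]
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    df = len(xs) - k
    sigma2 = rss / df  # residual variance estimate
    std_errs = [math.sqrt(sigma2 * xtx_inv[i][i]) for i in range(k)]
    return coefs, std_errs, df


# Synthetic stand-in data (NOT the asker's data): 23 points, x from 0.2 to 200,
# generated from an assumed square-root-shaped curve plus Gaussian noise.
rng = random.Random(0)
xs = [0.2 * 1000 ** (i / 22) for i in range(23)]
ys = [0.07 * math.sqrt(x) + rng.gauss(0, 0.05) for x in xs]

coefs, std_errs, df = fit_quadratic(xs, ys)
t_intercept = coefs[0] / std_errs[0]
T_CRIT_20 = 2.086  # two-sided 5% critical value of Student's t with 20 df

print("intercept:", coefs[0], "std. error:", std_errs[0], "df:", df)
print("t statistic for H0: intercept = 0:", t_intercept)
print("reject at 5%:", abs(t_intercept) > T_CRIT_20)
```

Two caveats apply to any such check: failing to reject the null does not show that y=0 at x=0, and x=0 lies just outside the observed range (0.2 to 200), so the intercept is an extrapolation whose value depends on whether the quadratic shape is adequate near zero.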