When fitting a GLM, why do I have to square my independent variable in order to model a unimodal distribution?

For example (MATLAB code):

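% GLM: logistic regression on rainfall and its square
% ('logit' is the default link for the binomial distribution)
[logit_rain, dev, stats] = glmfit([rain, rain.^2], dependend_variable, 'binomial', 'link', 'logit');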
Pat
  • It is not necessary to square any independent values. To clarify what you are doing, it would help to express your code in a conventional manner, such as mathematical notation, so that you can have access to the thoughts of knowledgeable people who are not conversant in Matlab. – whuber May 10 '14 at 17:48
  • As phrased, this question doesn't make sense to me. Perhaps context - such as explaining why you think what you state as fact in your question is the case - may help. – Glen_b May 11 '14 at 01:49
  • @whuber : I am fitting a GLM where rainfall is my independent variable and plant growth is my dependent variable. When I plot them against each other, a unimodal distribution seems to be present. I was then told that I have to square the independent variable to create a fitted line, but I don't know why. – Pat May 11 '14 at 07:53
  • Maybe [my analysis of plant growth data](http://stats.stackexchange.com/questions/63978/do-statisticians-assume-one-cant-over-water-a-plant-or-am-i-just-using-the-wro/64039#64039) will shed some light on your question. – whuber May 12 '14 at 13:07

1 Answer

This is a logistic regression model. You speak of a unimodal distribution, but I don't know what you mean by that in this context.

With only a linear term, the model makes the probability of plant growth either increase steadily or decrease steadily as rainfall increases; it cannot do both. If you want the probability to reach a maximum at a certain rainfall level and decrease as you move away from that level, the model needs a term that allows that shape. One way to provide it is to include a squared term.
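
To make the role of the squared term concrete, here is a sketch of the model that the `glmfit` call in the question fits. The notation is introduced here for illustration and is not from the original post: $x$ is rainfall, $p(x)$ is the probability of plant growth, and $\beta_0, \beta_1, \beta_2$ are the intercept (which `glmfit` adds by default) and the coefficients of the linear and squared terms.

```latex
\operatorname{logit} p(x) \;=\; \log\frac{p(x)}{1-p(x)} \;=\; \beta_0 + \beta_1 x + \beta_2 x^2

% If \beta_2 < 0, the log-odds is a downward-opening parabola.
% Its maximum is where the derivative with respect to x vanishes:
\frac{d}{dx}\bigl(\beta_0 + \beta_1 x + \beta_2 x^2\bigr) \;=\; \beta_1 + 2\beta_2 x \;=\; 0
\quad\Longrightarrow\quad
x^{*} \;=\; -\frac{\beta_1}{2\beta_2}
```

Because the inverse logit is strictly increasing, $p(x)$ peaks at the same $x^{*}$ as the log-odds. With $\beta_2 = 0$ the log-odds is a straight line in $x$, so $p(x)$ can only rise or only fall.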

Tom Lane
  • OK, that makes sense to me. Plant growth increases until a certain precipitation level is reached and decreases after that. The logistic model is used because the locations are classified as either no plant growth at all or plant growth. Depending on the level of precipitation, the plant either grows or does not, following a unimodal distribution. What I still don't get is why I have to include both terms, the linear and the squared one: ( (rain) (rain)^2 ). – Pat May 12 '14 at 06:20
  • If this were just least squares regression and you wanted a peak at x=10 of height 5, you might write y=5-(x-10)^2. Multiply that out and you'll see constant, linear, and squared terms. Same idea here for logistic regression. You want to give the estimation process flexibility to choose the location and height of the maximum, as well as the rate of drop-off as you move away. – Tom Lane May 12 '14 at 17:27
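
The expansion described in the comment above can be written out in general. As a sketch, with symbols chosen here for illustration rather than taken from the thread, consider a parabola with its peak at $x = m$, peak height $h$, and drop-off rate $c > 0$:

```latex
\begin{aligned}
y &= h - c\,(x - m)^2 \\
  &= \underbrace{(h - c\,m^2)}_{\beta_0}
   \;+\; \underbrace{2cm}_{\beta_1}\,x
   \;+\; \underbrace{(-c)}_{\beta_2}\,x^2 .
\end{aligned}

% Worked instance: m = 10, h = 5, c = 1 gives
%   y = 5 - (x - 10)^2 = -95 + 20x - x^2 .
```

The three coefficients correspond to the three free quantities $m$, $h$, and $c$. Omitting the linear term fixes $\beta_1 = 2cm = 0$, which forces $m = 0$: the peak would have to sit at zero rainfall regardless of the data. In the logistic model the same algebra applies to the log-odds rather than to $y$ itself.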