Is it correct that:
- $y_i$ is not a fixed value, but a random variable with a probability distribution
- which means that for the same predictor value(s), $y_i$ could take different values
- in linear regression this distribution can only be normal
- in a GLM, this distribution can be any distribution from the exponential family
- the distribution of a single $y_i$ has nothing to do with the distribution of all the $y_i$'s taken together
- $\mu_i$ is the expected value of $y_i$
- in practical use, $\mu_i$ is the predicted value of $y_i$, especially if the dataset has only one $y$ for the given predictor value(s)

Is the above correct? Where am I wrong?
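To make the $\mu_i = E[y_i]$ point concrete, here is a minimal sketch on simulated data (all variable names here are made up, not from the dataset below): for a logistic GLM, the fitted values are exactly the inverse link applied to the linear predictor, i.e. the model's estimate of $E[y_i \mid x_i]$.

```r
# simulate binary outcomes whose true mean is a logistic function of x
set.seed(42)
x <- rnorm(500)
p <- 1/(1 + exp(-(0.5 + 2*x)))    # true mu_i = E[y_i | x_i]
y <- rbinom(500, size = 1, prob = p)

fit <- glm(y ~ x, family = binomial)

# fitted(fit) is the inverse logit of the linear predictor
eta <- coef(fit)[1] + coef(fit)[2]*x
all.equal(unname(fitted(fit)), unname(1/(1 + exp(-eta))))
```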
Based on the above, I've tried simulating `glm` with `lm` in R, and it kind of works:
library(boot)
download.file("https://dl.dropbox.com/u/7710864/data/ravensData.rda",
destfile="./ravensData.rda",method="curl")
load("./ravensData.rda")
# download manually and load here if the above fails
# load("/yourpath/ravensData.rda")
# calling logit(ravensData$ravenWinNum) results in
# [1]  Inf  Inf  Inf  Inf  Inf -Inf  Inf  Inf  Inf  Inf -Inf  Inf  Inf  Inf  Inf -Inf
# [17] -Inf -Inf  Inf -Inf
# because logit(0) = -Inf and logit(1) = Inf; that's way too extreme,
# since inv.logit is already essentially 1 by 20 anyway
# so we'll write our own dummy "logit" routine instead:
# map winNum = 1 to 5 and winNum = 0 to -5
win <- ravensData$ravenWinNum*10 - 5
# now we can do a simple lm
fit <- lm(win~ravensData$ravenScore)
# and get probability of win using inv.logit
fitwin <- inv.logit(fit$fitted.values)
plot(ravensData$ravenScore, fitwin)
# now glm
fitglm <- glm(ravensData$ravenWinNum ~ ravensData$ravenScore, family="binomial")
plot(ravensData$ravenScore, fitglm$fitted.values)
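To see how close the `lm` hack gets to the real `glm`, here is a self-contained comparison on simulated data (so it runs even if the `ravensData` download fails; the variable names are made up): both produce probabilities that rise with the score, but they are not identical, because `lm` minimizes squared error on the ±5 pseudo-response while `glm` maximizes the binomial likelihood.

```r
library(boot)  # for inv.logit
set.seed(1)
score  <- rpois(200, lambda = 20)
p_true <- inv.logit(-3 + 0.2*score)
winNum <- rbinom(200, size = 1, prob = p_true)

# the lm hack: map 0/1 to -5/5, fit lm, push fits through inv.logit
win_hack <- winNum*10 - 5
fit_lm   <- lm(win_hack ~ score)
p_hack   <- inv.logit(fitted(fit_lm))

# the real thing
fit_glm <- glm(winNum ~ score, family = binomial)
p_glm   <- fitted(fit_glm)

# strongly correlated, but not equal
cor(p_hack, p_glm)
```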