If I have two variables x and y that have a linear relationship e.g. using data from the mtcars package and R code
df <- mtcars[c(4,7)]
names(df) <- c("x", "y")
model <- lm(y~x, data = df)
p <- ggplot(data = df, aes(x,y)) +
geom_point() +
stat_smooth(method = "lm")
if I've test that my model doesn't violate any of the assumptions of lm, for each value of x there exists some distribution of y with a mean on the best fit line for the linear regression.
e.g. if I have a some thing with an x = 100 then my expected value of y will be ~ 18.75, but there will be some pdf around this
Is the correct way to set the parameters to take the predicted mean given the lm, and the sd of the entire distribution of y such that:
my_object <- data.frame(x = 100)
sd <- sd(df$y)
Ey <- predict(model, new = my_object, se.fit = TRUE)$fit
gives: sd = 1.786943 Ey = 18.71
so if I now want to take multiple samples of y given x = 100 I can do
rnorm(1, Ey, sd)
taking the sd of the whole sample doesn't smell right but I can't think of any other way that you'd do it (if it is at all even possible)
[p.s. might be better suited to stackoverflow, but I assumed its more of a stats question]