18

I was wondering how you would generate data from a Poisson regression equation in R? I'm kind of confused how to approach the problem.

So if I assume we have two predictors $X_1$ and $X_2$ which are distributed $N(0,1)$. And the intercept is 0 and both of the coefficients equal 1. Then my estimate is simply:

$$\log(Y) = 0+ 1\cdot X_1 + 1\cdot X_2$$

But once I have calculated log(Y) - how do I generate poisson counts based on that? What is the rate parameter for the Poisson distribution?

If anyone could write a brief R script that generates Poisson regression samples that would be awesome!

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
Michael
  • 2,180
  • 4
  • 23
  • 32

2 Answers2

29

The poisson regression model assumes a Poisson distribution for $Y$ and uses the $\log$ link function. So, for a single explanatory variable $x$, it is assumed that $Y \sim P(\mu)$ (so that $E(Y) = V(Y) = \mu$) and that $\log(\mu) = \beta_0 + \beta_1 x$. Generating data according to that model easily follows. Here is an example which you can adapt according to your own scenario.

>   #sample size
> n <- 10
>   #regression coefficients
> beta0 <- 1
> beta1 <- 0.2
>   #generate covariate values
> x <- runif(n=n, min=0, max=1.5)
>   #compute mu's
> mu <- exp(beta0 + beta1 * x)
>   #generate Y-values
> y <- rpois(n=n, lambda=mu)
>   #data set
> data <- data.frame(y=y, x=x)
> data
   y         x
1  4 1.2575652
2  3 0.9213477
3  3 0.8093336
4  4 0.6234518
5  4 0.8801471
6  8 1.2961688
7  2 0.1676094
8  2 1.1278965
9  1 1.1642033
10 4 0.2830910
ocram
  • 19,898
  • 5
  • 76
  • 77
3

If you wanted to generate a data set that fit the model perfectly you could do something like this in R:

# y <- exp(B0 + B1 * x1 + B2 * x2)

set.seed(1234)

B0 <-  1.2                # intercept
B1 <-  1.5                # slope for x1
B2 <- -0.5                # slope for x2

y <- rpois(100, 6.5)

x2 <- seq(-0.5, 0.5,,length(y))
x1 <- (log(y) - B0 - B2 * x2) / B1

my.model <- glm(y ~ x1 + x2, family = poisson(link = log))
summary(my.model)

Which returns:

Call:
glm(formula = y ~ x1 + x2, family = poisson(link = log))

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-2.581e-08  -1.490e-08   0.000e+00   0.000e+00   4.215e-08  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.20000    0.08386  14.309  < 2e-16 ***
x1           1.50000    0.16839   8.908  < 2e-16 ***
x2          -0.50000    0.14957  -3.343 0.000829 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 8.8619e+01  on 99  degrees of freedom
Residual deviance: 1.1102e-14  on 97  degrees of freedom
AIC: 362.47

Number of Fisher Scoring iterations: 3
Mark Miller
  • 639
  • 7
  • 12