
I was developing a set of simulated data with the following properties:

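```r
# simstudy definition: GT ~ Poisson(lambda) with log(lambda) = 0.14 * BB (BB was defined earlier in `def`)
def <- defData(def, varname = "GT", dist = "poisson", formula = "0.14 * BB", link = "log")
```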

I expected that if my BB value is 50.214891, GT would be e^(0.14*50.214891) = 1096.63, but the simulated value is 1081. What am I missing?
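A quick base-R check of the arithmetic in the question (only `BB = 50.214891` is taken from the question; simstudy is not needed for this). Evaluated directly, $e^{0.14 \times 50.214891}$ comes out at about 1130.13, while 1096.63 is $e^{7}$, which corresponds to `BB = 50` exactly. Either way, the number is the Poisson *mean* implied by the log link, not a value the simulation will return.

```r
BB <- 50.214891
eta <- 0.14 * BB            # linear predictor on the log scale, about 7.0301
lambda <- exp(eta)          # inverse of the log link gives the Poisson mean, about 1130.13
round(c(eta = eta, lambda = lambda), 4)
round(exp(0.14 * 50), 2)    # 1096.63, i.e. e^7, the value quoted in the question
```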


Asked by MSilvy
  • No Poisson random variable can have the value 1096.63: it's not a whole number. That's the Poisson *parameter* for a random variable. The variable, being random, will acquire a random value! – whuber May 19 '21 at 17:39
  • Other questions about (fractional) expected values for discrete random variables represented with integers have been asked and answered here. [For example](https://stats.stackexchange.com/questions/223616/the-meaning-of-expected-value-for-discrete-random-variable-in-dice-experiments). Remember that expectations are *over the long run*. – Alexis May 19 '21 at 17:58
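To illustrate the "long run" point in base R: every draw from a Poisson distribution is a whole number, yet the average of many draws settles near the (non-integer) parameter. The value 1096.63 is simply the number used in this thread; the seed and the sample size are arbitrary choices.

```r
set.seed(1)                       # arbitrary seed for reproducibility
lambda <- 1096.63                 # Poisson parameter (the mean), not an observable value
draws <- rpois(100000, lambda)    # 100,000 independent Poisson draws

all(draws == round(draws))        # TRUE: every draw is a whole number
any(draws == lambda)              # FALSE: no draw equals the fractional mean
mean(draws)                       # close to lambda over the long run
var(draws)                        # also close to lambda, since a Poisson's variance equals its mean
```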

1 Answer


Poisson observations must be non-negative integers, so you will never observe a value like $1096.63$.

That $1096.63$ is the parameter $\lambda$ (the mean) of the conditional Poisson distribution predicted by your regression equation. You then draw a value from that $\text{Poisson}(1096.63)$ distribution, and this particular draw happened to be $1081$.
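The two steps in the paragraph above (take the Poisson parameter, then draw from that distribution) can be sketched in base R. The value 1096.63 is the one used in this thread and the seed is arbitrary; `qpois` and `ppois` are used only to check whether an observed 1081 looks like an ordinary draw from $\text{Poisson}(1096.63)$.

```r
lambda <- 1096.63            # parameter of the conditional Poisson distribution
set.seed(2021)               # arbitrary seed
gt <- rpois(1, lambda)       # one random draw: a whole number near lambda, not lambda itself

# Bounds of the central 95% of Poisson(lambda) draws
qpois(c(0.025, 0.975), lambda)

# Probability of drawing a value at or below the observed 1081
ppois(1081, lambda)
```

With these inputs, 1081 falls comfortably inside the central 95% range, and a value at or below it occurs roughly one time in three, so nothing is wrong with the simulation.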

The same idea applies to logistic regression. A logistic regression predicts a probability (the parameter of the binomial distribution, analogous to the $1096.63$ parameter of your Poisson), but the observations are categorical: either $0$ or $1$ (analogous to your observed/simulated $1081$). In your case the link function is the natural logarithm, so the inverse link is the $\exp$ function. Adapted to Poisson:

```r
set.seed(2021)
x1 <- rnorm(1000)           # some continuous predictors
x2 <- rnorm(1000)
z <- 1 + 2 * x1 + 3 * x2    # linear predictor with an intercept (bias) of 1; this plays the role of your model
lambda <- exp(z)            # pass through the inverse link function (exp, because the link is log)
y <- rpois(1000, lambda)    # Poisson response variable
```
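For comparison, a sketch of the logistic-regression version of the same recipe. Only the inverse link and the response distribution change: `plogis` (the inverse logit) turns the linear predictor into a probability, and `rbinom` draws 0/1 outcomes from it. The coefficients are the same illustrative ones as in the Poisson example.

```r
set.seed(2021)
x1 <- rnorm(1000)           # continuous predictors
x2 <- rnorm(1000)
z <- 1 + 2 * x1 + 3 * x2    # same linear predictor as the Poisson example
p <- plogis(z)              # inverse logit link: probabilities strictly between 0 and 1
y <- rbinom(1000, 1, p)     # Bernoulli responses: only 0 or 1 is ever observed
```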
Answered by Dave
  • Expectation is just an integral or a sum, not a value we should expect to observe, or even one that it is possible to observe. This is how we can say that an average family has $2.5$ kids without suggesting that someone gets cut in half. // I have removed the confusing terminology of saying to "expect" a value. – Dave May 19 '21 at 17:57
  • (FWIW, the OP's intercept appears to be 0.) – gung - Reinstate Monica May 19 '21 at 17:57
  • +1 Dave, [relevant point to your comment about 0.58ths of a boy](https://www.you-books.com/storebooks/N/N-Juster/The-Phantom-Tollbooth/_81.jpg) from [*The Phantom Tollbooth*](https://www.you-books.com/book/N-Juster/The-Phantom-Tollbooth). :) – Alexis May 19 '21 at 18:05
  • I understand what is happening now; it is clear. I have another question: if I want to describe what I did with the simulated data in a report, can I write it like this? Exp(GT) = e^(0.14 * BB) – MSilvy May 19 '21 at 18:44
  • @MSilvy That is the expectation of the conditional Poisson distribution. Remember that a GLM is written as $g(\mathbb{E}[Y\vert X]) = X\beta$, so $\mathbb{E}[Y\vert X] = g^{-1}(X\beta)$, where $g$ is the link function. For Poisson regression, the link function is the natural logarithm, so $g^{-1}$ is the exponential function. – Dave May 19 '21 at 18:50
  • Thanks a lot, Dave. So is this right? $\mathbb{E}[\text{GT}\mid\text{BB}] = e^{0.14\cdot\text{BB}}$ – MSilvy May 19 '21 at 19:19
  • @MSilvy It is hard to tell, but I think `BB` is the conditional expected value, and `GT` is the observation drawn from that conditional Poisson distribution (`rpois`). It does not seem like you have a predictor in the data frame you posted. – Dave May 19 '21 at 19:22
  • I am sorry, but I am confused again. Please pardon my repeated questions. Does it mean GT is being generated randomly without considering the 0.14*BB formula I specified? I generated BB from a uniform distribution, specifying a minimum and a maximum value. What would be the right way to represent the equation? – MSilvy May 19 '21 at 19:44
  • @MSilvy If you don't have a predictor variable, then you don't have a regression. But maybe you don't want a regression; you just want to draw values from some Poisson distributions. I think that's what you're doing. In that case, what I wrote about the GLM and the discussion about logistic regression might help you understand some ideas (particularly when you get to GLMs), but it is not quite what you're doing. – Dave May 19 '21 at 19:54
  • So from a coding point of view, is the formula not doing anything? def
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/124448/discussion-between-dave-and-msilvy). – Dave May 19 '21 at 20:08
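On the question of whether the formula does anything: it does, because it sets the mean of each Poisson draw, $\mathbb{E}[\text{GT}\mid\text{BB}] = e^{0.14\cdot\text{BB}}$, and fitting a Poisson GLM to data simulated this way recovers the coefficient. A base-R sketch, assuming (as MSilvy describes) that `BB` is drawn uniformly; the bounds 40 and 60, the sample size, and the seed are hypothetical choices, not values from the thread.

```r
set.seed(2021)                            # arbitrary seed
n <- 5000
BB <- runif(n, min = 40, max = 60)        # hypothetical bounds for the uniform BB
lambda <- exp(0.14 * BB)                  # E[GT | BB] = g^{-1}(0.14 * BB), with g = log
GT <- rpois(n, lambda)                    # observed counts scatter randomly around lambda

# No intercept, matching the formula "0.14 * BB"
fit <- glm(GT ~ 0 + BB, family = poisson(link = "log"))
coef(fit)                                 # estimate for BB should be very close to 0.14
```

Each individual `GT` is random, but the formula governs where those random values are centered, which is exactly what the fitted coefficient reflects.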