Suppose I have count data, grouped in equal time intervals, as the dependent variable. A Poisson regression is often a better-suited GLM than, say, a conditional Gaussian model.

Having little training in math, I fail to see how we can say that a conditional Poisson response in the model is valid when one of the Poisson assumptions, "events occur independently of the time since the last event", is never explicitly checked. I suppose what is meant is conditional independence over time.

Do we satisfy this assumption implicitly? Is there solid ground for it?

Couldn't a Gaussian approximation to the discrete variable (with mean and variance about the same) be just another valid alternative?

Alexey Burnakov

1 Answer

What you are referring to is a homogeneous (stationary) Poisson process. In that case, the distribution of the waiting time (the difference between the time of the next event and the current time, $T_{\text{next event}} - T_{\text{current}}$) is independent of the current time:

$$P(T_{\text{next event}} \leq t| T_{\text{current}}) = 1 - e^{-\lambda (t- T_{\text{current}})}$$


But we can generalize this to a non-homogeneous Poisson process, and make the rate $\lambda$ a function of time instead of a constant. In that case it becomes:

$$P(T_{\text{next event}} \leq t| T_{\text{current}}) = 1 - e^{-\int_{T_{\text{current}}}^t\lambda(s)\, ds}$$

The property that remains for a non-homogeneous Poisson process is that it has independent increments: the number of counts in a particular interval is independent of the number of counts in any other, non-overlapping interval.
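Written out, as a sketch of the standard result rather than a derivation: write $N(t_1, t_2)$ for the number of events in the interval from $t_1$ to $t_2$ (a symbol introduced here only for illustration). Its distribution is Poisson with mean equal to the integrated rate,

$$N(t_1, t_2) \sim \text{Poisson}\left(\int_{t_1}^{t_2} \lambda(s)\, ds\right)$$

and for non-overlapping intervals, $t_1 < t_2 \leq t_3 < t_4$, the counts factorize:

$$P\big(N(t_1, t_2) = k,\ N(t_3, t_4) = m\big) = P\big(N(t_1, t_2) = k\big)\, P\big(N(t_3, t_4) = m\big)$$

With a constant rate $\lambda$ the mean reduces to $\lambda (t_2 - t_1)$, which is the homogeneous case.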


For example:

A Geiger counter is measuring radioactive particles.

  • When the counter is kept at a constant distance from the source, the distribution of the waiting time stays the same over time.
  • But when we move the counter closer to the source, the expected waiting time decreases.

We lose the property that the waiting time is independent of time. Closer to the source we should expect to observe particles at a higher rate, so the waiting time should decrease. But what remains is that the counts in this process are independent: for the probabilities of the number of counts that we will observe in a particular interval, it doesn't matter how many counts we observed previously.
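The Geiger counter picture can be checked with a small simulation. The sketch below is only an illustration under assumed numbers: the rate function `geiger_rate` (rising linearly from 2 to 10 events per unit time), the time window, and all helper names are hypothetical, and the process is simulated by thinning a homogeneous process, which is one common way to do it.

```python
import random

def simulate_nhpp(rate, rate_max, t_end, rng):
    """Event times of a non-homogeneous Poisson process on [0, t_end), by thinning.

    Requires rate(t) <= rate_max for all t in [0, t_end).
    """
    t = 0.0
    events = []
    while True:
        # Candidate arrival from a homogeneous process with rate rate_max.
        t += rng.expovariate(rate_max)
        if t >= t_end:
            return events
        # Keep the candidate with probability rate(t) / rate_max.
        if rng.random() < rate(t) / rate_max:
            events.append(t)

def geiger_rate(t):
    # Hypothetical rate: rises linearly from 2 to 10 events per unit time
    # as the counter moves towards the source over 0 <= t <= 10.
    return 2.0 + 0.8 * t

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (variance(xs) * variance(ys)) ** 0.5

rng = random.Random(0)
early, late = [], []
for _ in range(2000):
    events = simulate_nhpp(geiger_rate, 10.0, 10.0, rng)
    early.append(sum(1 for e in events if e < 5.0))  # count in [0, 5)
    late.append(len(events) - early[-1])             # count in [5, 10)

print("early: mean", mean(early), "variance", variance(early))
print("late:  mean", mean(late), "variance", variance(late))
print("correlation between early and late counts:", correlation(early, late))
```

With this assumed rate, the integrated rate is 20 over the first half of the window and 40 over the second half. The sample mean and variance of each count should therefore both land near those values, as expected for a Poisson variable, while the correlation between the two counts should be close to zero even though the rate changes over time.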


We can view $\lambda(t)$ as the probability density of an event at time $t$, so that $\lambda(t)\,dt$ is the probability of an event between $t$ and $t+dt$ (see also here), independent of any arrivals at other times.


Roughly speaking, when we talk about a generalized linear model, we can set aside all of this Poisson process machinery and simply define the conditional distribution:

$$f(y\vert X,\beta) = \text{Poisson}(g^{-1}(X\beta))$$

that is, the distribution of $y$, conditional on regressors $X$ (which could for instance be a function of time) and parameters $\beta$, is a Poisson distribution with rate parameter $\lambda = g^{-1}(X\beta)$, where $g$ is the link function.

The definition of the conditional distribution $f(y\vert X)$ doesn't care about the 'Poisson assumptions'.
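As a minimal sketch of what this definition means in practice, the code below draws each count independently from a Poisson distribution whose mean depends on a single regressor (time) through a log link, and then recovers the coefficients by maximum likelihood. Everything here is an assumption made for illustration: the log link, the true coefficients 0.5 and 0.15, the grid of time points, and the helper names `sample_poisson` and `fit_poisson_loglink`. The hand-written Newton iteration only keeps the example self-contained; a real analysis would normally use the GLM routine of a statistics package.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; fine for modest lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def fit_poisson_loglink(x, y, n_iter=50, tol=1e-10):
    """Maximum likelihood fit of y ~ Poisson(exp(b0 + b1 * x)) by Newton's method."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Hypothetical data: one count per time point, rate increasing with time.
rng = random.Random(1)
true_b0, true_b1 = 0.5, 0.15
x = [i / 10 for i in range(200)]
y = [sample_poisson(math.exp(true_b0 + true_b1 * xi), rng) for xi in x]

b0_hat, b1_hat = fit_poisson_loglink(x, y)
print("estimated intercept:", b0_hat, "estimated slope:", b1_hat)
```

Note that the simulation never refers to waiting times. Each count is simply Poisson given its regressor value, with a rate that changes from one observation to the next, and that is all the fitting step relies on.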

Sextus Empiricus
  • Thank you. Yes, I am familiar with the exponential distribution of waiting times, and it is also clear what independent increments mean. Reading the last part of your answer, are you saying that even in the absence of the "property that the waiting time is independent of time" (which we don't measure anyway) we can still rely on a constant event rate and independence of counts in neighbouring time intervals? And we still know it is conditionally Poisson? – Alexey Burnakov May 22 '20 at 11:41
  • What we need is that $\lambda(t)\,dt$, the probability of an event between $t$ and $t+dt$, is independent of other events at other times. Then the number of events between times $t_1$ and $t_2$ is Poisson distributed, which you can derive with the link https://stats.stackexchange.com/questions/354552/what-distribution-to-use-to-model-time-before-a-train-arrives/354574#354574 – Sextus Empiricus May 22 '20 at 11:42
  • Thank you for a very detailed answer. So, plainly speaking, if $$ \frac{\Delta p(event|t)}{\Delta t}=0 $$ (the probability of an event for time period $t$ is constant) and $$ p(event|t1) * p(event|t2) = p(event|t1 \cap event|t2)$$ (the probability of an event is independent over periods $t$), and we have count data, then this is for sure a Poisson process? Did I get it right? – Alexey Burnakov May 22 '20 at 12:02
  • The probability of an event is not necessarily constant, so that difference equation need not equal zero. Think of it as summing Poisson variables with different rates, which gives another Poisson variable. Only now we do it with infinitely many Poisson variables with infinitely small rates. https://en.wikipedia.org/wiki/Poisson_distribution#Sums_of_Poisson-distributed_random_variables – Sextus Empiricus May 22 '20 at 12:07
  • OK, got it. The rate can be non-constant, for example a function of time. This is what I wanted to learn, thank you. It is now clearer what kinds of processes are Poisson, as opposed to the homogeneous case I was thinking about. – Alexey Burnakov May 22 '20 at 12:13
  • And I should also correct my second equation. You argued for independent increments of the counts, not independent counts per se. $$ p(events|t1 - events|t2) * p(events|t0 - events|t1) = p((events|t1 - events|t2) \cap (events|t0 - events|t1)) $$ Even if $p$ varies with clock time, the increments of the counts stay independent. – Alexey Burnakov May 22 '20 at 12:26