I think the best way to ask this question is with a hypothetical situation. Let's say I have a sensor that records on average one occurrence every 3 days, with a standard deviation of about 4. I want to move that sensor to a new, unknown location. How many days can I leave the sensor in this new location before I'm confident that this location will not produce an occurrence? Something like 6, 7, or 8 days?

[Figure: histogram of the daily occurrence counts over the 93 days]

In the plot I have included, the data cover 93 days, with an integer number of occurrences recorded each day. The average number of occurrences per day is 2.95 and the std is 4.03. 0 is by far the most common value. In the data (which you can't see in the histogram) the longest stretch of 0 entries was 6 days. So if I move the sensor to some unknown location, how many days can it sit there reporting 0's until I'm confident that this unknown location doesn't experience occurrences?

Krits
  • Given that this is a discrete distribution, having only values of 0 and 1, it makes more sense to talk about the probability of an event occurring on a day. You may be looking for the binomial distribution. – user2974951 Jan 11 '22 at 06:32
  • So it averages one occurrence per three days, and the number of occurrences is an integer. Say one day there were three occurrences, and then for the next two days there were 0. I can add a plot from my data if that helps – Krits Jan 11 '22 at 06:50
  • I more or less just wanted the approach to the problem, but the two stds match now – Krits Jan 11 '22 at 08:24
  • But now the two parts of your question are fundamentally different. The first part is about waiting times between events and the second part is about events per day. – BruceET Jan 11 '22 at 18:43

1 Answer

You seem to need to change the mean and standard deviation according to circumstances. So, it may be best to model waiting times between events as gamma distributed and round up to integers on account of daily data collection. This matches the story in the first part of your question.

(Notice that this is different from looking at the number of events per day, as in the last part with the histogram.)

Specifically, consider $\mathsf{Gamma}(\mathrm{shape}=9/16, \mathrm{rate} = 3/16),$ which has $\mu = 3, \sigma=4.$
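
For reference, these parameter values follow from matching moments. A gamma distribution with shape $\alpha$ and rate $\lambda$ has $\mu = \alpha/\lambda$ and $\sigma^2 = \alpha/\lambda^2$, so for a given mean and standard deviation

$$\lambda = \frac{\mu}{\sigma^2} = \frac{3}{16}, \qquad \alpha = \mu\lambda = \frac{\mu^2}{\sigma^2} = \frac{9}{16}.$$

The same two formulas give a shape and rate for any other mean and standard deviation, provided a gamma model is a reasonable description of the waiting times.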

Here is a summary of $1000$ such waiting times simulated in R:

set.seed(2022)
x = ceiling(rgamma(1000, 9/16, 3/16))
mean(x);  sd(x)
[1] 3.627
[1] 3.809512

Under this gamma distribution, the probability that a wait between events exceeds 7 days is about $0.1232$, and the probability that three consecutive waits all exceed 7 days is about $0.002$ (treating successive waits as independent). You could choose some such criterion to decide if events at a new site are too sparse to be of interest.

1-pgamma(7, 9/16, 3/16)
[1] 0.1231928
(1-pgamma(7, 9/16, 3/16))^3
[1] 0.00186963
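
One way to turn such a criterion into a number of days is to invert the tail probability with `qgamma`. The sketch below is only an illustration: the helper name `days_to_rule_out` and the thresholds are hypothetical choices, not part of the original analysis, and it assumes that the start of monitoring at the new site can be treated as if an event had just occurred.

```r
# Hypothetical helper (illustration only): smallest whole number of
# event-free days d such that P(wait > d) <= alpha under the gamma model.
days_to_rule_out = function(alpha, shape = 9/16, rate = 3/16) {
  ceiling(qgamma(1 - alpha, shape, rate))
}

alpha = c(0.10, 0.05, 0.01)          # assumed tolerance levels
d = days_to_rule_out(alpha)

# Tabulate each threshold with its day count and the achieved tail probability
data.frame(alpha, days = d, tail_prob = 1 - pgamma(d, 9/16, 3/16))
```

A small tail probability here says that such a long quiet spell would be unusual at the original site. It suggests the new site behaves differently, not that it can never produce an occurrence.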

Here is a histogram of my simulated sample of $1000$ waiting times.

hist(x, br=(0:31)-.5, col="skyblue2")
abline(v=7.5, col="red")

[Figure: histogram of the 1000 simulated waiting times, with a red vertical line at 7.5]

Waits of $7$ or more among the first $100$ waiting times (total of $341$ days) are designated by TRUEs below:

y = x[1:100]
sum(y)
[1] 341

y >= 7
  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
 [11] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
 [21] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [31] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
 [41] FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
 [51] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
 [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
 [71] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
 [91] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
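
As a rough check on the simulation, the long waits can be counted and compared with the model. This sketch repeats the simulation above with the same seed. It relies on the fact that a wait rounded up to a whole day is at least 7 exactly when the unrounded wait exceeds 6, so the matching model probability is $P(W > 6)$ rather than the $P(W > 7)$ computed earlier.

```r
set.seed(2022)
x = ceiling(rgamma(1000, 9/16, 3/16))   # same simulation as above
y = x[1:100]

which(y >= 7)     # positions of the long waits among the first 100
sum(y >= 7)       # how many of the first 100 waits are 7 days or more
mean(x >= 7)      # simulated proportion over all 1000 waits

# ceiling(W) >= 7 exactly when W > 6, so the model value is P(W > 6)
1 - pgamma(6, 9/16, 3/16)
```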
BruceET
  • this is awesome thank you. I'm not familiar with the gamma dist or how to tweak it. Can you make a gamma dist with just mean and std? If I have a gamma dist that roughly models my data, I can then just subtract the probability from 1? – Krits Jan 11 '22 at 18:52
  • See [Wikipedia](https://en.wikipedia.org/wiki/Gamma_distribution) on gamma distributions (shape-rate parameterization). Alternatively, an intermediate level probability text or the probability section of a math stat text. – BruceET Jan 11 '22 at 18:57
  • Do you know the 'shape' parameter of your gamma distribution, and either its rate or scale parameter? See Wikipedia for how to get from shape/rate to mean and standard deviation. (A bit of algebra is required to go from mean and variance to shape and rate; R uses rate.) I picked shape $9/16$ and rate $3/16$ precisely because they give $\mu=3,\sigma=4.$ – BruceET Jan 11 '22 at 19:15
  • I wasn't able to figure out the gamma dist parameters from my data. Does this seem valid as a back-of-the-envelope approach? From my data, the chance I see a 0 on a given day is 44% at the original site. So my chance of seeing 7 consecutive 0s is .44^7 ≈ .003 < .01. So if I have seen 0s on 7 consecutive days at a new site, that would have had approx a .003 chance of happening at the original site – Krits Jan 11 '22 at 21:33
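
The back-of-the-envelope idea in the last comment can be sketched as follows. It assumes, and this is an assumption rather than something the data establish, that days are independent and that each day at the original site has the quoted 44% chance of recording 0. The variable names are illustrative only.

```r
p0 = 0.44                 # share of zero-count days quoted in the comment
n  = 1:10
run_prob = p0^n           # P(n consecutive zero days), assuming independent days
round(run_prob, 4)

# Smallest run length whose probability drops below 1%
n_min = ceiling(log(0.01) / log(p0))
n_min
```

Under these assumptions a run of 6 zero days already has probability below 1%, and 0.44^7 is about 0.003. If zero days tend to cluster at the original site, such runs are more likely than this calculation suggests, so the threshold should be read as optimistic.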