How to fit a data against a Laplace / Double Exponential Distribution (and check GoF)

Question

I am a PhD student.

I have a data set (waiting time in minutes between tweets) which looks almost symmetrically to the naked eye.

I've tried a couple of distribution fits to this data and the closest I can get is a cauchy distribution with a p value = 0.02. I'd like to see if the data is a reasonable fit to a Laplace / Double Exponential Distribution.

mydata = c(251, 178, 342, 252, 253, 213, 335, 273, 250, 325, 253, 252, 254, 252, 240, 248, 
       253, 250, 250, 247, 257, 259, 250, 254, 251, 250, 251, 254, 248, 265, 239, 260, 
       253, 311, 252, 311, 250, 249, 251, 212, 289, 243, 253, 252, 254, 249, 250,
       259, 268, 346, 312, 263, 287, 281, 334, 239, 218, 280,   5, 255, 251, 255, 
       266, 325, 248, 249, 250, 251, 171, 326, 195, 198, 281, 271, 265, 267, 250, 251, 
       278, 264, 252, 265, 250, 243, 267, 250, 252, 253, 244, 252, 259, 132, 275, 182,
       336, 250, 251, 253, 358, 252, 276, 281, 255, 252, 191, 277, 283, 193, 213, 
       268, 277, 250, 236, 241, 296, 242, 249, 251, 250, 262, 250, 219, 263, 267, 245, 
       254, 251, 251, 234, 259, 264, 261, 246, 254, 264, 276, 236, 245, 253, 222, 240,
       250, 250, 252, 239, 254, 250, 263, 267, 251, 255, 256, 252, 243, 257,
       251, 252, 252, 242, 229, 250, 265, 252, 237, 270, 212, 268, 290, 256, 239, 239, 
       263, 251, 248, 252, 249, 241, 268, 261, 254, 256, 258, 250, 251, 250, 259, 257, 197,
       282, 461, 257, 250, 250, 250, 251, 253, 253, 251, 250, 263, 247, 254, 
       251, 256, 250, 250, 177, 305, 275, 203, 260, 250, 251, 252, 239, 274, 167, 262, 
       251, 272, 251, 264, 250, 256, 226, 257, 270, 240, 239, 255)

Which gives an interesting histogram

hist(mydata, breaks = 25, freq =F)

I looked at a couple of prior posts on this subject and they appear to use the nls function Double exponential fit in R however the data that is used appears to have two variable to model.

I then looked at converting my data to a frequency table and trying the above solution:

nonlin <- function(t, a, b, c) { a * (exp(-(abs(t-c) / b))) }
nlsfit <- nls(Freq ~ nonlin(times, a, b, c), data=mydataframe, start=list(a=2, b=2, c=2.5))

However I get an error (singular gradient matrix at initial parameter estimates) as the starting parameters appear bad. Then i found another post Why is nls() giving me "singular gradient matrix at initial parameter estimates" errors? that uses some code to estimate the starting parameters.

c.0 <- min(mydataframe$times) * 0.5
model.0 <- lm(log(times - c.0) ~ Freq, data=mydataframe)
start <- list(a=exp(coef(model.0)[1]), b=coef(model.0)[2], c=c.0)
model <- nls(times ~ a * exp(b * Freq) + c, data = mydataframe, start = start)

I get another error running the last line

Error in nls(times ~ a * exp(b * Freq) + c, data = mydataframe, start =     start) : 
  step factor 0.000488281 reduced below 'minFactor' of 0.000976562

Any help folks can provide is much appreciated. Jonathan

The maximum likelihood estimator of the location parameter for the double exponential is the median ... — kjetil b halvorsen, May 25 '17 at 16:02
What are you going to do with the fitted distribution? *How* are you testing goodness of fit? (e.g. the p-value for the Cauchy) Why are you performing goodness of fit tests on your fitted distributions? No simple distributional form will actually be "correct" so any simple distribution you choose *should* be rejected if your sample is large enough. — Glen_b, May 26 '17 at 03:21
With a fitted distribution i can use the known properties of the distribution to make inferrences around expected duration between tweets. To test GoF formally i plan to use Anderson-Darling. I want to check there my hypothesis: Is a Laplace distribution a reasonable fit to model the waiting times between tweets. Yes I am aware the fit will not be perfect (i'm not expecting it to be so) I am also aware that the shape of the data may change with a larger sample — Jonathan Dunne, May 26 '17 at 09:42
An Anderson-Darling -- or any other test -- will tell you if you have a good fit. In large samples like you have it will reject perfectly reasonable approximations (while missing others that may matter more to whatever you're doing). I wouldn't say you have a good fit, but it may be be perfectly adequate for some purposes; Anderson-Darling will reject it easily though. — Glen_b, May 26 '17 at 10:18
I used a plaplace function to generate an AD GoF and the hypothesis was rejected as follows: AD = 7.6687, p-value = 0.0001638. Interestingly if i chop the data in half at it's median one side fits an exponential very well (p-value = 0.18) but that is another story entirely — Jonathan Dunne, May 29 '17 at 14:32

Glen_b · Accepted Answer · 2017-05-26T03:17:20.327

4

You're trying to fit a density function. The question you link to is trying to fit a curve to y vs x data. They're different exercises. Nonlinear least squares may be suitable for that other question but it's not suitable for your problem.

Let's take the double exponential as a given, with density:

$$f(x;\mu,\tau)=\frac{1}{2\,\tau} \exp \left(-\frac{|x-\mu|}\tau \right) \, ,\: \small{{-\infty<x<\infty,\, -\infty<\mu<\infty, \,\tau>0}}$$

Then a good estimator of the location parameter $\mu$ is the sample median, and a good estimator of the scale $\tau$ is the mean deviation from the median (specifically, these are maximum likelihood estimators).

This would give estimates on your sample of $\hat \mu=252$ and $\hat\tau\approx 17.8$:

m = median(tweettimes)
data.frame(m = m, t = mean(abs(tweettimes-m)) )
    m        t
1 252 17.79565

It seems to fit at least moderately well --

but we can see that the tails of the data are heavier than that of the model.

edited May 26 '17 at 03:17

answered May 26 '17 at 02:51

Glen_b

257,508
32
553
939

Thanks Glen, what line of code did you use to overlay the Laplace curve to the histogram did you create a density function for laplace in the form: `hist(tweettimes, breaks = 25, freq = F) curve(dlaplace(x, a = 252, b = 17.79565), col = "red", add = TRUE)` – Jonathan Dunne May 26 '17 at 09:46
Something quite close to that, yes. I wrote a function for dlaplace (... `function(x,m=0,t=1) exp(-abs(x-m)/t )/2/t` ...) -- I have a couple already in several packages but by the time I found one and loaded the package I could have written it twice over so I just wrote it. I also wrote a little qlaplace function for the quantile function to use in the second plot; the red line is the fitted Laplace. The x-axis is the theoretical quantiles (unfortunately I didn't use a good label on that axis) – Glen_b May 26 '17 at 09:59

How to fit a data against a Laplace / Double Exponential Distribution (and check GoF)

1 Answers1

Linked