How do I check if my data fits an exponential distribution?

Question

How could I check if my data e.g. salary is from a continuous exponential distribution in R?

Here is histogram of my sample:

enter image description here

. Any help will be greatly appreciated!

is your variable discrete or continuous? [Exponential distribution is defined as continuous](http://en.wikipedia.org/wiki/Exponential_distribution). — Tomas, Nov 19 '13 at 13:12
Welcome. Look for the function `fitdistr` in R. It adjusts probability density functions (pdfs) based on maximum likelihood estimation (MLE) method. Also search in this site terms as pdf, fitdistr, mle and similar questions will come up. Bare in mind that questions such like that almost requires [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to gather good answers. Also, it helps if the question is not purely about programming (which might lead it to be put on hold as off-topic). — Andre Silva, Nov 19 '13 at 13:25
An exponential distribution will plot as a straight line against $-\ln(1 - $plotting position) where plotting position is (rank $- a) / (n - 2a + 1)$, rank is $1$ for lowest value, $n$ is sample size, and popular choices for $a$ include $1/2$. That gives an informal test which can be as or more useful than any formal test. — Nick Cox, Nov 19 '13 at 13:42
Tests for the exponential distribution have been discussed in [question 32061](http://stats.stackexchange.com/questions/32061/what-are-the-standard-statistical-tests-to-see-if-data-follows-exponential-or-no). The 'best' test strongly depends on the alternative hypothesis: gamma, Weibull, Pareto, ... Log-normal and Pareto seem to be good choices for salaries. — Yves, Nov 20 '13 at 18:37

score 31 · Accepted Answer · edited Apr 04 '19 at 07:19

I would do it by first estimating the only distribution parameter rate using fitdistr. This won't tell you if the distribution fits or not, so you must then use goodness of fit test. For this, you can use ks.test:

require(vcd)
require(MASS)

# data generation
ex <- rexp(10000, rate = 1.85) # generate some exponential distribution
control <- abs(rnorm(10000)) # generate some other distribution

# estimate the parameters
fit1 <- fitdistr(ex, "exponential") 
fit2 <- fitdistr(control, "exponential")

# goodness of fit test
ks.test(ex, "pexp", fit1$estimate) # p-value > 0.05 -> distribution not refused
ks.test(control, "pexp", fit2$estimate) #  significant p-value -> distribution refused

# plot a graph
hist(ex, freq = FALSE, breaks = 100, xlim = c(0, quantile(ex, 0.99)))
curve(dexp(x, rate = fit1$estimate), from = 0, col = "red", add = TRUE)

enter image description here

From my personal experience (though I have never found it officially anywhere, please confirm or correct me), ks.test will only run if you supply the parameter estimate first. You cannot let it estimate the parameters automatically as e.g. goodfit does it. That's why you need this two step procedure with fitdistr.

For more info follow the excellent guide of Ricci: FITTING DISTRIBUTIONS WITH R.

+1. Is the ks.test the best test for testing goodness of fit in such situations? I think you also answered one question of [mine](http://stats.stackexchange.com/questions/68700/test-for-measuring-pdf-goodness-of-fit). — Andre Silva, Nov 19 '13 at 13:42
just a heads up, you should be careful if you're estimating the parameters from the data itself. In that case, a test like anderson darling might be best, as it is robust against such situations. — JPC, Jul 18 '15 at 01:09
On the KS test for exponentiality when the rate parameter is estimated from the data: see [A naive question about the Kolmogorov Smirnov test](https://stats.stackexchange.com/q/110272/17230). — Scortchi - Reinstate Monica, Mar 09 '18 at 19:27

Glen_b · Answer 2 · 2018-03-08T22:35:52.813

While I'd normally recommend checking exponentiality by use of diagnostic plots (such as Q-Q plots), I'll discuss tests, since people often want them:

As Tomas suggests, the Kolmogorov-Smirnov test is not suitable for testing exponentiality with an unspecified parameter.

However, if you adjust the tables for the parameter estimation, you get Lilliefors' test for the exponential distribution.

Lilliefors, H. (1969), "On the Kolmogorov–Smirnov test for the exponential distribution with mean unknown", Journal of the American Statistical Association, Vol. 64 . pp. 387–389.

The use of this test is discussed in Conover's Practical Nonparametric Statistics.

However, in D'Agostino & Stephens' Goodness of Fit Techniques, they discuss a similar modification of the Anderson-Darling test (somewhat obliquely if I recall right, but I think all the required information on how to approach it for the exponential case is to be found in the book), and that's almost certain to have more power against interesting alternatives.

Similarly, one might estimate something like a Shapiro-Francia test (akin to but simpler than the Shapiro-Wilk), by basing a test on $n(1-r^2)$ where $r$ is the correlation between the order statistics and exponential scores (expected exponential order statistics). This corresponds to testing the correlation in the Q-Q plot.

Finally, one might take the smooth test approach, as in the book by Rayner & Best (Smooth Tests of Goodness of Fit, 1990 - though I believe there's a more recent one, with Thas and "in R" added to the title). The exponential case is also covered in:

J. C. W. Rayner and D. J. Best (1990), "Smooth Tests of Goodness of Fit: An Overview", International Statistical Review, Vol. 58, No. 1 (Apr., 1990), pp. 9-17

Cosma Shalizi also discusses smooth tests in one chapter of his Undergraduate Advanced Data Analysis lecture notes, or see Ch15 of his book Advanced Data Analysis from an Elementary Point of View.

For some of the above, you may need to simulate the distribution of the test statistic; for others tables are available (but in some of those cases, it may be easier to simulate anyway, or even more accurate to simulate yourself, as with the Lilliefors test, due to limited simulation size in the original).

Of all of those, I'd lean toward doing the one that's the exponential equivalent to the Shapiro-Francia (that is, I'd test the correlation in the Q-Q plot [or if I was making tables, perhaps use $n(1-r^2)$, which will reject the same cases] - it should be powerful enough to be competitive with the better tests, but is very easy to do, and has the pleasing correspondence to the visual appearance of the Q-Q plot (one could even choose to add the correlation and the p-value to the plot, if desired).

Zhubarb · Answer 3 · 2013-11-19T15:34:56.567

You can use a qq-plot, which is a graphical method for comparing two probability distributions by plotting their quantiles against each other.

In R, there is no out-of-the-box qq-plot function for the exponential distribution specifically (at least among the base functions). However, you can use this:

qqexp <-  function(y, line=FALSE, ...) { 
    y <- y[!is.na(y)]
    n <- length(y)
    x <- qexp(c(1:n)/(n+1))
    m <- mean(y)
    if (any(range(y)<0)) stop("Data contains negative values")
    ylim <- c(0,max(y))
    qqplot(x, y, xlab="Exponential plotting position",ylim=ylim,ylab="Ordered sample", ...)
    if (line) abline(0,m,lty=2)
    invisible()
  }

While interpreting your results: If the two distributions being compared are similar, the points in the q-q plot will approximately lie on the line y = x. If the distributions are linearly related, the points in the q-q plot will approximately lie on a line, but not necessarily on the line y = x.

In Stata, `qexp` from SSC is a canned implementation. – Nick Cox Nov 19 '13 at 15:31 — Nick Cox, Nov 19 '13 at 15:31

How do I check if my data fits an exponential distribution?

3 Answers3

Linked