
I have large sets of real-world user data (30k, 80k and 90k measurements). To be precise, these are simply session lengths for a specific system. I want to create a theoretical model of this data so that I can generate session lengths that roughly follow the distribution of the real-world session lengths (to be used in simulations).

I fitted the data to a Weibull distribution, which visually worked very well. When I generate sample data from the Weibull distribution using the fitted parameters, I get data that is very, very close.
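A minimal sketch of the fit-then-sample step, assuming SciPy (the question does not say which tool was used) and a two-parameter Weibull with the location fixed at 0. The helper `fit_and_sample` and the stand-in data are hypothetical; replace `observed` with the real session lengths.

```python
import numpy as np
from scipy import stats

def fit_and_sample(session_lengths, n_samples, seed=None):
    """Fit a two-parameter Weibull (location fixed at 0) and draw synthetic session lengths."""
    data = np.asarray(session_lengths, dtype=float)
    # floc=0 pins the location, so only shape (c) and scale are estimated by maximum likelihood
    c, loc, scale = stats.weibull_min.fit(data, floc=0)
    rng = np.random.default_rng(seed)
    synthetic = stats.weibull_min.rvs(c, loc=loc, scale=scale, size=n_samples, random_state=rng)
    return (c, scale), synthetic

if __name__ == "__main__":
    # Hypothetical stand-in for the real data: Weibull(shape=1.3, scale=600) "session lengths"
    observed = stats.weibull_min.rvs(1.3, scale=600, size=30_000,
                                     random_state=np.random.default_rng(1))
    (c, scale), synthetic = fit_and_sample(observed, 10_000, seed=2)
    print(f"shape={c:.3f}, scale={scale:.1f}")
```

Fixing the location at 0 is a modelling choice that suits durations; letting SciPy also fit `loc` gives a three-parameter Weibull, which is often less stable to estimate.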

However, when I test the goodness of fit, things don't look so good. At first I used the K-S test (the first value is D, the second is the p-value):

K-S test            = (0.044257085422165915, 6.1787818394288534e-160)
2-sample K-S test   = (0.044934832227649907, 9.8401466055748469e-83)
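The two results above could be produced along these lines. This assumes SciPy's `kstest` and `ks_2samp` (the (D, p) output format suggests it, but the question doesn't say), with the two-sample test comparing the data against a synthetic sample drawn from the fitted Weibull. `ks_report` is a hypothetical helper.

```python
import numpy as np
from scipy import stats

def ks_report(observed, seed=None):
    """Return ((D, p) one-sample, (D, p) two-sample) for a Weibull fitted to `observed`."""
    observed = np.asarray(observed, dtype=float)
    c, loc, scale = stats.weibull_min.fit(observed, floc=0)
    fitted = stats.weibull_min(c, loc=loc, scale=scale)
    # One-sample K-S: empirical CDF of the data vs. the fitted Weibull CDF
    one = stats.kstest(observed, fitted.cdf)
    # Two-sample K-S: data vs. a synthetic sample of the same size drawn from the fitted model
    synthetic = fitted.rvs(size=observed.size, random_state=np.random.default_rng(seed))
    two = stats.ks_2samp(observed, synthetic)
    return (one.statistic, one.pvalue), (two.statistic, two.pvalue)

if __name__ == "__main__":
    # Hypothetical non-Weibull data, so the fit is close but not exact
    data = np.random.default_rng(3).lognormal(mean=6.0, sigma=1.0, size=30_000)
    print(ks_report(data, seed=4))
```

Note that the one-sample p-value is computed as if the Weibull parameters were known in advance; since they were estimated from the same data, the reported p-value is, if anything, too large.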

The D-values are pretty low, which is nice, but the p-values are abysmal. Further research led me to this answer, which, if I understand correctly, states that a K-S test might not be the best tool for my case. The thing is that the real-world data simply does not follow a specific theoretical distribution, so the p-values should be low. What I want is just a distribution that generates values that are pretty close to the real-world data.
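Why a small D can still come with a tiny p-value: the p-value depends on roughly sqrt(n)·D, so with tens of thousands of observations even a small, practically irrelevant discrepancy is "significant". The sketch below holds D at 0.044 and uses the asymptotic Kolmogorov tail formula, P(sqrt(n)·D_n > λ) ≈ 2 Σ (-1)^(k-1) exp(-2k²λ²), which assumes fully specified parameters; it is an illustration, not a reproduction of the numbers above.

```python
import math

def kolmogorov_pvalue(d, n, terms=100):
    """Asymptotic one-sample K-S p-value for statistic d and sample size n."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # For lam < 0.2 the tail probability is 1 to within about 1e-12
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    # Clamp to [0, 1] to guard against rounding in the alternating series
    return min(1.0, max(0.0, 2.0 * total))

if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 90_000):
        print(n, kolmogorov_pvalue(0.044, n))
```

With the same D, the p-value goes from close to 1 at n = 100 to astronomically small at n = 90,000, which is why D (an effect size) is the more useful number for the stated goal.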

Is there any well-established test that helps me find a distribution that is pretty close?

kjetil b halvorsen
senft
    The answer you link to suggests estimating a correlation (between sample & theoretical quantiles) rather than performing a test - I'm not quite sure how it doesn't already answer your question. Indeed, aren't the K-S test statistic values you've calculated already telling you in one sense how close the empirical distribution is to the theoretical? – Scortchi - Reinstate Monica May 05 '15 at 16:46
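The correlation the comment mentions (between ordered sample values and theoretical quantiles, sometimes called the probability plot correlation coefficient) can be sketched with SciPy's `probplot`, assuming a Weibull fitted with location 0. `weibull_ppcc` is a hypothetical helper; values closer to 1 mean a straighter quantile plot.

```python
import numpy as np
from scipy import stats

def weibull_ppcc(data):
    """Correlation between ordered data and quantiles of the fitted Weibull shape."""
    data = np.asarray(data, dtype=float)
    c, _, _ = stats.weibull_min.fit(data, floc=0)
    # probplot pairs the sorted data with weibull_min(c) quantiles and fits a line;
    # location and scale are absorbed by that line, r is the correlation coefficient
    (_osm, _osr), (_slope, _intercept, r) = stats.probplot(
        data, sparams=(c,), dist=stats.weibull_min)
    return r

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    print(weibull_ppcc(rng.weibull(1.4, 5_000) * 250))   # Weibull data
    print(weibull_ppcc(rng.lognormal(5.0, 1.0, 5_000)))  # non-Weibull data
```

Like D, this is a descriptive measure of closeness rather than a test, so it does not collapse to "reject" just because n is large.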

1 Answer


You would most often be better off using a Weibull plot to assess the distribution fit; see Weibull plot to assess goodness of fit.
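A Weibull plot uses the linearisation ln(-ln(1 - F(x))) = k·ln(x) - k·ln(λ) of the Weibull CDF F(x) = 1 - exp(-(x/λ)^k), so Weibull data falls on a straight line. The sketch below, assuming NumPy and Bernard's median-rank plotting positions (one common choice among several), computes the plot coordinates and reads shape k and scale λ off a least-squares line; the helper names are hypothetical.

```python
import numpy as np

def weibull_plot_coords(data):
    """Return (ln x, ln(-ln(1 - F))) for a Weibull probability plot."""
    x = np.sort(np.asarray(data, dtype=float))
    x = x[x > 0]  # the log transform needs strictly positive values
    n = x.size
    i = np.arange(1, n + 1)
    f = (i - 0.3) / (n + 0.4)  # Bernard's median-rank approximation
    return np.log(x), np.log(-np.log1p(-f))

def weibull_line(data):
    """Estimate (shape, scale) by regressing ln x on the transformed ranks."""
    lx, ly = weibull_plot_coords(data)
    # ln x = (1/k) * ly + ln(lambda), so slope = 1/k and intercept = ln(lambda)
    slope, intercept = np.polyfit(ly, lx, 1)
    return 1.0 / slope, float(np.exp(intercept))

if __name__ == "__main__":
    data = np.random.default_rng(6).weibull(1.5, 20_000) * 300
    print(weibull_line(data))
```

For the visual check itself, scatter `ly` against `lx` with any plotting library: systematic curvature, especially in the tails, shows where the Weibull model departs from the data, which matters more for simulation than a p-value does.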

If you would rather have (or need) a formal goodness-of-fit test, see A goodness of fit test for the Weibull distribution, but the advice at Is normality testing 'essentially useless'? applies to Weibull testing just as it does to normality testing.
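One generic way to get a formal test when the Weibull parameters are estimated from the data is a parametric bootstrap of the K-S statistic; this is a swapped-in technique, not necessarily the test discussed in the linked thread. The sketch assumes SciPy and a two-parameter Weibull; `weibull_ks_bootstrap` is hypothetical. With 30k to 90k observations it will be slow and, in line with the advice above, will almost certainly reject.

```python
import numpy as np
from scipy import stats

def weibull_ks_bootstrap(data, n_boot=200, seed=None):
    """Parametric-bootstrap K-S test for a Weibull (loc=0) with estimated parameters."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    c, loc, scale = stats.weibull_min.fit(data, floc=0)
    d_obs = stats.kstest(data, stats.weibull_min(c, loc=loc, scale=scale).cdf).statistic
    exceed = 0
    for _ in range(n_boot):
        # Simulate from the fitted model and refit, so the null distribution of D
        # accounts for the parameters having been estimated
        sim = stats.weibull_min.rvs(c, loc=loc, scale=scale, size=data.size, random_state=rng)
        cb, locb, scaleb = stats.weibull_min.fit(sim, floc=0)
        d_b = stats.kstest(sim, stats.weibull_min(cb, loc=locb, scale=scaleb).cdf).statistic
        exceed += int(d_b >= d_obs)
    # Add-one correction keeps the bootstrap p-value strictly positive
    return d_obs, (exceed + 1) / (n_boot + 1)

if __name__ == "__main__":
    sample = np.random.default_rng(7).weibull(1.3, 300) * 600
    print(weibull_ks_bootstrap(sample, n_boot=99, seed=8))
```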

kjetil b halvorsen