
I am very new to chi-square goodness-of-fit tests but have done a fair bit of research. Basically, I have the following 12 data points:

7392, 7656, 7241, 6164, 4984, 4664, 15262, 17053, 21814, 5094, 4581, 10500.

And I would like to test the assumption that a Lognormal($\mu$=9.11,$\sigma$=0.51) is a good fit to this data.

I have some output from the program "Igloo", which states that the chi-square statistic is 3.10. I am struggling to derive this number. I know there are a number of ways to define, e.g., the size and location of the "bins", the expected frequencies, etc.

The other information from Igloo is that the number of bins chosen is 3. That's all! I am struggling to justify using 3 bins, and struggling to find the size that the 3 bins should be! If I can't replicate the 3.1, then a method which gets close to this answer would suffice.

Any help on this would be greatly appreciated. I appreciate it is a very vague question.

kjetil b halvorsen
Delvesy
  • The question seems clear to me (and it's a good one). The solution depends on how you arrived at the lognormal parameters. See http://stats.stackexchange.com/a/17148 for the details. – whuber Feb 21 '14 at 00:16
  • Thanks for the reply. The lognormal parameters are essentially found by setting the mean equal to the mean of the data (not exactly, but close enough) and assuming a coefficient of variation of 55%. The link does help, but it does not justify using 3 bins, and it gives no indication of the size of the bins or how to calculate expected frequencies. Appreciate the help. – Delvesy Feb 21 '14 at 00:22
  • The link explains why your procedure is likely to be incorrect: the parameters were not found using Maximum Likelihood and the bins were almost surely determined by the data. This means the "degrees of freedom" used in the test will be wrong. Without reverse-engineering your software it will not be possible to determine precisely how it set its bin counts or cutpoints. – whuber Feb 21 '14 at 00:25
  • Assessing goodness of fit with 12 observations is pretty low-power with the best of tests. Using the chi-square to assess goodness of fit is *very* low-power. So ... (i) why test goodness of fit? (ii) why use chi-square? and (iii) where do those parameter values come from? If they're determined from the data, aren't you really just testing for lognormality? – Glen_b Feb 21 '14 at 00:52
  • What's Igloo? I tried searching but hit things that I really don't think can be what you mean. – Glen_b Feb 21 '14 at 01:10
  • Thanks very much for the responses. I understand that this is not the best test to use with so little data, and that my estimated parameters are not valid. However, I still need to replicate the 3.1 or, failing that, get somewhere close or suggest a method of obtaining a statistic. Can you help with this part, i.e. coming up with a statistic of your own? Many thanks – Delvesy Feb 21 '14 at 07:17
  • If I needed to test for lognormality without pre-specified parameters, I'd take logs and run a Shapiro-Wilk test. Why do you need to replicate the 3.1? Why is that more important than other considerations (e.g. any concern about this being a pointless exercise)? – Glen_b Feb 22 '14 at 05:56

1 Answer


A goodness-of-fit test with only 12 observations is borderline, and would only be able to detect gross departures from the null. For testing the null of lognormality I would use a normality test, for instance the Shapiro-Wilk test, on the logarithm of the observations. With your data, using R, I get

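lnd <- c(7392, 7656, 7241, 6164, 4984, 4664, 15262, 17053, 21814, 5094, 4581, 10500)  # data from the question
shapiro.test(log(lnd))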

    Shapiro-Wilk normality test

data:  log(lnd)
W = 0.89033, p-value = 0.119 

For what it's worth, one could also make a qqnorm plot:
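A minimal sketch of one way to draw such a plot, assuming base R's `qqnorm` and `qqline` applied to the logged data (the plot title is my own choice, not taken from the original figure):

```r
# Data from the question
lnd <- c(7392, 7656, 7241, 6164, 4984, 4664,
         15262, 17053, 21814, 5094, 4581, 10500)

# Normal Q-Q plot of the logged data: if the data are lognormal,
# the points should fall roughly along a straight line.
qq <- qqnorm(log(lnd), main = "Normal Q-Q plot of log(data)")

# Reference line through the first and third quartiles
qqline(log(lnd))
```

`qqnorm` also returns the plotted coordinates (invisibly) as a list with components `x` and `y`, which is why the result is kept in `qq` here.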

[Figure: Q-Q plot against lognormal distribution]
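As for the chi-square statistic the question asks about, here is a rough sketch of how a 3-bin version could be computed against the stated Lognormal(9.11, 0.51). The binning rule is an assumption on my part: I use three bins of equal probability under the hypothesised distribution, since nothing is known about how Igloo places its cutpoints. The variable names are likewise just illustrative.

```r
# Data from the question
lnd <- c(7392, 7656, 7241, 6164, 4984, 4664,
         15262, 17053, 21814, 5094, 4581, 10500)

# Hypothesised lognormal parameters (log scale), as given in the question
mu    <- 9.11
sigma <- 0.51

# ASSUMPTION: k = 3 bins, each with probability 1/3 under the hypothesised
# lognormal. Igloo's actual binning rule is unknown.
k    <- 3
cuts <- qlnorm((0:k) / k, meanlog = mu, sdlog = sigma)  # 0, c1, c2, Inf

# Observed counts per bin, and expected counts n/k under the null
observed <- as.vector(table(cut(lnd, breaks = cuts)))
expected <- rep(length(lnd) / k, k)

# Pearson chi-square statistic
chisq <- sum((observed - expected)^2 / expected)

# p-value with k - 1 degrees of freedom; this is only appropriate if the
# parameters were fixed in advance rather than estimated from these data
pval <- pchisq(chisq, df = k - 1, lower.tail = FALSE)

print(observed)
print(chisq)
```

Under these assumptions the counts are 6, 3 and 3 against an expected 4 per bin, which gives a statistic of 1.5 rather than 3.10, so Igloo presumably places its bins differently. With expected counts this small the chi-square approximation is rough in any case.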

kjetil b halvorsen
  • This is likely the right approach (depending on information that has not been supplied): but on its face, it doesn't address the question, which concerns assessing the fit of *one particular* distribution. The issue is whether that distribution was estimated from these data or in some other way. – whuber Jun 03 '21 at 12:40