I want to perform a chi-square test on some data. To generate the histogram, I can use NumPy in Python: y, x = np.histogram(data).

This gives me the heights of the histogram bars, y, and the bin edges, x (I can find the bin centres easily).

I have the choice of normalizing the histogram. My question is: if I plan to perform a chi-square test, should I normalize the histogram or should I use the original frequencies?
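
For concreteness, here is a minimal sketch of the two options being compared. The sample `data` is made up for illustration, and the variable names `counts`, `edges`, `centres`, and `density` are my own; only `np.histogram` and its `density` argument come from NumPy.

```python
import numpy as np

# Hypothetical sample, only for illustration.
rng = np.random.default_rng(0)
data = rng.normal(size=500)

# Raw frequencies: one integer count per bin (10 equal-width bins by default).
counts, edges = np.histogram(data)

# Bin centres are the midpoints of consecutive edges.
centres = 0.5 * (edges[:-1] + edges[1:])

# Normalized heights on the same bins: density=True rescales the counts
# so that the bar areas sum to 1.
density, _ = np.histogram(data, bins=edges, density=True)

print(counts.sum())                        # equals the sample size
print(np.sum(density * np.diff(edges)))    # approximately 1
```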

kjetil b halvorsen
Demetri Pananos
  • *Which* chi-square test do you wish to apply? What specifically is it testing? If it's testing whether the data are compatible with a particular distributional assumption, then you need to watch out for many pitfalls. In particular, by letting Python select the histogram bins for you based on the data you have already violated the underlying statistical assumptions of any chi-square test. – whuber May 28 '17 at 17:07
  • @whuber The test is for whether the data are compatible with a particular distribution. How would you recommend I use Python to perform the test? – Demetri Pananos May 28 '17 at 17:27
  • What computational software you use is the very least of the issues you have to deal with: the calculations are simple and can be handled even with a spreadsheet. See the middle of my answer at https://stats.stackexchange.com/a/17148/919 for an account of the assumptions this test makes. – whuber May 28 '17 at 19:40
  • If you scale or shift bin-counts you'll change the distributional properties and the statistic won't have the right distribution – Glen_b May 29 '17 at 23:19
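
The points raised in the comments can be illustrated with a rough sketch, not a definitive recipe. It assumes a fully specified null distribution (a standard normal, chosen here only as an example), bin boundaries `cuts` fixed before looking at the data, and a made-up sample; all of these are my assumptions, not part of the question. If parameters were estimated from the data, the degrees of freedom would need adjusting (the `ddof` argument of `scipy.stats.chisquare`).

```python
import numpy as np
from scipy import stats

# Hypothetical sample, only for illustration.
rng = np.random.default_rng(1)
data = rng.normal(size=500)
n = data.size

# Bin boundaries chosen in advance, not derived from the data (assumption).
cuts = np.array([-1.5, -0.5, 0.5, 1.5])

# Raw counts per bin; the two outer bins extend to -inf and +inf.
counts = np.bincount(np.searchsorted(cuts, data), minlength=cuts.size + 1)

# Bin probabilities under the hypothesized standard normal.
probs = np.diff(np.concatenate(([0.0], stats.norm.cdf(cuts), [1.0])))
expected = n * probs

# Chi-square goodness-of-fit test on the original frequencies.
result = stats.chisquare(counts, f_exp=expected)
print(result.statistic, result.pvalue)

# Using normalized frequencies instead shrinks the statistic by a factor
# of n, so it no longer follows the chi-square reference distribution.
scaled = stats.chisquare(counts / n, f_exp=probs)
print(scaled.statistic * n)   # matches result.statistic
```

The last two lines show why the raw counts are what the test expects: rescaling the bin counts rescales the statistic itself.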
