
I am working on Kaggle's house pricing exercise and there is something I cannot understand. I have been watching videos and reading articles on normality tests, specifically the Jarque-Bera (JB) test. As I understand the test, I have to reject the null hypothesis (that the data are normally distributed) and conclude that the distribution is non-normal. But why is that the conclusion when the distribution graph looks very close to a normal distribution?

Jarque-Bera test = 171.236, with p-value 6.55459e-038

So from that result, if I am correct, I reject null and conclude the data are not normally distributed. But then this is the distribution graph (n=1460):

[image: distribution graph of the data]

PS. The y variable is the log of price and the x variable is the year. Could the problem be that year is not a continuous variable?
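
For reference, here is a minimal sketch of what the JB statistic measures: it combines the sample skewness and the excess kurtosis, and the p-value comes from a chi-squared distribution with 2 degrees of freedom. The helper name `jarque_bera_by_hand` and the synthetic `sample` are assumptions made for illustration (this is not the Kaggle data); the only figures taken from the question are the statistic 171.236 and n = 1460.

```python
import numpy as np
from scipy import stats


def jarque_bera_by_hand(x):
    """Return (JB statistic, p-value) from sample skewness and excess kurtosis."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centered = x - x.mean()
    # Biased (divide-by-n) central moments, as used in the JB statistic.
    m2 = np.mean(centered**2)
    m3 = np.mean(centered**3)
    m4 = np.mean(centered**4)
    skew = m3 / m2**1.5
    excess_kurt = m4 / m2**2 - 3.0
    # JB = n/6 * (S^2 + K^2/4), where K is the excess kurtosis.
    jb = n / 6.0 * (skew**2 + excess_kurt**2 / 4.0)
    # Under the null hypothesis, JB is asymptotically chi-squared with 2 df.
    p_value = stats.chi2.sf(jb, df=2)
    return jb, p_value


# Synthetic, mildly heavy-tailed sample of the same size as in the question.
# This is NOT the Kaggle data; it only illustrates the calculation.
rng = np.random.default_rng(0)
sample = rng.standard_t(df=5, size=1460)

jb, p = jarque_bera_by_hand(sample)
jb_ref, p_ref = stats.jarque_bera(sample)  # SciPy's implementation, for comparison

# The p-value reported in the question follows from the statistic alone:
p_reported = stats.chi2.sf(171.236, df=2)  # about 6.55e-38
```

Because the statistic is multiplied by n, even modest skewness or excess kurtosis yields a large JB value when n = 1460, which is one reason a histogram can look roughly bell-shaped while the test still rejects.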

gung - Reinstate Monica
Georgi
  • Could you explain in what sense you believe this histogram is "close" to a Normal distribution? In effect, you are confronting a formal test with an unexplained intuition; by default, you should believe the test and use its results to modify your intuition. – whuber Jul 29 '18 at 15:57
  • I was just confused because, according to the graph, the data seem to be normally distributed: the histogram follows the bell-curve shape, so I became unsure whether I understood the meaning of the test correctly. Reading your answer, though, can I conclude that the test wins over the graph, and that this is because the graph is not actually a perfect bell shape? – Georgi Jul 29 '18 at 16:10
  • That's right. Unfortunately, the Jarque-Bera test is difficult to visualize with a histogram (because it is based on higher moments of the distribution, which no histogram explicitly shows). You *can* evaluate normality by comparing the bar *areas* to the *areas* predicted by the fitted Normal curve, using the principles of a Chi-squared test of goodness of fit. By looking closely, I see that the bar for the interval [-0.1,0.1] is too high while the one for [0.3,0.5] is too low. Given there are 1460 data points, you can estimate that these show significant deviations from normality. – whuber Jul 29 '18 at 16:16
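
The area comparison described in the last comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the commenter's own procedure: the function name `normal_gof_chi2` is hypothetical, the data are synthetic, and equal-probability bins are swapped in for the histogram's equal-width bars so that every expected count is the same size. The degrees of freedom (bins minus 1, minus 2 for the estimated mean and standard deviation) should be treated as approximate, since the parameters are estimated from the raw data rather than from the binned counts.

```python
import numpy as np
from scipy import stats


def normal_gof_chi2(x, n_bins=10):
    """Chi-squared goodness-of-fit check of x against a fitted Normal curve.

    Returns (statistic, dof, p_value, observed_counts, expected_counts).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Fit the Normal curve by matching the sample mean and standard deviation.
    mu, sigma = x.mean(), x.std(ddof=1)
    # Interior bin edges at equal-probability quantiles of the fitted curve;
    # the two outermost bins are open-ended (they extend to -inf and +inf).
    probs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    interior_edges = stats.norm.ppf(probs, loc=mu, scale=sigma)
    # Count how many observations fall in each bin.
    bin_index = np.searchsorted(interior_edges, x)
    observed = np.bincount(bin_index, minlength=n_bins)
    # Each bin has the same predicted area, hence the same expected count.
    expected = np.full(n_bins, n / n_bins)
    statistic = np.sum((observed - expected) ** 2 / expected)
    dof = n_bins - 1 - 2  # approximate: two parameters were estimated
    p_value = stats.chi2.sf(statistic, dof)
    return statistic, dof, p_value, observed, expected


# Synthetic example with n = 1460 (NOT the Kaggle data): a clearly skewed sample.
rng = np.random.default_rng(1)
skewed = rng.exponential(scale=1.0, size=1460)
stat, dof, p_gof, observed, expected = normal_gof_chi2(skewed)

# Per-bin contributions show which bars are too high or too low.
contributions = (observed - expected) ** 2 / expected
```

Bins with large entries in `contributions` play the role of the bars that are "too high" or "too low" relative to the fitted curve.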

0 Answers