Determining if data is normally distributed

Question

I'm working on a case study where the number of phone calls (min = 0 calls, max = 10 calls) is measured every 30 seconds in a metro station. The experiment was done three times during one day; each time, the number of phone calls was measured every 30 seconds for one hour. Hence, the collected data consist of three experiments, each with 120 measurements of phone calls ranging from 0 to 10. The data in my Excel sheet has three columns (one for each experiment), and each column has 120 entries.

Experiment 1 | Experiment 2 | Experiment 3
10             2              5
5              3              2
8              1              5
1              1              6
.              .              .
.              .              .

What I'm trying to figure out is how to determine whether the data I have are normally distributed and whether my sample mean is normally distributed. The first thing that came to mind was to tabulate the frequency of each number of phone calls and plot it in a histogram, but the histogram is all over the place and has no normal-distribution feel to it. Even so, I can't conclude from the histogram alone that the data are not normally distributed.
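The tallying step behind such a histogram can be sketched in Python. This is only an illustration: the input list holds just the four rows of Experiment 1 visible above, so it is a placeholder for the full 120-value column, and the text histogram stands in for an Excel chart.

```python
from collections import Counter

def frequency_table(counts, lo=0, hi=10):
    """Return (value, frequency) pairs for every integer from lo to hi.

    Values that never occur are kept with frequency 0, so the table always
    spans the full 0-10 range described in the question.
    """
    tally = Counter(counts)
    return [(v, tally[v]) for v in range(lo, hi + 1)]

def print_text_histogram(table):
    # One row per possible count; each '#' stands for one 30-second interval.
    for value, freq in table:
        print(f"{value:>2} | {'#' * freq} ({freq})")

# Placeholder input: only the first four rows of Experiment 1 appear in the
# question, so replace this list with the full 120-value column.
experiment_1 = [10, 5, 8, 1]
table = frequency_table(experiment_1)
print_text_histogram(table)
```

Keeping zero-frequency values in the table makes gaps in the 0-10 range visible, which matters when judging the shape by eye.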

Experiment           1        2        3
Mean             5.150    4.567    5.092    (average of the means: 4.936 = E(X-bar))
SD               2.975    2.734    2.959
Variance         8.851    7.475    8.756
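On the second part of the question (whether the sample mean is normally distributed): by the central limit theorem, the mean of 120 observations tends to be approximately normal even when the individual counts are not, with a standard deviation of roughly s / sqrt(n). The sketch below applies that to the summary table, assuming the SD row is the sample standard deviation and that successive 30-second counts are independent, which may not hold for consecutive intervals.

```python
import math
import statistics

# Summary statistics copied from the table above (n = 120 per experiment).
means = [5.150, 4.567, 5.092]
sds = [2.975, 2.734, 2.959]
n = 120

# With equal sample sizes, the average of the three experiment means is the
# same as the grand mean of all 360 observations.
grand_mean = statistics.mean(means)

# Estimated standard error of each experiment mean: s / sqrt(n).
standard_errors = [s / math.sqrt(n) for s in sds]

print(f"grand mean = {grand_mean:.3f}")
for i, se in enumerate(standard_errors, start=1):
    print(f"Experiment {i}: standard error of the mean = {se:.3f}")
```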

Which statistical software do you use? You can import your data into R, for example, and use the Shapiro-Wilk test or a Q-Q plot to figure out whether your data are normally distributed — YouTah, Aug 04 '15 at 16:57
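The Q-Q plot mentioned in this comment can be built by hand, which helps when only Excel is available: sort the data, pair each value with a standard normal quantile, and plot the pairs as a scatter chart. This is a minimal sketch; the plotting positions (i - 0.5) / n are one common convention (an assumption, since other software uses slightly different ones), and the input list is placeholder data. With counts from 0 to 10 there are many ties, so the points will form horizontal steps rather than a smooth line.

```python
import math
from statistics import NormalDist

def normal_qq_points(data):
    """Pair each sorted observation with a standard normal quantile.

    Uses the plotting positions (i - 0.5) / n, one common convention; other
    software may use slightly different positions.
    """
    observed = sorted(data)
    n = len(observed)
    z = NormalDist()  # standard normal, mean 0 and sd 1
    theoretical = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

def qq_correlation(points):
    """Pearson correlation between theoretical and observed quantiles.

    Values close to 1 mean the Q-Q points lie close to a straight line.
    """
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Placeholder data for illustration; substitute one full experiment column.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
points = normal_qq_points(sample)
for theo, obs in points:
    print(f"{theo:6.3f}  {obs}")
print(f"Q-Q correlation = {qq_correlation(points):.3f}")
```

The printed pairs can be pasted into two Excel columns and charted as an XY scatter; the correlation is just a one-number summary of how straight the plot is, not a formal test.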
I'm not using any software for this; I only have Excel on hand. — Dimitri, Aug 04 '15 at 16:58
Plot the data: use a histogram or a frequency table and see whether it looks normally distributed. — forecaster, Aug 04 '15 at 17:00
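A frequency table is easier to judge against a reference. One option, sketched here as an assumption rather than anything proposed in the thread, is to compute how many intervals a normal distribution with the same mean and SD would put at each count, using a continuity correction of 0.5 on each side of every integer, and compare that column with the observed frequencies.

```python
from statistics import NormalDist

def normal_expected_counts(mean, sd, n, lo=0, hi=10):
    """Expected number of intervals with each count if the data were normal.

    Each integer count v gets the normal probability between v - 0.5 and
    v + 0.5 (a continuity correction), scaled by the sample size n.
    """
    fitted = NormalDist(mean, sd)
    return [(v, n * (fitted.cdf(v + 0.5) - fitted.cdf(v - 0.5)))
            for v in range(lo, hi + 1)]

# Experiment 1 summary statistics from the question.
expected = normal_expected_counts(5.150, 2.975, 120)
for value, count in expected:
    print(f"{value:>2}: {count:5.1f}")
print(f"total inside 0-10: {sum(c for _, c in expected):.1f} of 120")
```

The expected counts add up to less than 120 because the fitted normal puts some probability below 0 and above 10, values the data cannot take.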
http://stats.stackexchange.com/questions/72418/how-to-check-for-normal-distribution-using-excel-for-performing-a-t-test — YouTah, Aug 04 '15 at 17:04
You can do this with a histogram, but the Shapiro-Wilk test or a Q-Q plot are better tools. — YouTah, Aug 04 '15 at 17:05
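The Shapiro-Wilk test needs tabulated coefficients and is awkward to compute by hand or in plain Excel. As a swapped-in alternative (a different test, not the one named in the comment), the Jarque-Bera test uses only skewness and kurtosis, which are easy to compute. This sketch compares the statistic with the chi-square (2 df) 5% critical value of about 5.991; that approximation is only asymptotic, so treat the result as rough for n = 120. Excel's SKEW and KURT functions use bias-adjusted formulas, so their values will differ slightly from the moments below.

```python
def jarque_bera(data):
    """Return (skewness, excess kurtosis, Jarque-Bera statistic).

    Uses population (divide-by-n) moments, as in the usual JB formula.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    jb = n / 6 * (skew ** 2 + excess_kurtosis ** 2 / 4)
    return skew, excess_kurtosis, jb

# Approximate 5% critical value of a chi-square distribution with 2 df.
CHI2_2DF_CRITICAL_5PCT = 5.991

# Placeholder data; substitute one full 120-value experiment column.
skew, kurt, jb = jarque_bera([1, 2, 3, 4, 5, 6, 7, 8, 9])
verdict = "reject" if jb > CHI2_2DF_CRITICAL_5PCT else "do not reject"
print(f"skew={skew:.3f}, excess kurtosis={kurt:.3f}, JB={jb:.3f}: {verdict} normality")
```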
You might be able to approximate the data with a normal distribution (plot them and see), but you must realise that the data are discrete, can't be less than zero, and likely have larger variance as counts get larger. — Gavin Simpson, Aug 04 '15 at 17:09
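Two of the points in this comment can be checked directly from the summary table. The sketch computes how much probability a fitted normal gives to values that would round to a negative count, and the variance-to-mean ratio. Using a Poisson distribution (whose variance equals its mean) as the reference for that ratio is an assumption on my part, not something stated in the thread; a ratio above 1 would suggest more spread than a Poisson count. Note also that the counts are capped at 10, which neither a normal nor a Poisson model respects.

```python
from statistics import NormalDist

# Summary statistics from the question, one entry per experiment.
means = [5.150, 4.567, 5.092]
sds = [2.975, 2.734, 2.959]
variances = [8.851, 7.475, 8.756]

below_zero = []
dispersion = []
for i, (m, s, v) in enumerate(zip(means, sds, variances), start=1):
    # Probability a fitted normal assigns to values that would round to a
    # negative count (continuity correction at -0.5).
    p_neg = NormalDist(m, s).cdf(-0.5)
    # Variance-to-mean ratio; a Poisson count would have a ratio near 1.
    ratio = v / m
    below_zero.append(p_neg)
    dispersion.append(ratio)
    print(f"Experiment {i}: P(count < 0) = {p_neg:.3f}, variance/mean = {ratio:.2f}")
```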
@GavinSimpson Out of curiosity, with this data, are we dealing with a population or just a sample? We have three samples of 120 entries each, making a total of 360. But a sample means taking a subset of the entire population... — Dimitri, Aug 05 '15 at 02:55
I would say you have a single sample from the population of all phone calls. You looked at 3 different hours, so you might want to control for different mean call rates in the three hours, but this is still a sample from the population. You could work on a single hour as the sample but then you'd have no means of testing whether different hours had different call rates. — Gavin Simpson, Aug 05 '15 at 03:20
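One simple way to test whether the three hours had different mean call rates is a one-way ANOVA, which can be computed from the summary table alone. This is a sketch of a swapped-in technique, not something proposed in the thread; it assumes the VAR row holds sample variances (divisor n - 1), and ANOVA itself assumes roughly normal, equal-variance groups, which is the very question here, so treat it as a rough check. The resulting F can be compared with the critical value from Excel's F.INV.RT(0.05, 2, 357).

```python
def one_way_anova_f(means, variances, n):
    """One-way ANOVA F statistic from group summaries with equal group size n.

    Assumes each variance is a sample variance (divisor n - 1).
    """
    k = len(means)
    grand_mean = sum(means) / k  # valid because every group has size n
    ss_between = n * sum((m - grand_mean) ** 2 for m in means)
    ss_within = (n - 1) * sum(variances)
    df_between = k - 1
    df_within = k * (n - 1)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df1, df2 = one_way_anova_f([5.150, 4.567, 5.092], [8.851, 7.475, 8.756], 120)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) degrees of freedom")
```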
