Before I start, please go easy on me: I am not a stats person, I am just on a quest for information :-)

I was tasked with generating many graphs for someone who wanted to see whether the data showed a bell-shaped curve or a graph with more than one peak. That was no big deal, but I thought it might be easier to run the data through a stats package like R that would flag the populations that need to be looked at with a graph. The quest has led me to read about the Shapiro-Wilk test, the Anderson-Darling test, etc., but various people with more knowledge than I have still pointed to graphing the data as the only way to verify a normal distribution.

So my question is: is there any statistical method I can run the following data through to flag it for further investigation, instead of using graphs?
And instead of counts of patients by days, should I just look at the full list of DaysInHospital values?

Thanks

The data is plotted as one graph per disease, with:
x-axis - Days in Hospital
y-axis - Count of Patients

DiseasePopulation #1

( DaysinHospital,Cnt of Patients) = {(1,1),(2,1),(3,3),(4,6),(5,2),(6,1),(7,1),(8,1)}

DiseasePopulation #2

( DaysinHospital,Cnt of Patients) = {(1,1),(2,1),(3,1),(4,2),(5,6),(6,3),(7,1),(8,1)}
  • @Roland, I *think* that "DaysinHospital" is the distribution being considered here, and we are being given a tabulated variable rather than raw observations. (It also seems the OP's question is really closer to "unimodal vs multimodal" rather than "Normal vs non-Normal" ...) – Ben Bolker Jan 14 '14 at 16:24
  • The OP has apparently already posted histogram-style data, i.e. 1 observation of a 1-day stay, 1 observation of a 2-day stay, 3 observations of 3-day stays... If you just used the raw observation list `x` with only the DaysinHospital values, you should be able to easily apply suitable functions, e.g. `shapiro.test(x)` from `{stats}` or `dip(x, ...)` from `{diptest}`. –  Jan 14 '14 at 16:28
  • How about a `qqplot` to compare with a standard normal dist? – Carl Witthoft Jan 14 '14 at 16:31
  • Who is telling you that graphing data is the only way to test for normality? That seems like a very, very subjective method. If you found the Shapiro-Wilk test via google, you could find the `shapiro.test` in R just as easily. –  Jan 14 '14 at 16:37
  • @BenBolker Possibly, maybe. But if that's the case `DaysinHospital` is not a continuous variable. A discrete distribution should be used. – Roland Jan 14 '14 at 16:37
  • To be clear, by graphing I was thinking about only looking at a histogram since the OP only mentioned bell curves –  Jan 14 '14 at 16:38
  • To repeat: I think 'testing for normality' is a red herring. I suspect the OP really wants to test for multimodality (i.e. @CMichael's suggestion of `diptest::dip`), but "is it normal?" is the vocabulary they're working with – Ben Bolker Jan 14 '14 at 16:51

1 Answer

As far as the R mechanics go, you can do something like this (as hinted at in my comment):

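# raw observations of number of days in hospital
# (each day value repeated once per patient, per the counts in the question)
data1 <- c(1,2,3,3,3,4,4,4,4,4,4,5,5,6,7,8)
data2 <- c(1,2,3,4,4,5,5,5,5,5,5,6,6,6,7,8)

# histograms for a first look
hist(data1, nclass = 8)
hist(data2, nclass = 8)

# Q-Q plots for further graphical analysis, with a reference line
qqnorm(data1); qqline(data1)
qqnorm(data2); qqline(data2)

# Shapiro-Wilk test (stats is attached by default, so no require() is needed)
shapiro.test(data1)
shapiro.test(data2)

# Hartigan’s dip test statistic for unimodality, from the diptest package
library(diptest)
dip(data1)
dip(data2)

# dip() returns only the statistic; dip.test() also reports a p-value
dip.test(data1)
dip.test(data2)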
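
If the data arrives as (days, count) pairs as in the question, raw vectors like the ones above do not have to be typed by hand. Here is a minimal sketch using base R's `rep()`; the names `days`, `counts1`, `counts2`, `raw1` and `raw2` are my own and not from the question:

```r
# Tabulated data from the question: days in hospital and patient counts
days    <- 1:8
counts1 <- c(1, 1, 3, 6, 2, 1, 1, 1)  # DiseasePopulation #1
counts2 <- c(1, 1, 1, 2, 6, 3, 1, 1)  # DiseasePopulation #2

# rep() repeats each day value as many times as it was observed
raw1 <- rep(days, times = counts1)
raw2 <- rep(days, times = counts2)

raw1
raw2
```

The resulting vectors can be passed straight to `hist()`, `qqnorm()`, `shapiro.test()` or `dip()`.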

To really understand the results of these tests, or how to interpret the graphs, you should definitely look for appropriate statistical guidance. For starters, this answer on `shapiro.test()` is a good place to begin:

https://stackoverflow.com/a/15427746/3124909
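
Since the original goal was to flag populations for a closer look rather than eyeball every graph, the tests can also be looped over many populations. What follows is only a rough sketch under assumptions: the `populations` list, the `days` vector and the 0.05 cutoff are illustrative choices of mine, not something from the question.

```r
# Hypothetical container: one vector of patient counts per disease,
# indexed by days in hospital 1..8 (values taken from the question)
days <- 1:8
populations <- list(
  disease1 = c(1, 1, 3, 6, 2, 1, 1, 1),
  disease2 = c(1, 1, 1, 2, 6, 3, 1, 1)
)

# Expand each table to raw observations and collect the Shapiro-Wilk p-value
p_values <- sapply(populations, function(counts) {
  raw <- rep(days, times = counts)
  shapiro.test(raw)$p.value
})

# Assumed cutoff of 0.05; populations below it get flagged for a closer look
alpha <- 0.05
flagged <- names(p_values)[p_values < alpha]

p_values
flagged
```

If the `diptest` package is installed, swapping in `diptest::dip.test(raw)$p.value` should screen for multimodality instead, which the comments suggest is closer to the real question. As the comments also note, the day counts are discrete, so treat either p-value as a screening hint rather than a verdict.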

CMichael