My data set consists only of the values -1, 0, and 1. Suppose it looks something like this: {0, 0, 0, 1, 0, 1, 0, -1, -1, 1}. So the sample size is n = 10, the mean is 1/10, and the sd is 0.738. If I wanted to calculate a 95% CI from the normal distribution, I would have calculated (in R):

> error <- qnorm(0.975)*s/sqrt(n)  # qnorm() takes no df argument; the t-based version is qt(0.975, df=n-1)
> left <- mean-error
> right <- mean+error

Where left and right are the lower and upper bounds, respectively. However, since my data is not normally distributed...how do I go about calculating a 95% CI?
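
For reference, the normal-theory calculation described above can be written as a self-contained sketch. The names `x`, `m`, `s`, and `se` are chosen here for illustration and are not from the original post. The t-based interval with n - 1 degrees of freedom is shown alongside for comparison. Neither interval addresses the non-normality concern; the sketch only makes the baseline explicit.

```r
# Data from the question
x <- c(0, 0, 0, 1, 0, 1, 0, -1, -1, 1)

n  <- length(x)   # sample size
m  <- mean(x)     # sample mean
s  <- sd(x)       # sample standard deviation (n - 1 denominator)
se <- s / sqrt(n) # standard error of the mean

# Normal-quantile interval: qnorm() has no degrees-of-freedom argument
ci_norm <- m + c(-1, 1) * qnorm(0.975) * se

# t-quantile interval with n - 1 degrees of freedom
ci_t <- m + c(-1, 1) * qt(0.975, df = n - 1) * se

ci_norm
ci_t
```

Because the 0.975 quantile of the t distribution with 9 degrees of freedom is larger than the corresponding normal quantile, the t-based interval is the wider of the two.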

Adrian
  • Do you want the confidence interval for the mean? Or some other parameter? – Apr 05 '16 at 06:35
  • 1. Are you sure this is nominal (rather than, say, ordinal)? 2. What population quantity are you finding a CI for with your nominal sample? Note that if it's nominal there isn't a mean. – Glen_b Apr 05 '16 at 06:35
  • Using the normal distribution does not seem appropriate for your data since it is only nominal (ordinal?). You might want to look at the median or mode. Please note that calculating an arithmetic mean makes no sense for qualitative variables (nominal or ordinal, e.g. school grades). – Dr_Be Apr 05 '16 at 06:56

1 Answer

The sample size is so small that a 95% (or 99%, for that matter) confidence interval is of little practical relevance, so feel free to disregard what follows if your real aim is to inform people (who would apply findings that stem from only 10 cases?).

However, the simplest and possibly most robust approach I would recommend is the percentile bootstrap, maybe with 10,000 bootstrap samples, basing inference on the median, 2.5th percentile, and 97.5th percentile of the bootstrap distribution.

In my experience, and in keeping with established sources, the bootstrap is almost always the best choice when simple and reliable parametric approaches are lacking. I really recommend, for instance, the seminal book by Efron and Tibshirani, even though it is somewhat old.

A possible way in R to get inferential estimates for both mean and median could be the following:

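data <- c(0, 0, 0, 1, 0, 1, 0, -1, -1, 1)
set.seed(1) # arbitrary seed so the resampling is reproducible
# draw 10,000 bootstrap samples of size n, with replacement
resamples <- lapply(1:10000, function(i) sample(data, replace = TRUE))

# bootstrap distribution of the mean
r.mean <- sapply(resamples, mean)
head(r.mean)
quantile(r.mean, c(.005, .025, .5, .975, .995)) # results for the mean

# bootstrap distribution of the median
r.median <- sapply(resamples, median)
head(r.median)
quantile(r.median, c(.005, .025, .5, .975, .995)) # results for the median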
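
As an alternative sketch, the same percentile interval for the mean can be obtained with the `boot` package, assuming it is installed (it is normally bundled with R as a recommended package). The names `x` and `mean_stat` and the seed are illustrative choices, not part of the approach above.

```r
library(boot)  # assumed available; a recommended package in standard R installs

x <- c(0, 0, 0, 1, 0, 1, 0, -1, -1, 1)

# boot() calls the statistic with the data and a vector of resampled indices
mean_stat <- function(d, idx) mean(d[idx])

set.seed(1)  # arbitrary seed, only for reproducibility
b <- boot(data = x, statistic = mean_stat, R = 10000)

# Percentile bootstrap interval for the mean at the 95% level
ci <- boot.ci(b, conf = 0.95, type = "perc")
ci$percent[4:5]  # lower and upper limits of the percentile interval
```

The `percent` component is a one-row matrix whose last two columns hold the interval limits. These should be close to the 2.5th and 97.5th percentiles of the manual resampling, up to Monte Carlo variation.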
Giuseppe Biondi-Zoccai
  • Efron and Tibshirani may be old, but I still consider it one of the two greatest books available on this topic. The second is by Davison and Hinkley: http://stats.stackexchange.com/questions/128839/best-suggested-textbooks-on-bootstrap-resampling/128841#128841 – Tim Apr 05 '16 at 07:54
  • (+1) E. and T. are really fantastic... and I find that old books are often written in better prose as well! – Giuseppe Biondi-Zoccai Apr 05 '16 at 08:16
  • The word "clinically" here reflects an unconscious assumption that the OP is in your field. We can all translate to "practically". – Nick Cox Apr 05 '16 at 08:18
  • Sorry... being a clinician I always think in medical terms. I have edited the entry. – Giuseppe Biondi-Zoccai Apr 05 '16 at 11:02
  • Small comment: While the bootstrap relies on fewer assumptions than parametric procedures, the justification of the bootstrap is asymptotic. There is no theoretical justification to support bootstrapping in small samples. See [here](http://stats.stackexchange.com/a/59840/21054) for example. This does not mean that this is bad advice; it is simply very difficult to learn much from such small samples. – COOLSerdash Apr 05 '16 at 12:48
  • (+1) Agreed. Being a physician and a would-be statistician, I always struggle with myself to avoid overinterpreting what is simply spurious precision. – Giuseppe Biondi-Zoccai Apr 05 '16 at 14:05