I would like to estimate the true distribution of the following data set using the bootstrap method.

age = (21,81,85,27,39,61,15,20,39,40,87,87,69,59,54,71,66,88,1,2)

The population size is 20.

Can you please help me with both the theoretical background and the R code?

Andy
  • You do not need bootstrap for estimating the distribution. The natural thing to do is to estimate it with the empirical distribution: http://en.wikipedia.org/wiki/Empirical_distribution_function – Manuel Dec 23 '14 at 13:22

1 Answer

Bootstrap won't give you the "true" distribution of your variable of interest, but rather an approximation that might be helpful in estimating parameters of the true distribution.

The idea is very simple: you sample $N$ cases with replacement from your dataset of $N$ observations, in the same way as you sampled your data from the population. In R that would look like this:

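age <- c(21,81,85,27,39,61,15,20,39,40,87,87,69,59,54,71,66,88,1,2)
N <- length(age)  # 20 observations

# one bootstrap sample per row: 100 rows, N columns
age_boot <- matrix(NA, 100, N)
for (i in 1:100) {
  # draw N values from the data, with replacement
  age_boot[i, ] <- sample(age, N, replace=TRUE)
}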

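or, in a simpler but more "hacky" way (note that replicate() stores each bootstrap sample as a column, so the result is an N × 100 matrix, the transpose of the one above):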

age_boot <- replicate(100, sample(age, N, replace=TRUE))

By computing an empirical estimate (e.g. mean, mode, variance) on each bootstrap sample, you obtain a distribution of that estimate, which tells you how precisely the corresponding parameter of your variable's distribution is estimated.
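As a rough illustration of that last step, the sketch below bootstraps the sample mean of `age` and summarises the replications. The number of replications `B`, the seed, and the choice of the mean as the statistic are assumptions made for this example, not part of the original question.

```r
# Sketch only: bootstrap the sample mean of `age`.
# B, the seed and the statistic (mean) are illustrative assumptions.
age <- c(21,81,85,27,39,61,15,20,39,40,87,87,69,59,54,71,66,88,1,2)
N <- length(age)
B <- 2000      # number of bootstrap replications (assumed value)

set.seed(1)    # arbitrary seed, only for reproducibility

# Each replication: resample N cases with replacement, then take the mean
boot_means <- replicate(B, mean(sample(age, N, replace = TRUE)))

mean(age)                                       # estimate from the original sample
boot_se <- sd(boot_means)                       # bootstrap standard error of the mean
boot_ci <- quantile(boot_means, c(0.025, 0.975))  # 95% percentile interval
boot_se
boot_ci
```

Replacing `mean` with `var` or `median` gives the analogous summaries for those statistics. The percentile interval is only one of several bootstrap interval types and is used here because it is the simplest to write down.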

As for references, check the original paper by Efron (1979) and the two books referenced here. You can find a further description of the bootstrap in this thread: Explaining to laypeople why bootstrapping works

Tim
  • Honestly, I don't think this answers the question yet. How do you estimate the distribution function based on bootstrap replications? Would the result have better properties than the ecdf? – Michael M Dec 23 '14 at 10:03
  • As I wrote: bootstrap just gives you samples from an *approximation* of the distribution of your variable. The original question was not about whether it is better than the ecdf. – Tim Dec 23 '14 at 10:10
  • @MichaelMayer: I think that the OP implied a **complete workflow**, including *bootstrapping process* per se, described by Tim, as well as *fitting data to distribution*, as described, for example, in [my recent answer](http://stats.stackexchange.com/a/129734/31372), which is obviously generalizable. So, I believe that this answer is correct and useful (+1), but not comprehensive (complete) in the above-mentioned perspective. – Aleksandr Blekh Dec 23 '14 at 10:28
  • @Michael Mayer, how do you go about estimating the true probability distribution using bootstrap method? –  Jan 12 '15 at 09:15