Background:

The weak law of large numbers states that for a sequence $X_1,X_2,\ldots,X_n$ of iid RVs, with expectation $\mu$ and variance $\sigma^2$, the sample mean converges to $\mu$:

$$\hat{X}=\frac{\sum_{i=1}^nX_i}{n}\stackrel{p}{\rightarrow}\mu$$

That is, the sample mean converges in probability to the population mean as the number of RVs approaches $\infty$.
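A minimal simulation sketch of this statement, under assumptions that are not part of the setup above: the draws are iid normal with $\mu=10$ and standard deviation $5$, and the helper name `running_sample_means` is hypothetical. It only illustrates that the sample mean of a large but finite number of iid draws tends to lie close to $\mu$.

```python
import random

def running_sample_means(draw, sizes, seed=0):
    """Return {n: mean of the first n draws} for each n in sizes.

    `draw` is any function taking a random.Random and returning one value;
    successive calls are treated as iid draws (an assumption of this sketch).
    """
    rng = random.Random(seed)
    total, count, means = 0.0, 0, {}
    for n in sorted(sizes):
        # Extend the same sequence of draws until it has n terms.
        while count < n:
            total += draw(rng)
            count += 1
        means[n] = total / n
    return means

MU, SIGMA = 10.0, 5.0  # assumed model parameters, for illustration only
means = running_sample_means(lambda rng: rng.gauss(MU, SIGMA),
                             [10, 1_000, 100_000])
for n, m in means.items():
    print(f"n={n:>7}  |sample mean - mu| = {abs(m - MU):.4f}")
```

The absolute error is not guaranteed to shrink at every step; the law only says that large errors become increasingly improbable as $n$ grows.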

Question:

How can you obtain infinitely many iid random variables from a finite population? How do you check in practice that your random variables are independent?

Edit:

By finite population I mean that you consider a population of individuals. This population is finite. You consider a characteristic in the population. You model the characteristic with a random variable. I do not mean that the range of the random variable is finite.

Edit 2

We know that $\mu$ is a population characteristic. Let us assume the population is of size $n$. Denote by $Y$ the random variable that describes the population characteristic. Then $\mu=\text{E}(Y)$. Let $Y_1,\ldots,Y_n$ denote the random variables for individuals $1$ to $n$, respectively. By definition $\mu=\frac{\sum_{i=1}^nY_i}{n}$. We then draw a sample from the population. How can we obtain an iid sample of size $n+1$, or of a size that tends to $\infty$, from a population that is finite? Here some say that we can sample from $Y_1,\ldots,Y_n$ WITH replacement.

Edit 3

If we consider a sample of size $N$ with $N\leq n$, where the sampling is done without replacement, can we obtain an iid sample if the original RVs are independent? If $Y_1,\ldots,Y_n$ are iid and we let $X_1=Y_1,\ldots, X_n=Y_n$ and consider any subset of $X_1,\ldots, X_n$, then this subset will consist of iid RVs, right? Am I missing some point here?

FredrikAa
  • So you don't ever have infinitely many. It's that after you have a large finite number, you get `close' to the limit value of $\mu$. – VCG Aug 25 '16 at 12:39
  • So, if the random variable's range is not finite, it clearly models some other variability in addition to that of random sampling the individuals from the finite set of individuals. But in this case you will not find out $E[X_i]$ exactly even by sampling all individuals in the finite set. Can you clarify whether you are looking for estimating $E[X_i]$ or $(X_1+\ldots+X_N)/N$? In the first case, no, you cannot physically obtain more samples than there are individuals; in the second case I don't see how LLN is supposed to be related. – Juho Kokkala Aug 26 '16 at 09:33
  • "Finite population" is a confusing term -- I initially interpreted "finite population" in the sense discussed in this answer http://stats.stackexchange.com/a/99166/24669 (linked in @VCG's answer here), but then the random variable's range would be finite. Instead the "finiteness" here may refer to there only being a finite number of samples from the "data-generating distribution" since no more individuals exist. – Juho Kokkala Aug 26 '16 at 09:39
  • @gung The second answer begins "So if your question is", so I don't think it is good evidence for clarity of the question. (However, no need to argue, if my close vote remains the only one, I suppose it's good evidence that the problem is with me and not with the question) – Juho Kokkala Aug 26 '16 at 13:44
  • @gung The OP correctly states the LLN as applying to sequences of **iid** random variables, but the accepted answer writes about **sampling without replacement** from a finite population which necessarily leads to **dependent** random variables (though they _are_ **identically distributed.** ) As such, neither the OP, nor (judging by the comment thread) the writer of the accepted answer seem to be appreciating the difference between iid and did. I too am adding my vote to close as unclear what is being asked. – Dilip Sarwate Aug 26 '16 at 14:47
  • @Juho Kokkala, assume you have a population of $n$ individuals. We consider a population characteristic with distribution $Y$. Let $Y_1,\ldots ,Y_n$ denote the random variables for the characteristics of individual $1$, $2$ etc. The population mean is then $\mu=\frac{\sum_{i=1}^n Y_i}{n}$. By definition $\mu=\text{E}(Y_1)$. We then consider a sample from the population. Let us say we consider a sample of size $n$. Let $X_1,\ldots,X_n$ denote the sample. If $X_1=Y_1,\ldots,X_n=Y_n$, and $Y_1,\ldots,Y_n$ are independent, then $X_1,\ldots,X_n$ are iid EVEN IF the sampling – FredrikAa Aug 26 '16 at 18:25
  • is done without replacement. You cannot make a sample of size $n+1$ without replacement from the population of size $n$. – FredrikAa Aug 26 '16 at 18:27
  • @JuhoKokkala I try to do my best. Hopefully the edits clarify? – FredrikAa Aug 26 '16 at 19:05
  • @DilipSarwate I try to do my best. Hopefully the edits and the comment on VCG's answer clarify? – FredrikAa Aug 26 '16 at 20:18
  • It is not possible for the $Y_i$s to be iid and for $E(Y_1)=\mu=\frac{\sum_{i=1}^n Y_i}{n}$ to hold (except that, technically, iid holds if $Y_i$ is a constant rv). If for example $Y_i \sim N(10,5)$, the mean of any finite number of $Y_i$s will not be exactly $10$ (except with probability $0$). So, $\mu$ cannot be both the mean of the underlying distribution and the mean of the $n$ iid samples. – Juho Kokkala Aug 27 '16 at 15:11
  • @JuhoKokkala, thanks for your answer. So, what you are saying is that we should really write $$\mu_{\text{pop}}=\frac{\sum_{i=1}^nY_i}{n}$$ to denote the population mean. When we write $Y\sim \text{N}(\mu,\sigma^2)$, $\mu$ is a model parameter. We have that $\mu=\mu_{\text{pop}}$ if and only if the distribution of $Y$ is truly a normal distribution. This cannot occur if the population is finite (i.e., we have $Y_1,\ldots,Y_n$, where $n$ is finite), since when we make a histogram of the realisations of $Y_1,\ldots,Y_n$, the histogram cannot be perfectly normal. Is this a correct interpretation of your answer? – FredrikAa Aug 27 '16 at 18:33
  • You are still mixing the distribution of the underlying random variable and the distribution observed in your sample (the finite population) -- I don't know what it means to say that 'the distribution of $Y$ is truly a normal distribution'. I'd also avoid the term 'population mean' since it is often used to refer to the model parameter (mean of a hypothetical infinite population) -- if you model the $Y_i$s as iid realizations from some distribution, then in modeling terms your finite population is really a sample even if it happens to contain every entity that exists in the real world. – Juho Kokkala Aug 27 '16 at 18:45
  • This comment chain is getting all too long, and the issue does not seem to be very related to the law of large numbers (I suspect that if the unclarities related to finite populations, samples, etc. are resolved, it will turn out that the question as formulated is not very relevant). Perhaps you want to post a new question. If you have a real-life application in mind, it might be helpful to describe that, too, as that may make it easier to see behind the formalizations and possible errors in them. – Juho Kokkala Aug 27 '16 at 18:50
  • @JuhoKokkala I see your point. I have been using a book that defined the population mean as the mean of all the realisations in the population, please see [Population, sample and model](http://stats.stackexchange.com/questions/232300/population-sample-and-model). Then we should be able to get the true value of the population mean, with no error. Therefore I was wondering why the LLN could be of any use in finite populations. – FredrikAa Aug 29 '16 at 17:09
  • You appear to use the word "realisation" in an unusual way, one that likely was not intended by your reference. Your use sounds like you ought to be writing "subject" or "element" (of the sample space). Your "Edit 2" appears to answer your question: when sampling with replacement, infinitely many samples are possible. In "Edit 3," note that when sampling without replacement, the individual results are identically distributed but they cannot possibly be independent. Whence "iid" never holds in that case except for samples of size one (trivially). – whuber Aug 31 '16 at 13:38
  • The previously accepted answer has now been deleted by its author, who clarified that he did not understand the question. Moreover, this comment thread does not seem to be progressing towards greater clarity. As a result, I have now voted to close. (cc @JuhoKokkala) – gung - Reinstate Monica Aug 31 '16 at 17:53

1 Answer

So if your question is "how can we apply the LLN with finite data" then I give the answer I gave in the comments:

"So you don't ever have infinitely many. It's that after you have a large finite number, you get `close' to the limit value of $\mu$"

But if your question is rather "how do we reconcile the LLN with a finite population", then I point you to this fantastic post about finite population sampling: Are "random sample" and "iid random variable" synonyms?

The other post about $n\to N$ being equivalent to $n\to\infty$ is problematic, because the LLN is a statement about limits in probability of sequences, and such sequences are necessarily infinite in index.

So $n$ can only ever reach $N$ if we sample without replacement, in which case the draws are no longer iid. If we sample with replacement, then we never run out of draws.
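To make the dependence concrete, here is a small sketch that enumerates every equally likely ordered pair of first and second draws from a finite population and computes their exact covariance. The population values and the helper `pair_covariance` are hypothetical, chosen only for illustration. Without replacement the covariance comes out as $-\sigma^2/(N-1)$, so the draws cannot be independent; with replacement it is zero, consistent with independent draws (zero covariance alone would not prove independence, but here independence holds by construction).

```python
from itertools import permutations, product
from statistics import fmean

def pair_covariance(pairs):
    """Exact covariance of (first draw, second draw) over equally likely pairs."""
    pairs = list(pairs)
    mean_first = fmean(a for a, _ in pairs)
    mean_second = fmean(b for _, b in pairs)
    return fmean((a - mean_first) * (b - mean_second) for a, b in pairs)

population = [2.0, 3.0, 5.0, 7.0, 11.0]  # hypothetical finite population
N = len(population)
mu = fmean(population)
sigma2 = fmean((y - mu) ** 2 for y in population)  # population variance

# With replacement: all N*N ordered pairs, the same unit may be drawn twice.
cov_with = pair_covariance(product(population, repeat=2))
# Without replacement: the N*(N-1) ordered pairs of distinct units.
cov_without = pair_covariance(permutations(population, 2))

print("with replacement:   ", cov_with)
print("without replacement:", cov_without, "vs", -sigma2 / (N - 1))
```

Enumeration is used instead of simulation so that the result is exact rather than subject to Monte Carlo error.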

VCG
  • Thanks for your answer, @VCG, really appreciated. So you have given two answers. In answer (1) you say that you really do not ever have an infinite sequence of i.i.d. RVs, but as the number of RVs gets big enough, you get close enough to the limit $\mu$? – FredrikAa Aug 26 '16 at 19:22
  • My question was really your second answer, @VCG. I do not understand the accepted answer in your link: are the values you sample not realisations of random variables? There is a lengthy discussion on this. There seems to be a distinction between the distribution of the samples and the distribution of the elements that are sampled? – FredrikAa Aug 26 '16 at 19:38
  • Hmm..I'm trying to understand your question and your edits. We're getting into heavy terminology territory so I'll think more about it. – VCG Aug 26 '16 at 20:31