Questions tagged [convergence]

Convergence generally means that a sequence of sample quantities approaches a constant as the sample size tends to infinity. Convergence is also a property of an iterative algorithm, namely that it stabilizes around some target value.

Convergence refers to the study of the behavior of certain sample quantities as the sample size approaches infinity. Two important types of convergence are convergence in probability and almost sure convergence.

Convergence in probability
A sequence of random variables $X_1, X_2, \ldots$ converges in probability to a random variable $X$ if $$\lim_{n \to \infty}P(|X_n-X|\leq\epsilon)=1 $$ for every $\epsilon > 0$. This means that, in the limit as $n$ increases to infinity, almost all of the probability mass of $X_n$ becomes concentrated in a small interval around $X$. This type of convergence is used in the weak law of large numbers.
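For intuition, this can be checked by simulation. Below is a minimal sketch (assuming NumPy; the Bernoulli parameter, tolerance and sample sizes are arbitrary choices) that estimates $P(|\bar X_n - p| > \epsilon)$ for the sample mean of Bernoulli draws, which by the weak law of large numbers converges in probability to $p$:

```python
import numpy as np

# Weak law of large numbers: the sample mean of Bernoulli(p) draws converges
# in probability to p, so P(|mean - p| > eps) should shrink as n grows.
rng = np.random.default_rng(0)
p, eps, n_sims = 0.3, 0.05, 10_000

for n in (10, 100, 1_000, 10_000):
    # n_sims independent sample means, each based on n Bernoulli draws
    means = rng.binomial(n, p, size=n_sims) / n
    prob_outside = np.mean(np.abs(means - p) > eps)
    print(f"n={n:>6}  P(|mean - p| > {eps}) ≈ {prob_outside:.4f}")
```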

Almost sure convergence
Similar to the previous statement, a sequence of random variables $X_1, X_2, \ldots$ converges almost surely to a random variable $X$ if $$P\left(\lim_{n \to \infty}|X_n-X|=0\right)=1.$$ Here, compared to the previous case, the limit is taken inside the probability and is attained with probability one. Almost sure convergence is used in the strong law of large numbers, and it implies convergence in probability (note that convergence in probability does not imply almost sure convergence).
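A standard way to see that the two notions differ is the counterexample of independent $X_n \sim \mathrm{Bernoulli}(1/n)$: $P(X_n = 1) = 1/n \to 0$, so $X_n \to 0$ in probability, yet by the second Borel–Cantelli lemma $X_n = 1$ infinitely often with probability one, so $X_n$ does not converge almost surely. A minimal simulation sketch of one sample path (assuming NumPy; the block boundaries are arbitrary choices):

```python
import numpy as np

# Independent X_n ~ Bernoulli(1/n): P(X_n = 1) = 1/n -> 0, so X_n -> 0 in
# probability, but by the second Borel-Cantelli lemma X_n = 1 happens
# infinitely often almost surely, so there is no almost sure convergence.
rng = np.random.default_rng(1)
N = 1_000_000
n = np.arange(1, N + 1)
x = rng.random(N) < 1.0 / n          # one sample path of X_1, ..., X_N

# Ones keep appearing in ever-later blocks of the same path.
for lo, hi in [(1, 10), (11, 100), (101, 10_000), (10_001, 1_000_000)]:
    count = int(x[lo - 1:hi].sum())
    print(f"ones among X_{lo}..X_{hi}: {count}")
```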

999 questions
50
votes
6 answers

Debunking wrong CLT statement

The central limit theorem (CLT) gives some nice properties about converging to a normal distribution. Prior to studying statistics formally, I was under the extremely wrong impression that the CLT said that data approached normality. I now find…
41
votes
6 answers

Intuitive explanation of convergence in distribution and convergence in probability

What is the intuitive difference between a random variable converging in probability versus a random variable converging in distribution? I've read numerous definitions and mathematical equations, but that does not really help. (Please keep in mind,…
nicefella
31
votes
2 answers

Why is the Expectation Maximization algorithm guaranteed to converge to a local optimum?

I have read a couple of explanations of the EM algorithm (e.g. from Bishop's Pattern Recognition and Machine Learning and from Rogers and Girolami's First Course in Machine Learning). The derivation of EM is OK; I understand it. I also understand why the…
michal
31
votes
9 answers

Expectation of 500 coin flips after 500 realizations

I was hoping someone could provide clarity surrounding the following scenario. You are asked, "What is the expected number of observed heads and tails if you flip a fair coin 1000 times?" Knowing that coin flips are i.i.d. events, and relying on the…
30
votes
1 answer

When is binomial distribution function above/below its limiting Poisson distribution function?

Let $B(n,p,r)$ denote the binomial distribution function (DF) with parameters $n \in \mathbb N$ and $p \in (0,1)$ evaluated at $r \in \{0,1,\ldots,n\}$: \begin{equation} B(n,p,r) = \sum_{i=0}^r \binom{n}{i} p^i (1-p)^{n-i}, \end{equation} and let…
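As a quick numerical illustration of this comparison (a sketch assuming SciPy; the values of $\lambda$, $n$ and $r$ are arbitrary), one can evaluate the binomial DF with $p = \lambda/n$ next to its limiting Poisson DF:

```python
import numpy as np
from scipy.stats import binom, poisson

# Compare the binomial DF B(n, lam/n, r) with its limiting Poisson(lam) DF.
lam = 4.0
for n in (10, 50, 250, 1_250):
    p = lam / n
    for r in (2, 4, 6):
        b = binom.cdf(r, n, p)
        q = poisson.cdf(r, lam)
        print(f"n={n:>5} r={r}: binomial DF={b:.5f}  Poisson DF={q:.5f}  diff={b - q:+.5f}")
```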
27
votes
3 answers

Extreme Value Theory - Show: Normal to Gumbel

The maximum of $X_1,\dots,X_n \sim$ i.i.d. standard normals converges to the standard Gumbel distribution according to extreme value theory. How can we show that? We have $$P(\max X_i \leq x) = P(X_1 \leq x, \dots, X_n \leq x) = P(X_1 \leq x)…
emcor
23
votes
1 answer

Central limit theorem and the law of large numbers

I have a very beginner's question regarding the Central Limit Theorem (CLT): I am aware that the CLT states that the mean of i.i.d. random variables is approximately normally distributed (as $n \to \infty$, where $n$ is the number of summands) or…
23
votes
6 answers

Intuitive understanding of the difference between consistent and asymptotically unbiased

I am trying to get an intuitive understanding and feel for the difference and practical difference between the terms consistent and asymptotically unbiased. I know their mathematical/statistical definitions, but I'm looking for something…
StatsStudent
22
votes
2 answers

Why are second-order SGD convergence methods unpopular for deep learning?

It seems that, especially for deep learning, very simple methods for optimizing SGD convergence, such as ADAM, dominate - nice overview: http://ruder.io/optimizing-gradient-descent/ They trace only a single direction - discarding information…
22
votes
6 answers

Does the normal distribution converge to a uniform distribution when the standard deviation grows to infinity?

Does the normal distribution converge to a certain distribution if the standard deviation grows without bound? It appears to me that the pdf starts looking like a uniform distribution with bounds given by $[-2 \sigma, 2 \sigma]$. Is this true?
21
votes
4 answers

Expectation of a product of $n$ dependent random variables when $n\to\infty$

Let $X_1 \sim U[0,1]$ and $X_i \sim U[X_{i - 1}, 1]$, $i = 2, 3,...$. What is the expectation of $X_1 X_2 \cdots X_n$ as $n \rightarrow \infty$?
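A direct Monte Carlo sketch of this recursion (assuming NumPy; the number of paths and the values of $n$ are arbitrary, and no closed-form limit is claimed here) estimates the expectation of the product for a few $n$:

```python
import numpy as np

# X_1 ~ U[0, 1], X_i ~ U[X_{i-1}, 1]; estimate E[X_1 * ... * X_n] by simulation.
rng = np.random.default_rng(2)
n_paths = 200_000

for n in (1, 2, 5, 10, 50):
    x = rng.random(n_paths)                        # X_1 for every simulated path
    prod = x.copy()
    for _ in range(n - 1):
        x = x + (1.0 - x) * rng.random(n_paths)    # X_i ~ U[X_{i-1}, 1]
        prod *= x
    print(f"n={n:>2}: estimated E[X_1...X_n] ≈ {prod.mean():.5f}")
```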
20
votes
1 answer

How to show that an estimator is consistent?

Is it enough to show that the MSE goes to 0 as $n\rightarrow\infty$? I also read in my notes something about plim. How do I find the plim and use it to show that the estimator is consistent?
user3062
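One standard reasoning step behind this approach (a sketch of the usual argument, not tied to any particular estimator) is that a vanishing MSE implies convergence in probability, by Markov's inequality applied to $(\hat\theta_n - \theta)^2$: $$P\big(|\hat\theta_n-\theta|\ge\epsilon\big)=P\big((\hat\theta_n-\theta)^2\ge\epsilon^2\big)\le\frac{E\big[(\hat\theta_n-\theta)^2\big]}{\epsilon^2}=\frac{\operatorname{MSE}(\hat\theta_n)}{\epsilon^2}\to 0,$$ for every $\epsilon > 0$, so $\hat\theta_n$ converges in probability to $\theta$, i.e. the estimator is consistent.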
19
votes
2 answers

Does log likelihood in GLM have guaranteed convergence to global maxima?

My questions are: Are generalized linear models (GLMs) guaranteed to converge to a global maximum? If so, why? Furthermore, what constraints are there on the link function to ensure convexity? My understanding of GLMs is that they maximize a…
19
votes
6 answers

Why doesn't k-means give the global minimum?

I read that the k-means algorithm only converges to a local minimum and not to a global minimum. Why is this? I can logically think of how initialization could affect the final clustering, and there is a possibility of sub-optimal clustering, but I…
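A small simulation sketch of that premise (assuming scikit-learn; the toy data, seeds and cluster count are arbitrary choices): running k-means with a single random initialization per seed can terminate at different local minima, visible as different within-cluster sums of squares (inertia):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data with several well-separated blobs; with a single random init,
# k-means may stop at different local minima depending on the starting centers.
rng = np.random.default_rng(3)
centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

for seed in range(5):
    km = KMeans(n_clusters=4, init="random", n_init=1, random_state=seed).fit(X)
    print(f"seed={seed}: within-cluster SS (inertia) = {km.inertia_:.2f}")
```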
19
votes
1 answer

Stan $\hat{R}$ versus Gelman-Rubin $\hat{R}$ definition

I was going through the Stan documentation, which can be downloaded from here. I was particularly interested in their implementation of the Gelman-Rubin diagnostic. The original paper, Gelman & Rubin (1992), defines the potential scale reduction…