14
  • What is an Indicator function?

  • What is the intuition behind an Indicator function?

  • Why is the indicator function $I_A$ needed in the following example?

  • Can the following example be rewritten without using the indicator function?

Let $A$ be any event. We can write $\Bbb P(A)$ as an expectation, as follows:

Define the indicator function:

$$ I_A = \begin{cases} 1, & \text{if event $A$ occurs} \\ 0, & \text{otherwise} \end{cases} $$

Then $I_A$ is a random variable, and

$$ \Bbb E(I_A) = \sum_{r=0}^{1} r \cdot \Bbb P(I_A = r) = \Bbb P(A). $$

Thus

$$ \Bbb P(A) = \Bbb E(I_A). $$
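This identity is easy to check by simulation: the average of many draws of $I_A$ estimates $\Bbb E(I_A)$, which should match $\Bbb P(A)$. A minimal Python sketch (the event, a fair die showing at least 5, is purely an illustrative assumption):

```python
import random

random.seed(0)

# Event A: a fair six-sided die shows 5 or 6, so P(A) = 2/6 = 1/3.
# The indicator I_A is 1 when A occurs and 0 otherwise; averaging
# I_A over many trials estimates E(I_A) = P(A).
n = 100_000
indicator_sum = sum(1 if random.randint(1, 6) >= 5 else 0 for _ in range(n))
estimate = indicator_sum / n
print(estimate)  # close to 1/3
```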

mdewey
user366312
    [Here](https://arxiv.org/abs/math/9205211) is a paper by Knuth on the related concept of an Iverson bracket. Simply put: indicator functions are useful "switches", similar to how an `if()` statement is useful in programming. – J. M. is not a statistician Oct 08 '16 at 04:55

3 Answers

12

I do not think you can get more intuitive than restating what it does: it returns $1$ for something that interests you, and $0$ for all other cases.

So if you want to count blue-eyed people, you can use an indicator function that returns one for each blue-eyed person and zero otherwise, and sum the outcomes of the function.

As for probability defined in terms of expectation and the indicator function: if you divide the count (or sum of ones) by the total number of cases, you get a probability. Peter Whittle, in his books *Probability* and *Probability via Expectation*, writes a lot about defining probability this way, and even considers such usage of the expected value and the indicator function to be one of the most basic aspects of probability theory.
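To make the counting-then-dividing idea concrete, here is a small Python sketch (the eye-colour sample is made up for illustration):

```python
# Hypothetical sample of eye colours, invented for illustration.
people = ["blue", "brown", "blue", "green", "blue", "brown"]

# Indicator: 1 for each blue-eyed person, 0 otherwise.
indicators = [1 if colour == "blue" else 0 for colour in people]

count_blue = sum(indicators)          # counting = summing indicators
prop_blue = count_blue / len(people)  # dividing by n gives a proportion/probability

print(count_blue, prop_blue)  # 3 0.5
```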

As for your question in the comment:

> isn't the Random Variable there to serve the same purpose? Like $H=1$ and $T=0$?

Well, yes it is! In fact, in statistics we use indicator functions to create new random variables. For example, imagine that you have a normally distributed random variable $X$; then you may create a new random variable using an indicator function, say

$$ I_{2<X<3} = \begin{cases} 1 & \text{if} \quad 2 < X < 3 \\ 0 & \text{otherwise} \end{cases} $$

or you may create a new random variable using two Bernoulli-distributed random variables $A, B$:

$$ I_{A\ne B} = \begin{cases} 0 & \text{if } A = B, \\ 1 & \text{if } A \ne B \end{cases} $$

...of course, you could just as well use any other function to create a new random variable. An indicator function is helpful if you want to focus on some specific event and signal when it happens.
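A small Python sketch of building such a new random variable from a normally distributed one (the standard-normal choice and sample size are illustrative assumptions):

```python
import random

random.seed(1)

# X ~ N(0, 1); I_{2<X<3} is a new (Bernoulli) random variable built from it.
def indicator_2_3(x):
    return 1 if 2 < x < 3 else 0

draws = [random.gauss(0, 1) for _ in range(50_000)]
new_rv = [indicator_2_3(x) for x in draws]

# Its mean estimates P(2 < X < 3); for a standard normal that is
# Phi(3) - Phi(2), roughly 0.0214.
mean_new_rv = sum(new_rv) / len(new_rv)
print(mean_new_rv)
```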

For a physical indicator function, imagine that you marked one of the walls of a six-sided die with red paint, so you can now count red and non-red outcomes. It is no less random than the die itself, yet it is a new random variable that defines outcomes differently.

You may also be interested in reading about the Dirac delta, which is used in probability and statistics as a continuous counterpart to the indicator function.

See also: Why 0 for failure and 1 for success in a Bernoulli distribution?

Tim
    I think an important note is that (blue + blue + not-blue) / 3 is totally meaningless, but (1 + 1 + 0) / 3 = 2/3. – Cliff AB Oct 07 '16 at 21:57
  • @CliffAB, isn't the Random Variable there to serve the same purpose? Like $H=1$ and $T=0$? – user366312 Oct 07 '16 at 22:52
    @anonymous: there's nothing that says a random variable needs to take on a numeric value, or that "success" is equal to 1. The whole point of the indicator function is to formalize this. And imagine if you want to know the probability that the random variable X = 2. Using the notation you've suggested, we would have to say that 2 = 1, which is not good notation. – Cliff AB Oct 07 '16 at 23:17
  • "there's nothing that says a random variable needs to take on a numeric value" -- most definitions of *random variable* I have seen require exactly that. (They use a different term for things that are random but not numeric.) – Glen_b Aug 07 '21 at 07:02
6

Indicator random variables are useful in that they provide a seamless connection between probability and expectation. Consider how easy it is to prove Markov's inequality with their help: let $X$ be a nonnegative random variable and $\alpha > 0$, and note the inequality $\alpha I_{\{ X \geq \alpha \}} \leq X$ (when $X \geq \alpha$ the left side equals $\alpha \leq X$, and otherwise it equals $0 \leq X$). We can then just take an expectation of both sides and do some algebra to get $P(X \geq \alpha) \leq \text{E}(X) / \alpha$. Other proofs, like that of the inclusion-exclusion formula, also make use of this connection. In fact, the whole theory of conditional probability can be developed from the theory of conditional expectation because of this.
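A quick empirical sanity check of Markov's inequality via indicators, a sketch assuming an Exponential(1) variable for $X$ (so $\text{E}(X) = 1$):

```python
import random

random.seed(2)

# Check P(X >= alpha) <= E(X)/alpha empirically for a nonnegative
# random variable, here X ~ Exponential(1) with E(X) = 1.
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]
mean_x = sum(xs) / n

for alpha in (1.0, 2.0, 4.0):
    # P(X >= alpha) estimated as the average of the indicator I_{X >= alpha}.
    p_tail = sum(1 for x in xs if x >= alpha) / n
    print(alpha, p_tail, mean_x / alpha)
    assert p_tail <= mean_x / alpha  # Markov's bound holds
```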

They're also nice in that they're idempotent, meaning $I_A^2 = I_A$, and this makes calculating variances easy: $\text{Var}(I_A) = \text{E}(I_A^2) - \text{E}(I_A)^2 = P(A)(1 - P(A))$. Also, a product of indicator random variables is itself an indicator random variable, $I_A I_B = I_{A \cap B}$, whose expectation is the probability of the intersection.

Finally, while not really a probabilistic thing, indicator functions are a nice way of translating Boolean operations into arithmetic ones, which is helpful for general programming purposes. For instance, $I_{A \cup B} = \max(I_A, I_B)$ and $I_{A \cap B} = \min(I_A, I_B)$.
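A minimal sketch of these Boolean-to-arithmetic translations in Python (the die-roll events are illustrative):

```python
# Indicators turn Boolean set operations into arithmetic.
def I(event_occurs):
    return 1 if event_occurs else 0

# Sample outcome: a single die roll.
roll = 5
i_even = I(roll % 2 == 0)  # indicator of A = "roll is even"
i_big = I(roll >= 4)       # indicator of B = "roll >= 4"

i_union = max(i_even, i_big)         # I_{A ∪ B}
i_intersection = min(i_even, i_big)  # I_{A ∩ B}, also equal to i_even * i_big

print(i_union, i_intersection)  # 1 0
```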

dsaxton
    For programming (or algorithm description at least), there is also the [Iverson bracket](https://en.wikipedia.org/wiki/Iverson_bracket) notation, which can be more flexible. (I think it was promoted by Knuth, and variants seem to be in many modern programming languages, though I am not sure how widespread it is as a "math" notation.) – GeoMatt22 Oct 08 '16 at 03:17
0

I also struggled with this topic, and the most intuitive reflection I've found is truncation: the indicator function effectively truncates the density.


An example from (Train, 2009) can illustrate the point (e.g. for some $v = 2$): $$ \text{P}(\varepsilon>-v) = \int_{-\infty}^{\infty}1_{\{\varepsilon \:>\: -v\}} \cdot f(\varepsilon)\,d\varepsilon = \int_{-v}^{\infty} f(\varepsilon)\,d\varepsilon = F(\varepsilon)\Big|_{-v}^{\infty} = F(\infty) - F(-v) = 1 - F(-v) $$


The last statement in the OP is also by no means intuitive. The easiest way to get it, in my opinion, is using LOTUS (the law of the unconscious statistician): $$ \mathbb{E}[\, g(X) \,] = \int_X g(x)\cdot f(x)\,dx $$ The indicator function is itself a function, so we may substitute it directly for $g(x)$. Let us take a simple case, say $A = \{x \mid x > a\}$:

$$ \mathbb{E}[1_A] = \int_{-\infty}^{\infty}1_{\{x>a\}}\cdot f(x)\,dx = \int_{a}^{\infty} f(x)\,dx = P(X>a) = P(A) $$
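A numeric sketch of this LOTUS computation, assuming a standard normal density and $a = 1$ (a crude Riemann sum, purely for illustration):

```python
import math

# Check E[1_{X>a}] = P(X > a) for a standard normal X via LOTUS:
# integrate 1_{x>a} * f(x) with a simple left-point Riemann sum.
def f(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

a = 1.0
dx = 0.001
# Integrate over [-8, 8]; the indicator zeroes out everything at or below a.
lotus = sum((1 if x > a else 0) * f(x) * dx
            for x in (i * dx - 8 for i in range(int(16 / dx))))

# Exact tail probability: 1 - Phi(1), about 0.1587.
exact = 1 - 0.5 * (1 + math.erf(a / math.sqrt(2)))
print(lotus, exact)
```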

Hope this may help.

garej