
I'm reading an article on the use of influence curves in robust estimation (Hampel, 1974) which includes the following definition of an influence curve for an estimator $T$:

Let $R$ be the real line, let $T$ be a real-valued functional defined on some subset of the set of all probability measures on $R$, and let $F$ denote a probability measure on $R$ for which $T$ is defined. Denote by $\delta_x$ the probability measure determined by the point mass $1$ in any given point $x \in R$. Mixtures of $F$ and some $\delta_x$ are written as $(1 - \epsilon)F + \epsilon \delta_x$, for $0 < \epsilon < 1$. Then the influence curve $IC_{T,F} (.)$ of (the "estimator") $T$ at (the "underlying probability distribution") $F$ is defined pointwise by $IC_{T,F}(x) = \lim_{\epsilon \to 0} \{ T[(1 - \epsilon)F + \epsilon \delta_x] -T(F) \}/\epsilon$ if this limit is defined for every point $x \in R$.

What is the quantity $\delta_x$ measuring?

Is $\delta_x$ the same as the infinitesimal probability $p_X(x)d x$ for a density $p_X(x)$ (say from cumulative distribution $P$) over the interval $[x,x+dx]$? $\delta_x$ is also called an "atomic probability measure" later in the article.

If so, then $IC_{T,F}(x)$ measures the "rate of change" in a function $T(F)$ as you mix in a little bit ($\epsilon$) of an alternate distribution $P$, is that correct?

I'm trying to wrap my mind around how one might have a weighted mixture of two probability distributions. It's an important concept to understand for new causal inference techniques such as Targeted Maximum Likelihood Estimation.

RobertF

1 Answer


$\delta_x$ is the probability measure defined by $$ \delta_x(A) = \begin{cases} 1 & x \in A \\ 0 & \text{o.w.}\end{cases} $$ so it is just a point mass putting all of the probability on a single value (the remaining properties of a measure are easy to verify). If we integrate some function with respect to it we get $$ \int_{\mathbb R} f \,\text d\delta_x = \int_{\mathbb R\backslash\{x\}} f\,\text d\delta_x + \int_{\{x\}}f\,\text d\delta_x = 0 + f(x) $$ so it effectively evaluates $f$ at $x$. You can therefore also think of $\delta_x$ as an "evaluation functional" doing the mapping $f\mapsto f(x)$. There is more on this and other uses in the Wikipedia article on the Dirac delta.
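As a minimal numerical sketch of that evaluation property (the integrand $f$ and the point $x$ are arbitrary choices here):

```python
import numpy as np

# Minimal sketch: integrating f with respect to delta_x just evaluates f at x.
# Every draw from the point mass at x equals x, so the Monte Carlo average of
# f over those draws is exactly f(x).
f = np.cos                    # arbitrary integrand
x = 1.3                       # arbitrary point
draws = np.full(10_000, x)    # "samples" from delta_x are all identical
print(np.mean(f(draws)))      # f(1.3)
print(f(x))                   # same value
```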

Given some other probability measure $\nu$ on $(\mathbb R,\mathbb B)$, it's totally fine to consider a new measure given by a convex combination like $$ P := \alpha \nu + (1-\alpha)\delta_x $$ for $0 \leq \alpha \leq 1$. For a Borel set $A$ this is $$ P(A) = \alpha \nu(A) + (1-\alpha)\delta_x(A) = \begin{cases} \alpha \nu(A) + 1-\alpha & x \in A \\ \alpha \nu(A) & \text{o.w.}\end{cases} $$ Note that $P(\mathbb R) = 1$, so this is still a probability measure.
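Here's a hedged sampling sketch of such a mixture, taking $\nu$ to be a standard normal purely for illustration: with probability $\alpha$ draw from $\nu$, otherwise return the atom $x$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(alpha, x, size, rng):
    # Draw from alpha * nu + (1 - alpha) * delta_x, where nu = N(0, 1)
    # (an arbitrary illustrative choice for nu).
    from_nu = rng.random(size) < alpha
    return np.where(from_nu, rng.standard_normal(size), x)

y = sample_mixture(alpha=0.9, x=5.0, size=100_000, rng=rng)
print(np.mean(y == 5.0))  # ~ 0.1, the weight (1 - alpha) on the atom
```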

As a side comment, any discrete distribution can be viewed as a convex combination of $\delta_x$ for various $x$. E.g. the Poisson distribution can be written as $$ P(A) = \sum_{n\in\mathbb N} \frac{\lambda^ne^{-\lambda}}{n!}\delta_{n}(A) $$ so we have a countable infinity of weights and the weight for $\delta_n$ is $\frac{\lambda^ne^{-\lambda}}{n!}$.
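A small numerical check of that representation (truncating the countable sum; SciPy is used only for the pmf, and the set $A$ is arbitrary):

```python
import numpy as np
from scipy.stats import poisson

lam = 2.0
A = {1, 3, 4}                   # an arbitrary set of integers
ns = np.arange(200)             # truncate the countable sum of deltas
weights = poisson.pmf(ns, lam)  # weight on each delta_n
# delta_n(A) = 1 iff n in A, so the sum just picks out the weights at points of A
P_A = sum(w for n, w in zip(ns, weights) if n in A)
print(P_A)
print(poisson.pmf([1, 3, 4], lam).sum())  # same value
```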

And it turns out there's nothing wrong with forming these combinations between discrete and continuous measures. For example, suppose $X\sim\mathcal N(0,1)$ and define $Y = \max\{0,X\}$. $Y$ is continuous on $(0,\infty)$ but has positive probability of being exactly $0$, so it is neither discrete nor continuous. A convenient dominating measure here is $$ \frac 12 \delta_0 + \frac 12 \lambda $$ where $\lambda$ is the Lebesgue measure (the factors of $\frac 12$ are immaterial for domination; $\delta_0 + \lambda$ would work just as well).
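A quick simulation sketch of that example (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = np.maximum(0.0, x)    # Y = max(0, X) with X ~ N(0, 1)

print(np.mean(y == 0.0))  # ~ 0.5: the atom at 0 (the delta_0 part)
print(np.mean(y > 1.0))   # ~ 0.1587: the continuous part, P(X > 1)
```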


Regarding $IC$, we have$\newcommand{\e}{\varepsilon}$ $$ \lim_{\e\to 0} \frac{T[(1-\e)F + \e\delta_x] - T[F]}{\e} $$ so I think we can interpret this as a directional derivative: we start at our probability measure $F$ and take a "step" in the direction of shifting some mass onto the single point $x$.
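To make that concrete: for the mean functional $T(F) = \int t \,\text dF(t)$ we have $T[(1-\epsilon)F + \epsilon\delta_x] = (1-\epsilon)T(F) + \epsilon x$, so the difference quotient is exactly $x - T(F)$, the familiar influence curve of the mean. A hedged numerical sketch (the choice $F = \mathcal N(0,1)$, the point $x$, and the step $\epsilon$ are all illustrative), approximating $F$ by a large weighted sample:

```python
import numpy as np

# Sketch: numerically check the IC of the mean functional T(F) = E_F[X].
# Known result: IC_{T,F}(x) = x - T(F). We approximate F = N(0,1) by a large
# sample and represent (1 - eps) F + eps delta_x by reweighting.
rng = np.random.default_rng(0)
sample = rng.standard_normal(1_000_000)

x, eps = 3.0, 1e-4
T_F = sample.mean()
# T of the mixture: append the atom x with weight eps, downweight the rest
values = np.append(sample, x)
weights = np.append(np.full(sample.size, (1 - eps) / sample.size), eps)
T_mix = np.average(values, weights=weights)

print((T_mix - T_F) / eps)  # difference quotient
print(x - T_F)              # the known influence curve of the mean
```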

jld
  • Thanks, this is helpful, reading through now. So $\delta_x$ is just a probability mass function that takes only one value, correct? And if we substitute a continuous probability distribution for the point mass $\delta_x$, we then have a Gateaux derivative. – RobertF Jun 11 '20 at 20:45
  • @RobertF yeah that sounds correct, and I guess a [functional derivative](https://en.wikipedia.org/wiki/Functional_derivative#Functional_derivative) would probably be the simplest way to talk about it (without needing the abstractness of Gateaux derivatives) – jld Jun 11 '20 at 20:50
  • Yes, I've been researching functionals as well, which have a somewhat nebulous definition & seem to be equivalent to integration as far as I can tell. Perhaps a good question for the Mathematics forum. – RobertF Jun 11 '20 at 20:54
  • Ok, so basically the $IC$ is measuring the "influence" of throwing an additional observation $x$ into our sample estimate (e.g., the mean). – RobertF Jun 11 '20 at 20:55
  • @RobertF we've got a probability measure $F$ and we're looking at this directional derivative where the directions are $\delta_x$ which represent all of the mass being concentrated on a single point. For every $F$ in our space we get a function $IC_{T,F} :\mathbb R\to\mathbb R$ giving all of these sensitivities to $F$ moving towards being concentrated on that point. So rather than getting an extra observation, I think it's about looking at what happens as we push $F$ to have $x$ be more likely for each $x\in\mathbb R$ – jld Jun 11 '20 at 21:14
  • @RobertF this might be helpful too https://en.wikipedia.org/wiki/Robust_statistics#Influence_function_and_sensitivity_curve – jld Jun 11 '20 at 21:16
  • BTW had an aha! moment after reading the Hampel paper and a set of helpful class notes on functionals I found here: http://julian.tau.ac.il/~bqs/functionals/functionals.html. Just as a first-order Taylor approximation of a function can be written $f(x)\approx f(\theta)+f'(\theta)(x-\theta)$, a first-order approximation of a functional $T$ can be written $T(G)\approx T(F) + \int IC_{T,F}(x)\,d(G(x)-F(x))$, where the integral of $IC_{T,F}$ plays the role of a sum of partial derivatives (= the IC) over an infinite-dimensional space of probability distributions at $F$. – RobertF Jun 12 '20 at 20:49
  • @RobertF interesting, thanks for sharing! – jld Jun 13 '20 at 23:21
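
As a quick check of the expansion in that last comment (not from the original thread): for the mean functional $T(F) = \int x \,\text dF(x)$, whose influence curve is $IC_{T,F}(x) = x - T(F)$, the first-order expansion is actually exact, since $$ T(F) + \int \big(x - T(F)\big)\,\text d(G-F)(x) = T(F) + \big[T(G) - T(F)\big] - \underbrace{\big[T(F) - T(F)\big]}_{=\,0} = T(G). $$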