
I'm reading an article on the use of influence curves in robust estimation (Hampel, 1974) which includes the following definition of an influence curve for an estimator $T$:

Let $R$ be the real line, let $T$ be a real-valued functional defined on some subset of the set of all probability measures on $R$, and let $F$ denote a probability measure on $R$ for which $T$ is defined. Denote by $\delta_x$ the probability measure determined by the point mass $1$ in any given point $x \in R$. Mixtures of $F$ and some $\delta_x$ are written as $(1 - \epsilon)F + \epsilon \delta_x$, for $0 < \epsilon < 1$. Then the influence curve $IC_{T,F} (.)$ of (the "estimator") $T$ at (the "underlying probability distribution") $F$ is defined pointwise by $IC_{T,F}(x) = \lim_{\epsilon \to 0} \{ T[(1 - \epsilon)F + \epsilon \delta_x] -T(F) \}/\epsilon$ if this limit is defined for every point $x \in R$.

What is the quantity $\delta_x$ measuring?

Is $\delta_x$ the same as the infinitesimal probability $p_X(x)d x$ for a density $p_X(x)$ (say from cumulative distribution $P$) over the interval $[x,x+dx]$? $\delta_x$ is also called an "atomic probability measure" later in the article.

If so, then $IC_{T,F}(x)$ measures the "rate of change" in a function $T(F)$ as you mix in a little bit ($\epsilon$) of an alternate distribution $P$, is that correct?

I'm trying to wrap my mind around how one might have a weighted mixture of two probability distributions. It's an important concept to understand for new causal inference techniques such as Targeted Maximum Likelihood Estimation.

RobertF

1 Answer


$\delta_x$ is the probability measure defined by $$ \delta_x(A) = \begin{cases} 1 & x \in A \\ 0 & \text{o.w.}\end{cases} $$ so it is just a point mass putting all of the probability on a single value (the remaining properties of a measure are easy to verify). If we integrate some function with respect to it we get $$ \int_{\mathbb R} f \,\text d\delta_x = \int_{\mathbb R\backslash\{x\}} f\,\text d\delta_x + \int_{\{x\}}f\,\text d\delta_x = 0 + f(x) $$ so it effectively evaluates $f$ at $x$. You can therefore also think of $\delta_x$ as an "evaluation functional" doing the mapping $f\mapsto f(x)$. There is more on this and other uses in the Wikipedia article on the Dirac delta.
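As a minimal numerical sketch of that evaluation property (the integrand $f$ and the point $x$ are arbitrary choices here):

```python
import numpy as np

# Minimal sketch: integrating f with respect to delta_x just evaluates f at x.
# Every draw from the point mass at x equals x, so the Monte Carlo average of
# f over those draws is exactly f(x).
f = np.cos                    # arbitrary integrand
x = 1.3                       # arbitrary point
draws = np.full(10_000, x)    # "samples" from delta_x are all identical
print(np.mean(f(draws)))      # f(1.3)
print(f(x))                   # same value
```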

Given some other probability measure $\nu$ on $(\mathbb R,\mathbb B)$, it's totally fine to consider a new measure given by a convex combination like $$ P := \alpha \nu + (1-\alpha)\delta_x $$ for $0 \leq \alpha \leq 1$. For a Borel set $A$ this is $$ P(A) = \alpha \nu(A) + (1-\alpha)\delta_x(A) = \begin{cases} \alpha \nu(A) + 1-\alpha & x \in A \\ \alpha \nu(A) & \text{o.w.}\end{cases} $$ Note that $P(\mathbb R) = 1$, so this is still a probability measure.
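Here's a hedged sampling sketch of such a mixture, taking $\nu$ to be a standard normal purely for illustration: with probability $\alpha$ draw from $\nu$, otherwise return the atom $x$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(alpha, x, size, rng):
    # Draw from alpha * nu + (1 - alpha) * delta_x, where nu = N(0, 1)
    # (an arbitrary illustrative choice for nu).
    from_nu = rng.random(size) < alpha
    return np.where(from_nu, rng.standard_normal(size), x)

y = sample_mixture(alpha=0.9, x=5.0, size=100_000, rng=rng)
print(np.mean(y == 5.0))  # ~ 0.1, the weight (1 - alpha) on the atom
```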

As a side comment, any discrete distribution can be viewed as a convex combination of $\delta_x$ for various $x$. E.g. the Poisson distribution can be written as $$ P(A) = \sum_{n\in\mathbb N} \frac{\lambda^ne^{-\lambda}}{n!}\delta_{n}(A) $$ so we have a countable infinity of weights and the weight for $\delta_n$ is $\frac{\lambda^ne^{-\lambda}}{n!}$.
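A small numerical check of that representation (truncating the countable sum; SciPy is used only for the pmf, and the set $A$ is arbitrary):

```python
import numpy as np
from scipy.stats import poisson

lam = 2.0
A = {1, 3, 4}                   # an arbitrary set of integers
ns = np.arange(200)             # truncate the countable sum of deltas
weights = poisson.pmf(ns, lam)  # weight on each delta_n
# delta_n(A) = 1 iff n in A, so the sum just picks out the weights at points of A
P_A = sum(w for n, w in zip(ns, weights) if n in A)
print(P_A)
print(poisson.pmf([1, 3, 4], lam).sum())  # same value
```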

And it turns out there's nothing wrong with forming these combinations between discrete and continuous measures. For example, suppose $X\sim\mathcal N(0,1)$ and define $Y = \max\{0,X\}$. $Y$ is continuous on $(0,\infty)$ but has positive probability of being exactly $0$, so it is neither discrete nor continuous. A convenient dominating measure here is $$ \frac 12 \delta_0 + \frac 12 \lambda $$ where $\lambda$ is the Lebesgue measure (the factors of $\frac 12$ are immaterial for domination; $\delta_0 + \lambda$ would work just as well).
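A quick simulation sketch of that example (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = np.maximum(0.0, x)    # Y = max(0, X) with X ~ N(0, 1)

print(np.mean(y == 0.0))  # ~ 0.5: the atom at 0 (the delta_0 part)
print(np.mean(y > 1.0))   # ~ 0.1587: the continuous part, P(X > 1)
```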


Regarding $IC$, we have$\newcommand{\e}{\varepsilon}$ $$ \lim_{\e\to 0} \frac{T[(1-\e)F + \e\delta_x] - T[F]}{\e} $$ so I think we can interpret this as a directional derivative: we start at our probability measure $F$ and take a "step" in the direction of shifting some mass onto the single point $x$.
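To make that concrete: for the mean functional $T(F) = \int t \,\text dF(t)$ we have $T[(1-\epsilon)F + \epsilon\delta_x] = (1-\epsilon)T(F) + \epsilon x$, so the difference quotient is exactly $x - T(F)$, the familiar influence curve of the mean. A hedged numerical sketch (the choice $F = \mathcal N(0,1)$, the point $x$, and the step $\epsilon$ are all illustrative), approximating $F$ by a large weighted sample:

```python
import numpy as np

# Sketch: numerically check the IC of the mean functional T(F) = E_F[X].
# Known result: IC_{T,F}(x) = x - T(F). We approximate F = N(0,1) by a large
# sample and represent (1 - eps) F + eps delta_x by reweighting.
rng = np.random.default_rng(0)
sample = rng.standard_normal(1_000_000)

x, eps = 3.0, 1e-4
T_F = sample.mean()
# T of the mixture: append the atom x with weight eps, downweight the rest
values = np.append(sample, x)
weights = np.append(np.full(sample.size, (1 - eps) / sample.size), eps)
T_mix = np.average(values, weights=weights)

print((T_mix - T_F) / eps)  # difference quotient
print(x - T_F)              # the known influence curve of the mean
```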

jld
  • Thanks, this is helpful, reading through now. So $\delta_x$ is just a probability mass function that takes only one value, correct? And if we substitute a continuous probability distribution for the point mass $\delta_x$, we then have a Gateaux derivative. – RobertF Jun 11 '20 at 20:45
  • @RobertF yeah that sounds correct, and I guess a [functional derivative](https://en.wikipedia.org/wiki/Functional_derivative#Functional_derivative) would probably be the simplest way to talk about it (without needing the abstractness of Gateaux derivatives) – jld Jun 11 '20 at 20:50
  • Yes, I've been researching functionals as well, which have a somewhat nebulous definition & seem to be equivalent to integration as far as I can tell. Perhaps a good question for the Mathematics forum. – RobertF Jun 11 '20 at 20:54
  • Ok, so basically the $IC$ is measuring the "influence" of throwing an additional observation $x$ into our sample estimate (e.g., the mean). – RobertF Jun 11 '20 at 20:55
  • @RobertF we've got a probability measure $F$ and we're looking at this directional derivative where the directions are $\delta_x$ which represent all of the mass being concentrated on a single point. For every $F$ in our space we get a function $IC_{T,F} :\mathbb R\to\mathbb R$ giving all of these sensitivities to $F$ moving towards being concentrated on that point. So rather than getting an extra observation, I think it's about looking at what happens as we push $F$ to have $x$ be more likely for each $x\in\mathbb R$ – jld Jun 11 '20 at 21:14
  • @RobertF this might be helpful too https://en.wikipedia.org/wiki/Robust_statistics#Influence_function_and_sensitivity_curve – jld Jun 11 '20 at 21:16
  • BTW had an aha! moment after reading the Hampel paper and a set of helpful class notes on functionals I found here: http://julian.tau.ac.il/~bqs/functionals/functionals.html. Just as a first-order Taylor approximation of a function can be written $f(x)\approx f(\theta)+f'(\theta)(x-\theta)$, a first-order approximation of a functional $T$ can be written $T(G)\approx T(F) + \int IC_{T,F}(x)\,d(G(x)-F(x))$, where the integral of $IC_{T,F}$ plays the role of a sum of partial derivatives (= the IC) over an infinite-dimensional space of probability distributions at $F$. – RobertF Jun 12 '20 at 20:49
  • @RobertF interesting, thanks for sharing! – jld Jun 13 '20 at 23:21
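
As a quick check of the expansion in that last comment (not from the original thread): for the mean functional $T(F) = \int x \,\text dF(x)$, whose influence curve is $IC_{T,F}(x) = x - T(F)$, the first-order expansion is actually exact, since $$ T(F) + \int \big(x - T(F)\big)\,\text d(G-F)(x) = T(F) + \big[T(G) - T(F)\big] - \underbrace{\big[T(F) - T(F)\big]}_{=\,0} = T(G). $$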