I am reading Wasserman's book "All of Statistics" in which he defines a statistical functional as any function $T(F)$ of the cumulative distribution function $F(x)$ that outputs a real number. Then he goes on to define a 'linear statistical functional' as a functional $T$ for which the following condition holds:

$$T(aF+bG) = aT(F) + bT(G)$$

where $F$ and $G$ are CDFs, and $a, b$ are constants. Obviously the functionals like the mean and variance are linear. What are some examples of "nonlinear statistical functionals"?

seanv507
Peaceful

1 Answer

  • Variances. Per the Wikipedia page on mixture distributions, expectations are linear, but variances are not. (When you think about it, this is kind of obvious, because expectations involve integrating over a function, which is linear, but variances involve integrating over the square of a function, which isn't.)

    Specifically, for $n$ mixture components with means $\mu_i$, variances $\sigma_i^2$ and mixture weights $w_i$ summing to one, we have a mixture mean and variance of $$ \begin{align*} \mu &= \sum_{i=1}^n w_i\mu_i \\ \sigma^2 &= \sum_{i=1}^n w_i(\sigma_i^2+\mu_i^2-\mu^2). \end{align*}$$ The expression for $\sigma^2$ differs from $\sum w_i\sigma_i^2$ because of the squared means. For instance, consider two normals $N(0,1)$ and $N(1,2)$ (parameterized by mean and variance) with weights $(0.3,0.7)$. Then $$ \begin{align*} \mu &= w_1\mu_1+w_2\mu_2 = 0.7 \\ \sigma^2 &= w_1(\sigma_1^2+\mu_1^2-\mu^2)+w_2(\sigma_2^2+\mu_2^2-\mu^2) = 1.91 \neq 1.7 = w_1\sigma_1^2+w_2\sigma_2^2. \end{align*}$$

    Here is a quick R simulation for people who (like me) don't trust my math-fu:

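    weights <- c(0.3,0.7)
    means <- c(0,1)
    vars <- c(1,2)
    
    # draw the mixture component (1 or 2) for each simulated value
    index <- 2-(runif(1e7)<weights[1])
    sims <- rnorm(length(index),mean=means[index],sd=sqrt(vars[index]))
    
    # simulated mean vs. weighted average of the component means (these agree)
    mean(sims)
    sum(weights*means)
    
    # simulated variance vs. weighted average of the component variances (these differ)
    var(sims)
    sum(weights*vars)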
    
  • Quantiles. For instance, your CDFs could be normal distributions with different means and variances, so $aF+bG$ would be a Gaussian mixture, and $T$ could extract any quantile. Quantiles of mixtures are not simply the weighted averages of the quantiles of the components. (See here for an argument why the median is not a linear functional for mixtures of normal distributions.)

  • The maximum or minimum of distributions with bounded support. If your two CDFs are for a $U[0,1]$ and a $U[0,2]$ distribution and $a,b>0$, then the mixture will have minimum $0$ and maximum $2$, regardless of the specific values of $a$ and $b$. Yes, this is not all that different from quantiles.

  • The (-1)-median, the functional that minimizes the expected absolute percentage error, i.e., the optimal point forecast under the MAPE. It is not very well known.
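
For the quantile case, the median of a mixture can be checked numerically without simulation. The following is only a sketch, reusing the two normals and weights from the variance example; the names `sds`, `mixture_cdf`, `mixture_median` and `weighted_medians` are made up for this illustration. It finds the point where the mixture CDF crosses 0.5 and compares it with the weighted average of the component medians, which is what a linear functional would have to return.

```r
# Mixture of N(0, 1) and N(1, 2) (mean, variance) with weights 0.3 and 0.7.
weights <- c(0.3, 0.7)
means <- c(0, 1)
sds <- sqrt(c(1, 2))

# CDF of the mixture at a single point x: weighted sum of the component CDFs.
mixture_cdf <- function(x) sum(weights * pnorm(x, mean = means, sd = sds))

# Median of the mixture: the root of mixture_cdf(x) - 0.5.
mixture_median <- uniroot(function(x) mixture_cdf(x) - 0.5,
                          interval = c(-10, 10), tol = 1e-10)$root

# Weighted average of the component medians (for a normal, median = mean).
weighted_medians <- sum(weights * qnorm(0.5, mean = means, sd = sds))

mixture_median
weighted_medians

# If the median were linear, this would be exactly 0.5.
mixture_cdf(weighted_medians)
```

The weighted average of the component medians is 0.7, but the mixture CDF evaluated there is above 0.5, so the actual median of the mixture lies below 0.7.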

Stephan Kolassa
  • Thanks! Are there other similarly obvious ones? – Peaceful May 01 '21 at 15:23
  • Well, the min and the max of bounded distributions are really special cases of quantiles. I can also offer the (-1)-median, which is rather unknown. – Stephan Kolassa May 01 '21 at 15:59
  • Why do you bring up the (-1)-median when you’re so down on MAPE? (I follow your arguments for why MAPE is flawed, but I wonder why (-1)-median is interesting despite the flaws of MAPE.) – Dave May 01 '21 at 17:06
  • @Dave: I wouldn't say I'm *down* on the MAPE. It's used a lot, even if I personally am not very fond of it. But simply because it's widely used, I like to know that there is a particular functional that optimizes it, even if the definition of the (-1)-median is really nothing more than "that thing that optimizes the MAPE". Besides, dropping that name is sometimes a good conversation opener. (Among the right kind of people.) – Stephan Kolassa May 02 '21 at 05:44
  • @StephanKolassa : What about the simplest of the quantiles, the median? Is it a nonlinear statistical functional? – Peaceful May 09 '21 at 09:52
  • @Peaceful: yes, certainly. That's covered in my first bullet point. – Stephan Kolassa May 09 '21 at 10:13
  • @StephanKolassa: Do you know an explicit way to prove that linearity is impossible in general? An example perhaps? Consider a valid CDF $H(x)=aF(x)+(1-a)G(x)$ with $a\in[0,1]$. Now if $T(H)=x_M$, the median, then how would we prove that $x_M = a\mu_1+(1-a)\mu_2$ is impossible? Here $F$ and $G$ are Gaussian CDFs with means $\mu_1$ and $\mu_2$. – Peaceful May 09 '21 at 10:33