
Suppose you have a discrete random variable $X$ with probability mass function $p(x) = P(X=x)$ on the support $0,\ldots,n$. What function $q(x)\ge 0$ such that $\sum_{x=0}^n q(x) = 1$ maximizes $$ E(\log[q(X-1)+q(X)+q(X+1)])? $$ To avoid dealing with edge cases, assume $P(X=0)=P(X=n)=0$.

Related questions:

  • I believe the $q(x)$ that maximizes the above expectation also maximizes $E[q(X-1)+q(X)+q(X+1)]$ since $\log$ is monotonic. Is that correct?
  • Can anything beat $p(x)=q(x)$?

For those who are interested, this question arises from the CDC Flu Forecasting competition where they use the log of the sum of the probabilities for the target value and neighboring values as the utility function to evaluate forecasts.

jaradniemi
  • Could you provide a link? For probably very obvious reasons, I'm particularly interested... – Cliff AB Sep 16 '16 at 04:27
  • I do not see why the solution of $\max_q \mathbb{E}[q(X-1)+q(X)+q(X+1)]$ should be the same as the solution of $\max_q \mathbb{E}[\log\{q(X-1)+q(X)+q(X+1)\}]$. – Xi'an Sep 16 '16 at 07:04
  • I've added a link to a news release. Unfortunately the link within the article, which is to the actual competition site, is currently down. Hopefully it will be back up soon. – jaradniemi Sep 16 '16 at 12:35
  • The idea is that we want to evaluate your pmf (or marginal likelihood) for a target, e.g. peak week. Since the data themselves are noisy, the target is uncertain and therefore the criterion allows you to be off by one unit, e.g. week, in either direction. The criterion calculates the log of the sum of the three pmf values. But then, it seems to me, you are better off not reporting your true pmf $p$ but instead reporting $q$. So yes, I believe the target is to maximize the expectation in the question. – jaradniemi Sep 16 '16 at 13:03
  • @jaradniemi: sorry for the confusion; now that I am on a real computer and not on my phone, I see it is indeed a maximisation problem! If the system of equations $q_{i-1}+q_i+q_{i+1}=p_i$ allows for a solution within the simplex, it cannot get better! Else, some of the $q_i$'s are necessarily zero, but I do not think there is a closed-form solution. – Xi'an Sep 16 '16 at 16:20
  • @jaradniemi: ah, then it is *exactly* the problem of interval-censored data, and the solution to your problem is the interval-censored NPMLE. – Cliff AB Sep 16 '16 at 17:20

3 Answers


Cool problem! As Xi'an's derivation shows, it is related to minimizing the KL-divergence from Q to P. Cliff provides some important context as well.

The problem can be solved easily with numerical optimization software, but I don't see a way to write a closed-form formula for the general solution. If the constraint $q_i \geq 0$ never binds, there is an intuitive formula.

Almost certainly the optimal $\mathbf{q} \neq \mathbf{p}$ (though it may be close; see my example graphs at the end). And $\max_q \mathrm{E}[q(X-1)+q(X)+q(X+1)]$ is not the same problem as $\max_q \mathrm{E}[\log\{q(X-1)+q(X)+q(X+1)\}]$: the expectation is a sum, and the log sits *inside* that sum, so taking logs is not a monotonic transformation of the objective function. (Maximizing $x + y$ is not equivalent to maximizing $\log x + \log y$.)

KKT conditions (i.e. necessary and sufficient conditions) for a solution:

Define $q_0 = 0$ and $q_{n+1} = 0$. The problem is: \begin{equation} \begin{array}{*2{>{\displaystyle}r}} \mbox{maximize (over $q_i$)} & \sum_{i=1}^n p_i \log \left( q_{i-1} + q_i + q_{i+1} \right) \\ \mbox{subject to} & q_i \geq 0 \\ & \sum_{i=1}^n q_i = 1 \end{array} \end{equation}

Lagrangian: $$ \mathcal{L} = \sum_i p_i \log \left( q_{i-1} + q_i + q_{i+1} \right) + \sum_i \mu_i q_i -\lambda \left( \sum_i q_i - 1\right) $$ This is a convex optimization problem where Slater's condition holds therefore the KKT conditions are necessary and sufficient conditions for an optimum. First order condition: $$ \frac{p_{i-1}}{q_{i-2} + q_{i-1} + q_{i}} + \frac{p_i}{q_{i-1} + q_i + q_{i+1}} + \frac{p_{i+1}}{q_{i} + q_{i+1} + q_{i+2}} = \lambda - \mu_i $$

Complementary slackness: $$\mu_i q_i = 0$$ and of course $\mu_i \geq 0$; here the $\mu_i$ and $\lambda$ are Lagrange multipliers. (From my testing $\lambda = 1$; a short argument for why is sketched below.)
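
To see why $\lambda = 1$: multiply the first order condition by $q_i$, sum over $i$, and use complementary slackness. Swapping the order of summation on the left-hand side, $$\sum_i q_i \sum_{j=i-1}^{i+1} \frac{p_j}{q_{j-1}+q_j+q_{j+1}} = \sum_j \frac{p_j}{q_{j-1}+q_j+q_{j+1}}\left(q_{j-1}+q_j+q_{j+1}\right) = \sum_j p_j = 1,$$ while the right-hand side gives $\sum_i q_i(\lambda - \mu_i) = \lambda \sum_i q_i - \sum_i \mu_i q_i = \lambda$. So $\lambda = 1$ at any optimum.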

Solution if $q_i \geq 0$ never binds:

Consider the candidate solution

$$ p_i = \frac{q_{i-1} + q_i + q_{i+1}}{3}, \qquad \mu_i = 0, \qquad \lambda = 1. $$

Plugging into the first order condition gives $\frac{1}{3} + \frac{1}{3} + \frac{1}{3} = 1$, so the KKT conditions hold (as long as $\sum_i q_i = 1$ and $q_i \geq 0$ are also satisfied).

How to write the problem with matrices:

Let $\mathbf{p}$ and $\mathbf{q}$ be vectors. Let $A$ be the tridiagonal matrix with ones on the main diagonal and on the first sub- and super-diagonals. E.g. for $n = 5$:

$$A = \left[\begin{array}{ccccc} 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0& 0 \\ 0 & 1 & 1 & 1 & 0 \\0 &0 & 1 & 1&1\\ 0 &0 &0 & 1 & 1 \end{array} \right] $$

The problem can then be written in matrix notation:

\begin{equation} \begin{array}{*2{>{\displaystyle}r}} \mbox{maximize (over $\mathbf{q}$)} & \mathbf{p}'\log\left(A \mathbf{q} \right) \\ \mbox{subject to} & q_i \geq 0 \\ & \sum_i q_i = 1 \end{array} \end{equation}

This can be solved quickly numerically, but I don't see a way to reach a clean closed-form solution.

The solution is characterized by $$A\mathbf{y} = \lambda\mathbf{1} - \boldsymbol{\mu}, \qquad \mathbf{x} = A \mathbf{q}, \qquad y_i = \frac{p_i}{x_i},$$ but I don't see how that's terribly helpful beyond checking the output of your optimization software.

Code to solve it using CVX in MATLAB:

% assumes n (support size) and the column vector p are already defined
A = eye(n) + diag(ones(n-1,1),1) + diag(ones(n-1,1),-1);  % tridiagonal matrix of ones

cvx_begin
    variable q(n)
    dual variable u          % multipliers for q >= 0 (the mu_i)
    dual variable l          % multiplier for the simplex constraint (lambda)
    maximize( p' * log(A * q) )
    subject to
        u : q >= 0;
        l : sum(q) <= 1;     % binds at the optimum, so effectively sum(q) == 1
cvx_end

E.g. inputs:

p = 0.0724    0.0383    0.0968    0.1040    0.1384    0.1657    0.0279    0.0856    0.2614    0.0095

has solution:

q = 0.0000    0.1929    0.0000    0.0341    0.3886    0.0000    0.0000    0.2865    0.0979    0.0000
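
As a quick sanity check (a sketch of my own, assuming p and q are stored as column vectors and A is as above), compare the objective at this q against simply reporting p, and verify the first order condition numerically:

% sanity checks on the solver output (q, p column vectors; A as above)
obj = @(v) p' * log(A * v);   % the objective
obj(q)                        % objective at the computed optimum
obj(p)                        % objective at q = p: should be no larger
g = A * (p ./ (A * q));       % gradient of the objective at q
g(q > 1e-6)                   % entries should be close to lambda = 1
max(g)                        % should not exceed 1 (up to solver tolerance)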

Solution I get (blue) when I have a ton of bins basically following a normal pdf (red): [figure: optimal $\mathbf{q}$ (blue) vs. $\mathbf{p}$ (red)]

Another, more arbitrary problem: [figure]

Very loosely, where $p_{i-1} \approx p_i \approx p_{i+1}$ you get $q_i \approx p_i$; but where $p_i$ moves around a lot, something trickier happens: the optimization concentrates mass on a few $q_i$ in the neighborhood of the $p_i$ mass, strategically placing it between the $p_i$ that carry mass.

Another conceptual point is that uncertainty in your forecast will effectively smooth your estimate of $p$, and a smoother $p$ will have a solution $q$ that's closer to $p$. (I think that's right.)

Matthew Gunn
  • I do not understand the $\mu_i=0$ condition and I would have omitted the inclusion of the $q_i\ge 0$ constraint in the Lagrangian. – Xi'an Sep 16 '16 at 06:54
  • @Xi'an When I solved this problem numerically using CVX, the $q_i \geq 0 $ constraint binds in certain cases, hence the multiplier $\mu_i$ is positive for some $i$. The $\mu_iq_i =0$ is just a dumb way to say that if $\mu_i > 0$ then $q_i = 0$ and vice versa. – Matthew Gunn Sep 16 '16 at 07:00
  • Thanks for the answer. I was hoping to replicate your results but using R, but it appears that is not so straightforward. – jaradniemi Sep 19 '16 at 19:40
  • @jaradniemi My R is not very good, but you could probably get simple code from someone who has done some optimization in R before. With matrix $A$ defined as I did, you want to solve the convex minimization problem $\mathrm{minimize} -\mathbf{p}' \log\left(A\mathbf{q} \right)$ subject to $\mathbf{q} \geq \mathbf{0}$ and $\sum_i q_i = 1$. That said, from my fooling around on this problem, it seems that choosing $\mathbf{q} = \mathbf{p}$ is pretty close for $\mathbf{p}$ that is fairly smooth (eg. see first fig) so you may not gain much. – Matthew Gunn Sep 19 '16 at 19:54

Since $\mathbf{q}=\mathbf{p}$ solves $$\arg\min_\mathbf{q} \sum_i p_i\log\{ p_i\big/q_i\},$$ what about just solving $$q_{i-1}+q_i+q_{i+1}=3p_i,\qquad i=1,\ldots,n-1,$$ to find the solution to $$\arg\min_\mathbf{q} \sum_i p_i\log\{ p_i\big/(q_{i-1}+q_i+q_{i+1})\}\,?$$ If the solution to this system of equations does not belong to the simplex of $\mathbb{R}^{n+1}$, then the minimizer will be found on a face of the simplex.
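
A minimal numerical sketch of this recipe (my own illustration, reusing the tridiagonal matrix $A$ and indexing from the previous answer, with a made-up $\mathbf{p}$):

n = 10;
p = [0 .10 .15 .20 .20 .15 .10 .05 .05 0]';  % hypothetical pmf summing to 1
A = eye(n) + diag(ones(n-1,1),1) + diag(ones(n-1,1),-1);
q = A \ (3*p);               % solves q_{i-1} + q_i + q_{i+1} = 3 p_i
if all(q >= 0) && abs(sum(q) - 1) < 1e-8
    disp('q lies in the simplex, so it is the optimum')
else
    disp('q leaves the simplex; the optimum is on a face, use a solver')
end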

Xi'an
  • Typo, it should be arg min. $\min_q \sum_i p_i \left(\log p_i - \log q_i \right)$ is an equivalent problem to $\max_q \sum_i p_i \log q_i$. – Matthew Gunn Sep 16 '16 at 09:00
  • Thank you, Matthew, I eventually found the time to read my entry properly! – Xi'an Sep 16 '16 at 16:17

If I understand this correctly, I do not think this will have a closed-form solution. Or rather, it is at least a special case of a problem that has no closed-form solution in general.

The reason I say this is that this is exactly the likelihood that appears in the NPMLE for interval-censored data, the specialization being that all the intervals are of the form $[X-1, X+1]$. In general, the NPMLE is the maximizer of the function

$$\sum_{i = 1}^n \log\left(P(t_i \in [L_i, R_i])\right)$$

where $t_i$ is the event time for subject $i$, where all that is known is that the event occurred within the interval $[L_i, R_i]$. This equates to exactly your problem, with $L_i = X_i-1$ and $R_i = X_i + 1$.

In general, this is not available in closed form. However, at least one special case is: that of current status data, i.e. when all the intervals are of the form $[0, c_i]$ or $[c_i, \infty)$.

That being said, there are plenty of algorithms for solving the NPMLE! You can fit it using R's icenReg package with the ic_np function (note: I'm the author). Make sure to set the option B = c(1,1), declaring that the intervals are closed.

Note that it is not the case that the function $q$ that maximizes $E[q(X-1)+ \ldots]$ also maximizes $E[\log(q(X-1) + \ldots)]$. As a trivial example, suppose $X_1 = 1$, $X_2 = 1$, $X_3 = 10$. Then $q(1) = 1$ (and $0$ otherwise) maximizes $E[q(X-1)+ \ldots]$ but drives $E[\log(q(X-1) + \ldots)]$ to $-\infty$, as the sketch below illustrates.
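
A minimal numeric version of this example (my own sketch; the vector index is shifted by one because the support starts at 0):

X = [1 1 10];                 % the three observations
q = zeros(1, 12);             % q(v+1) = mass on value v, padded past 10
q(2) = 1;                     % all mass on the value 1
s = q(X) + q(X+1) + q(X+2);   % q(x-1) + q(x) + q(x+1) with shifted indices
mean(s)                       % linear objective: 2/3 (maximal: no width-3 window covers both 1 and 10)
mean(log(s))                  % log objective: -Inf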

Cliff AB