
I've been reading the original paper on dropout (https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf), and in the linear regression section it is stated that:

$\mathbb{E}_{R\sim \text{Bernoulli}(p)}\left[\| y - (R*X)w\|^2\right]$

(where $*$ denotes the element-wise product) reduces to:

$\|y - pXw\|^2 + p(1-p) \|\Gamma w\|^2$

I am having trouble understanding how they arrived at this result. Can anyone help?
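
For what it's worth, the identity does seem to hold numerically. Here is a quick Monte Carlo sketch (the sizes, seed, and variable names are made up for the test; $p$ is the retention probability, as in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 20, 4, 0.8                     # arbitrary sizes and retention probability

X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=d)

# Monte Carlo estimate of E_R ||y - (R*X) w||^2 with R_ij ~ Bernoulli(p) i.i.d.
draws = 50_000
total = 0.0
for _ in range(draws):
    R = rng.binomial(1, p, size=(n, d))
    total += np.sum((y - (R * X) @ w) ** 2)
lhs = total / draws

# Claimed closed form: ||y - pXw||^2 + p(1-p) ||Gamma w||^2,
# where Gamma = diag(X^T X)^{1/2}
gamma = np.sqrt(np.diag(X.T @ X))        # diagonal of Gamma as a vector
rhs = np.sum((y - p * X @ w) ** 2) + p * (1 - p) * np.sum((gamma * w) ** 2)

print(lhs, rhs)                          # agree to within Monte Carlo error
```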

1 Answer


$\newcommand{\E}{\mathbb{E}}$First let $M = R * X$ for convenience. Expanding the loss we have
$$ \|y - Mw\|^2 = y^Ty - 2w^TM^Ty + w^TM^TMw. $$
Taking the expectation w.r.t. $R$ and using linearity, we have
$$ \E_R\left(\|y - Mw\|^2\right) = y^Ty - 2w^T(\E_R M)^Ty + w^T\E_R(M^TM)w. $$

The expected value of a random matrix is the matrix of entry-wise expected values, so
$$ (\E_R M)_{ij} = \E_R\left((R * X)_{ij}\right) = X_{ij}\,\E_R(R_{ij}) = p X_{ij}, $$
and hence
$$ 2w^T(\E_R M)^Ty = 2pw^TX^Ty. $$

For the last term,
$$ (M^TM)_{ij} = \sum_{k=1}^N M_{ki}M_{kj} = \sum_{k=1}^N R_{ki}R_{kj}X_{ki}X_{kj}, $$
therefore
$$ \left(\E_R(M^TM)\right)_{ij} = \sum_{k=1}^N \E_R(R_{ki}R_{kj})X_{ki}X_{kj}. $$
If $i \neq j$ then $R_{ki}$ and $R_{kj}$ are independent, so $\E_R(R_{ki}R_{kj}) = \E_R(R_{ki})\E_R(R_{kj}) = p^2$ and the off-diagonal entries are $p^2 (X^TX)_{ij}$. On the diagonal, $R_{ki}^2 = R_{ki}$ since $R_{ki} \in \{0,1\}$, so $\E_R(R_{ki}^2) = p$ and
$$ \left(\E_R(M^TM)\right)_{ii} = \sum_{k=1}^N \E_R(R_{ki}^2)X_{ki}^2 = p\,(X^TX)_{ii}. $$
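
Both cases are easy to check numerically. A quick sketch (sizes, seed, and variable names are my own, not from the paper) comparing this entry-wise formula against a Monte Carlo average:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, p = 20, 4, 0.8
X = rng.normal(size=(n, d))
XtX = X.T @ X

# Entry-wise formula derived above: p^2 (X^T X)_{ij} off the diagonal,
# p (X^T X)_{ii} on the diagonal
exact = p**2 * XtX + (p - p**2) * np.diag(np.diag(XtX))

# Monte Carlo estimate of E_R[(R*X)^T (R*X)]
draws = 50_000
acc = np.zeros((d, d))
for _ in range(draws):
    M = rng.binomial(1, p, size=(n, d)) * X
    acc += M.T @ M

print(np.max(np.abs(acc / draws - exact)))   # small, and shrinks as draws grow
```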

Finishing this off, note that
$$ \|y - pXw\|^2 = y^Ty - 2pw^TX^Ty + p^2w^TX^TXw, $$
while we found
$$ \begin{aligned} \E_R\|y - Mw\|^2 &= y^Ty - 2pw^TX^Ty + w^T\E_R(M^TM)w \\ &= \|y - pXw\|^2 - p^2w^TX^TXw + w^T\E_R(M^TM)w \\ &= \|y - pXw\|^2 + w^T\left(\E_R(M^TM) - p^2 X^TX\right)w. \end{aligned} $$
In $\E_R(M^TM) - p^2 X^TX$, every off-diagonal entry is zero by the computation above, and each diagonal entry is $(p - p^2)(X^TX)_{ii}$, so
$$ \E_R(M^TM) - p^2 X^TX = p(1-p)\,\text{diag}(X^TX). $$
The paper defines $\Gamma = \text{diag}(X^TX)^{1/2}$, so $\|\Gamma w\|^2 = w^T\text{diag}(X^TX)w$ and we are done:
$$ \E_R\|y - Mw\|^2 = \|y - pXw\|^2 + p(1-p)\|\Gamma w\|^2. $$
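
Once $\E_R(M^TM)$ is in closed form there is no randomness left, so the whole rearrangement can be verified exactly, up to floating-point rounding (again a sketch with arbitrary made-up data):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, p = 20, 4, 0.8
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=d)

XtX = X.T @ X
# E_R(M^T M) from the derivation: p^2 X^T X plus p(1-p) on the diagonal
E_MtM = p**2 * XtX + p * (1 - p) * np.diag(np.diag(XtX))

# Left side: E_R ||y - Mw||^2 expanded with the exact E_R(M^T M)
lhs = y @ y - 2 * p * (w @ (X.T @ y)) + w @ E_MtM @ w

# Right side: ||y - pXw||^2 + p(1-p) ||Gamma w||^2, Gamma = diag(X^T X)^{1/2}
gamma = np.sqrt(np.diag(XtX))
rhs = np.sum((y - p * X @ w) ** 2) + p * (1 - p) * np.sum((gamma * w) ** 2)

print(lhs, rhs)   # identical up to floating-point rounding
```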
