The paper Random Fourier Features for Large-Scale Kernel Machines by Ali Rahimi and Ben Recht
makes use of Bochner's theorem, which says (in layman's terms) that the Fourier transform $p(w)$ of a shift-invariant positive-definite kernel $k(x,y)$ is a probability distribution.
Therefore, the kernel can be expressed as the inverse Fourier transform of $p(w)$:
$\begin{eqnarray} k(x,y) &=& \int_{\mathbb{R}^d} p(w) e^{j w^T (x-y)} dw \\ &=& \mathbb{E}_w[\psi_w(x) \psi_w(y)^*] \end{eqnarray}$
where
$\psi_w(x) = e^{j w^T x}$, and $\psi_w(y)^* = e^{-j w^T y}$ is the complex conjugate of $\psi_w(y)$.
The paper then states that, since $p(w)$ is real and even, the complex exponentials can be replaced with cosines, giving
$k(x,y) = \mathbb{E}_w[z_w(x) z_w(y)]$
where $z_w(x) = \sqrt{2} \cos(w^T x)$.
I do not understand where this comes from.
From what I understand about Fourier transforms, $p(w)$ is real and even when $k(x,y)$ is real and even.
Therefore, it should actually be
$\begin{eqnarray} k(x, y) &=& \mathbb{E}_w[\cos(w^T (x-y))] \\ &=& \mathbb{E}_w[\cos(w^T x) \cos(w^T y) + \sin(w^T x) \sin(w^T y)] \\ &=& \mathbb{E}_w[z_w(x)^T z_w(y)] \end{eqnarray}$
where $z_w(x) = [\cos(w^T x), \sin(w^T x)]^T$.
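As a numerical sanity check of the derivation above, here is a minimal NumPy sketch. It assumes a Gaussian kernel $k(x,y) = e^{-\|x-y\|^2/2}$, whose spectral density $p(w)$ is a standard normal. The helper name `rff_features`, the dimensions `d` and `D`, and the seed are my own choices for illustration, not something taken from the paper.

```python
import numpy as np

def rff_features(X, W):
    """Map each row x of X to [cos(w^T x), sin(w^T x)] over all rows w of W.

    The 1/sqrt(D) scaling makes the inner product of two feature vectors
    a Monte Carlo average over the D sampled frequencies.
    """
    proj = X @ W.T                      # shape (n_points, D)
    D = W.shape[0]
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

rng = np.random.default_rng(0)
d, D = 3, 20000                         # input dimension, number of frequencies
x = 0.5 * rng.normal(size=d)
y = 0.5 * rng.normal(size=d)

# Assumption: Gaussian kernel with unit bandwidth, so w ~ N(0, I_d).
W = rng.normal(size=(D, d))

Z = rff_features(np.vstack([x, y]), W)
approx = Z[0] @ Z[1]                    # average of cos(w^T (x - y)) over samples
exact = np.exp(-0.5 * np.sum((x - y) ** 2))
print(approx, exact)
```

With enough sampled frequencies, the two printed values should agree up to Monte Carlo error, which shrinks roughly like $1/\sqrt{D}$.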
What am I missing?