In short: I am currently reading Online Learning with Kernels (http://books.nips.cc/papers/files/nips14/AA33.pdf) for fun and I can't figure out how the authors get to equation 8 from equations 6 and 7.
The idea is: we want to minimize a risk functional $R_{\mathrm{stoch}}[f,t] := c(x_t, y_t, f(x_t)) + \lambda\Omega[f]$. If we apply the representer theorem to $f$, writing it as $f(x) = \sum_i \alpha_i k(x, x_i)$, how do we arrive at the STOCHASTIC gradient descent update? Say we take the soft margin loss for SVMs. It would be easy to take the gradient of the regularized risk w.r.t. $f$ (well, a subgradient for the loss term) and do ordinary gradient descent; a sketch of that is below. But for online learning with stochastic gradient descent, I'm kinda lost.
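To show where I'm stuck, here is a minimal sketch (my own notation, not the paper's) of the batch version I do understand: subgradient descent on the regularized soft margin risk, with $f$ expanded through the representer theorem over the training points. The RBF kernel, step size `eta`, and regularization `lam` are just illustrative choices.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian RBF kernel matrix k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def batch_subgradient_descent(X, y, lam=0.1, eta=0.1, epochs=100):
    """Batch (not online) subgradient descent on
    (1/m) * sum_t max(0, 1 - y_t f(x_t)) + (lam/2) * ||f||_H^2,
    with f(x) = sum_i alpha_i k(x, x_i)."""
    m = len(y)
    K = rbf_kernel(X, X)              # Gram matrix over the training set
    alpha = np.zeros(m)               # coefficients of the kernel expansion
    for _ in range(epochs):
        f = K @ alpha                 # f(x_t) for all training points
        margins = y * f
        # subgradient of the hinge loss w.r.t. f(x_t): -y_t where margin < 1
        g = np.where(margins < 1.0, -y, 0.0) / m
        # gradient of (lam/2) * alpha^T K alpha is lam * K @ alpha
        alpha -= eta * (K @ (g + lam * alpha))
    return alpha
```

What I can't see is how this turns into the per-example update of equation 8, where only one $(x_t, y_t)$ is seen at a time and the expansion grows with $t$.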
Thank you! Please do not hesitate to ask for further details. Any help would be greatly appreciated.