Recently I found a nice slideshow that explains PLS and the idea behind it pretty well. I think I understand the majority of the slides, but I'm a bit confused by the first step of the NIPALS algorithm. Here the author of the slides describes the choice of w as the unit vector that maximizes cov(Xw, Y).
My question is: how does cov(Xw, Y) = w'X'Y? Or, how does the author reason that we can just maximize w'X'Y? I understand the rest of this first step, but I'm a bit confused by this one line.
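To make the question concrete: if the columns of X and Y are mean-centered (this is my assumption, I'm not sure the slides state it), then it looks like cov(Xw, Y) is just w'X'Y divided by n - 1, in which case both would have the same maximizer. Here is a minimal Python sketch of that check on made-up data with a single response column. All names and values are hypothetical and not taken from the slides.

```python
import random


def center(values):
    """Subtract the mean so the values sum to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def sample_cov(a, b):
    """Sample covariance with the usual 1/(n-1) normalisation."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


random.seed(0)
n, p = 50, 3

# Hypothetical data: p mean-centered columns of X and a mean-centered response y.
columns = [center([random.gauss(0, 1) for _ in range(n)]) for _ in range(p)]
y = center([random.gauss(0, 1) for _ in range(n)])

# An arbitrary unit vector w (0.36 + 0.64 = 1).
w = [0.6, 0.0, 0.8]

# Scores t = Xw, one value per observation.
t = [sum(w[j] * columns[j][i] for j in range(p)) for i in range(n)]

# X'y has one entry per column of X; w'X'y is its dot product with w.
xty = [sum(x_i * y_i for x_i, y_i in zip(col, y)) for col in columns]
wxty = sum(w_j * v for w_j, v in zip(w, xty))

lhs = sample_cov(t, y)   # cov(Xw, y)
rhs = wxty / (n - 1)     # w'X'y scaled by 1/(n-1)

print(lhs, rhs)
```

If the centering assumption is right, the two printed numbers should agree up to floating-point error, and the 1/(n - 1) factor would not depend on w. Is that the reasoning the author has in mind?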
Thank you!!