The book chapter referenced below (Section 4.3.1) lists a few formulations of partial least squares (PLS). The first two make sense to me and seem standard:
$$\underset{\mathbf{u}, \mathbf{v}}{\text{maximize}} \quad \frac{\mathbf{u}^\top \mathbf{X}^\top \mathbf{Y} \mathbf{v}}{\lVert \mathbf{u} \rVert \lVert \mathbf{v} \rVert} \quad \iff \quad \underset{\mathbf{u}, \mathbf{v}}{\text{maximize}} \quad \mathbf{u}^\top \mathbf{X}^\top \mathbf{Y} \mathbf{v} \quad \text{s.t.}~\lVert \mathbf{u} \rVert^2 = \lVert \mathbf{v} \rVert^2 = 1$$
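(For concreteness, here is a minimal numpy sketch of how I understand this first formulation is solved: the maximizers should be the leading left/right singular vector pair of $\mathbf{X}^\top \mathbf{Y}$. The function name is just mine.)

```python
import numpy as np

def pls_first_direction(X, Y):
    """First PLS weight pair: leading singular vectors of X^T Y."""
    M = X.T @ Y
    U, s, Vt = np.linalg.svd(M)
    u, v = U[:, 0], Vt[0, :]
    # s[0] is the maximum of u^T X^T Y v over unit-norm u, v
    return u, v, s[0]
```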
They also state the problem is equivalent to minimizing the misfit:
$$\underset{\mathbf{u}, \mathbf{v}}{\text{minimize}} \quad \lVert \mathbf{X} \mathbf{u} - \mathbf{Y} \mathbf{v} \rVert^2 \quad \quad \text{s.t.}~\lVert \mathbf{u} \rVert^2 = \lVert \mathbf{v} \rVert^2 = 1$$
But this doesn't seem equivalent to me. Expanding the quadratic objective function, we get:
$$\mathbf{u}^\top \mathbf{X}^\top\mathbf{X} \mathbf{u} + \mathbf{v}^\top \mathbf{Y}^\top\mathbf{Y} \mathbf{v} - 2\, \mathbf{u}^\top \mathbf{X}^\top \mathbf{Y} \mathbf{v} \tag{$*$}$$
It seems they would need to ignore (or treat as constant) the first two terms for all of these optimization problems to be equivalent, but I don't see how that is justified.
Reference in question: De Bie T., Cristianini N., Rosipal R. (2005). Eigenproblems in Pattern Recognition. In: Handbook of Geometric Computing. Springer, Berlin, Heidelberg.
Side note: I do understand a similar equivalence for the related case of canonical correlation analysis (CCA). In that model the constraints become $\mathbf{u}^\top \mathbf{X}^\top\mathbf{X} \mathbf{u} = \mathbf{v}^\top \mathbf{Y}^\top\mathbf{Y} \mathbf{v} = 1$, in which case the first two terms in $(*)$ are constrained to be constant.
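Concretely, under those CCA constraints the misfit $(*)$ reduces to
$$\lVert \mathbf{X}\mathbf{u} - \mathbf{Y}\mathbf{v} \rVert^2 = 1 + 1 - 2\, \mathbf{u}^\top \mathbf{X}^\top \mathbf{Y} \mathbf{v} = 2 - 2\, \mathbf{u}^\top \mathbf{X}^\top \mathbf{Y} \mathbf{v},$$
so minimizing the misfit is the same as maximizing the cross-covariance term. I don't see the analogous step under the PLS constraints $\lVert \mathbf{u} \rVert^2 = \lVert \mathbf{v} \rVert^2 = 1$.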
A counterexample (?): Consider the following for a choice of $\epsilon$ close to zero.
$$ \mathbf{X} = \begin{bmatrix} 1/\epsilon & 0 \\ 0 & 1 \end{bmatrix} ~; \quad \mathbf{Y} = \begin{bmatrix} 1 & 0 \\ 0 & \epsilon \end{bmatrix}$$
$$ \mathbf{X}^\top \mathbf{X} = \begin{bmatrix} 1 / \epsilon^2 & 0 \\ 0 & 1 \end{bmatrix} \quad \mathbf{Y}^\top \mathbf{Y} = \begin{bmatrix} 1 & 0 \\ 0 & \epsilon^2 \end{bmatrix} \quad \mathbf{X}^\top \mathbf{Y} = \begin{bmatrix} 1/\epsilon & 0 \\ 0 & \epsilon \end{bmatrix} $$
In the first formulation, this means we should be maximizing:
$$\mathbf{u}^\top \begin{bmatrix} 1 / \epsilon & 0 \\ 0 & \epsilon \end{bmatrix} \mathbf{v}$$
But in the second formulation, it means we should be minimizing:
$$ \mathbf{u}^\top \begin{bmatrix} 1 / \epsilon^2 & 0 \\ 0 & 1 \end{bmatrix} \mathbf{u} + \mathbf{v}^\top \begin{bmatrix} 1 & 0 \\ 0 & \epsilon^2 \end{bmatrix} \mathbf{v} - \mathbf{u}^\top \begin{bmatrix} 2/\epsilon & 0 \\ 0 & 2\epsilon \end{bmatrix} \mathbf{v} $$
Now, as $\epsilon \rightarrow 0$, the former case would give $\mathbf{u} = \mathbf{v} = \begin{bmatrix} 1 & 0 \end{bmatrix}^\top$ while the latter case would give $\mathbf{u} = \mathbf{v} = \begin{bmatrix} 0 & 1 \end{bmatrix}^\top$ (since the $1 / \epsilon^2$ term dominates).
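To sanity-check this, here is a quick numerical sketch (numpy; $\epsilon = 10^{-3}$ is just an arbitrary small value) evaluating both objectives at the two candidate unit vectors:

```python
import numpy as np

eps = 1e-3
X = np.diag([1 / eps, 1.0])
Y = np.diag([1.0, eps])

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

def covariance(u, v):
    # First formulation: u^T X^T Y v
    return u @ X.T @ Y @ v

def misfit(u, v):
    # Second formulation: ||Xu - Yv||^2
    return np.linalg.norm(X @ u - Y @ v) ** 2

print(covariance(e1, e1), covariance(e2, e2))  # 1/eps >> eps: the maximizer is e1
print(misfit(e1, e1), misfit(e2, e2))          # (1/eps - 1)^2 >> (1 - eps)^2: the minimizer is e2
```

So the two formulations pick out different directions for this choice of $\mathbf{X}$ and $\mathbf{Y}$, which is what makes me doubt the stated equivalence.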