Welcome to Cross Validated!
Approach 1
Have a look at Sections 7.10 and 7.11 of *The Elements of Statistical Learning*.
I think the basic idea is to calculate the uncertainty of the test results (e.g. RMSECV) for the different numbers of latent variables. That tells you which differences between the models are too small to be trusted as real differences.
Do not forget that choosing the number of latent variables from test results is a data-driven model optimization, so you need an outer validation loop (nested aka double cross validation) to measure the predictive performance of the model you obtain that way.
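For illustration, here is a minimal sketch of such an outer validation loop with scikit-learn. The data, fold counts, and range of latent variables are all placeholders I made up, not recommendations:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))                    # placeholder "spectra"
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=50)

# Inner loop: data-driven choice of the number of latent variables.
select_n_lv = GridSearchCV(
    PLSRegression(),
    param_grid={"n_components": range(1, 16)},
    scoring="neg_root_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)

# Outer loop: measures the performance of the *whole* selection procedure,
# not just of the finally chosen model.
outer_rmse = -cross_val_score(
    select_n_lv, X, y,
    scoring="neg_root_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=1),
)
print(f"RMSE of the tuned model: {outer_rmse.mean():.3f} ± {outer_rmse.std():.3f}")
```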
I'd also suggest switching from LOO cross validation to iterated/repeated $k$-fold cross validation or some variant of out-of-bootstrap validation (see the book and the answers here on Cross Validated on that topic).
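Sticking with the toy data from the sketch above, the uncertainty per number of latent variables could be estimated e.g. with repeated $k$-fold cross validation (again just one possible illustration):

```python
from sklearn.model_selection import RepeatedKFold

rkf = RepeatedKFold(n_splits=5, n_repeats=20, random_state=0)
for n_lv in range(1, 11):
    rmse = -cross_val_score(
        PLSRegression(n_components=n_lv), X, y,
        scoring="neg_root_mean_squared_error", cv=rkf,
    )
    # Differences in mean RMSECV that are small compared to this spread
    # should not be trusted to be real.
    print(f"{n_lv:2d} LV: RMSECV = {rmse.mean():.3f} ± {rmse.std():.3f}")
```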
You can also directly bootstrap the RMSE = f(# latent variables) plot.
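A bare-bones version of that, reusing the toy data above and evaluating on the out-of-bootstrap samples (all numbers are again placeholders):

```python
n_boot, max_lv = 100, 10
curves = np.empty((n_boot, max_lv))
boot_rng = np.random.default_rng(1)

for b in range(n_boot):
    idx = boot_rng.integers(0, len(X), size=len(X))   # bootstrap resample
    oob = np.setdiff1d(np.arange(len(X)), idx)        # out-of-bootstrap rows
    for k in range(1, max_lv + 1):
        pred = PLSRegression(n_components=k).fit(X[idx], y[idx]).predict(X[oob])
        curves[b, k - 1] = np.sqrt(np.mean((y[oob] - pred.ravel()) ** 2))

# Each row of `curves` is one bootstrapped RMSE-vs-#LV curve; plotting all of
# them (or e.g. a 5th/95th percentile band) shows how stable the minimum is.
band = np.percentile(curves, [5, 95], axis=0)
```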
Approach 2
Here's a second approach that works very well for certain types of data: I work with spectroscopic data. Good spectra have high correlation between neighbouring measurement channels, so they look smooth in a parallel coordinate plot. For such data, I look at the X loadings. As with PCA loadings, the later PLS X loadings are usually noisier than the first ones, so I decide on the number of latent variables by looking at how noisy the loadings are. For the data I deal with, this usually leads to far fewer latent variables than RMSECV (at least without uncertainty calculations) suggests.
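In code, that inspection can be as simple as plotting the loadings and judging them by eye. A sketch, assuming spectra-like rows in X (with the random toy data above, all loadings will of course look noisy):

```python
import matplotlib.pyplot as plt

pls = PLSRegression(n_components=8).fit(X, y)   # deliberately generous

fig, axes = plt.subplots(8, 1, sharex=True, figsize=(6, 10))
for k, ax in enumerate(axes):
    ax.plot(pls.x_loadings_[:, k])              # k-th X loading vector
    ax.set_ylabel(f"LV {k + 1}")
axes[-1].set_xlabel("measurement channel")
plt.show()
# Keep the latent variables whose loadings still look like smooth,
# spectrum-like signal; cut off where they start to look like noise.
```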
Rule of Thumb
A rule of thumb I learned as a student, when I was first developing PLS models for industry: decide on a number of PLS latent variables the way you learnt in lectures (e.g. from the RMSE without uncertainty), then use 2 or 3 latent variables fewer than that number suggests.
In my experience, this rule of thumb worked not only for the UV/Vis data I had there, but also for other spectroscopic techniques.
Also, I find it very helpful to sit down and think about the application: what influencing factors do you expect, and how many components would those correspond to? Again, this is not applicable to all kinds of problems and applications, but where you can take this approach, it gives a reasonable starting point.
Edit: references for approach 2
I know of papers where we did it that way (for PCA rather than PLS, though), but IIRC we never showed the chosen loadings alongside some noisy loadings we didn't choose, and we did not really discuss the criterion in detail. However:
- Dochow, S.; Beleites, C.; Henkel, T.; Mayer, G.; Albert, J.; Clement, J.; Krafft, C. and Popp, J.: Quartz microfluidic chip for tumour cell identification by Raman spectroscopy in combination with optical traps. Anal Bioanal Chem, 2013, 405, 2743-2746
> [A] principal component analysis (PCA) model was calculated for the 21 background spectra and the first four principal components (without centring) were used to model these contributions. [...] Two further principal components did not have enough signal-to-noise ratio to warrant inclusion into the model.