The answer to both questions is yes:
- yes, LOO does have a pessimistic bias, and
- yes, the described effect of additional pessimistic bias is well known.
Richard Hardy's answer gives a good explanation of the well-known slight pessimistic bias of a correctly performed resampling validation (including all flavors of cross validation).
However, the mechanism discussed in the body of the question is a different one: removing a case that is in some sense extreme yields a test/training split where the training subset is particularly unrepresentative of the case to be tested. This can cause additional error, as Sammy already explained. The reason for this high error is that predictive performance deteriorates very quickly for cases just outside (or at the edge of) the training space.
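To make this concrete, here is a minimal simulation sketch (my own illustration, not taken from the question; it assumes a simple 1-D linear model and uses scikit-learn's `LeaveOneOut`). Cases far from the center of the training space tend to receive the largest held-out errors, because removing them pulls the training set away from them:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
x = rng.normal(size=(30, 1))
y = 2 * x[:, 0] + rng.normal(scale=0.5, size=30)

# Collect the squared LOO error of each case.
loo_err = np.empty(len(x))
for train, test in LeaveOneOut().split(x):
    model = LinearRegression().fit(x[train], y[train])
    loo_err[test] = (model.predict(x[test]) - y[test]) ** 2

# Compare the errors of the "inner" and "outer" halves of the sample.
order = np.argsort(np.abs(x[:, 0] - x[:, 0].mean()))
print("mean LOO squared error, inner half:", loo_err[order[:15]].mean())
print("mean LOO squared error, outer half:", loo_err[order[15:]].mean())
```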
What can be done about this effect?
There are different points of view on such a situation; which one applies, and what to do about it, depends on your judgment of the task at hand.
- On the one hand, this may be seen as an indication of the error to be expected for similarly extreme application cases (somewhat outside the training space): encountering such cases during resampling suggests that the model built on the whole data set will also encounter similarly extreme cases during production use.
From this point of view, the additional error is not a bias but part of an evaluation that includes slight extrapolation outside the training space, which is judged representative of production use.
- On the other hand, it is perfectly valid to set up a model under the additional constraint/requirement/assumption that no prediction should be done outside the training space. Such a model should ideally refuse to predict cases outside its training domain. The LOO error over the cases such a model does predict would not be worse, but the validation would encounter a lot of rejects.
Now, one can argue that the mechanism of leave one out produces an unrepresentatively high proportion of outside-training-space cases due to the described opposite influence on training and test subset populations. This can be shown by studying the bias and variance properties for various $n$ or $k$ in leave-$n$-out and $k$-fold cross validation, respectively. Doing this, there are situations (data set + model combinations) where leave one out exhibits a larger pessimistic bias than would be expected from leave-more-than-one-out (see the Kohavi paper linked by Sammy; other papers report such behaviour as well). A sketch of how such a comparison can be set up follows below.
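Here is a minimal sketch of such a comparison (my own, and hedged: the classifier, data generator, and sample size below are arbitrary assumptions, and whether LOO comes out more pessimistic depends on the data set + model combination). The idea is to approximate the "true" performance of the model fit on the small sample by a large hold-out set, and compare the LOO and $k$-fold estimates against it:

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, LeaveOneOut, KFold

X_all, y_all = make_classification(n_samples=2000, n_features=5, random_state=0)
X, y = X_all[:30], y_all[:30]            # small training sample
X_hold, y_hold = X_all[30:], y_all[30:]  # large hold-out approximating the "true" performance

model = KNeighborsClassifier(n_neighbors=3)
ref_acc = model.fit(X, y).score(X_hold, y_hold)
loo_acc = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()
kfold_acc = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0)).mean()

print(f"hold-out reference accuracy: {ref_acc:.3f}")
print(f"LOO estimate               : {loo_acc:.3f}")
print(f"5-fold estimate            : {kfold_acc:.3f}")
```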
I may add that, as leave-one-out has other undesirable properties (conflating model stability with respect to the training cases with the random error of the tested cases), I'd in any case recommend against using LOO whenever feasible.
Stratified variants of resampling validation by design produce more closely matching training and test subpopulations; they are available for classification as well as regression.
Whether it is appropriate or not to employ such a stratification is basically a matter of judgment about the task at hand.
However, leave one out differs from other resampling validation schemes in that it does not allow stratification. So if stratification is called for, leave one out is not an appropriate validation scheme; a sketch of stratified splitting follows below.
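A minimal sketch of stratified splitting (assumptions: scikit-learn's `StratifiedKFold`; for regression, stratifying on quantile bins of the target is one common workaround, not the only option):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))

# Classification: stratify directly on the class labels.
y_class = rng.integers(0, 2, size=40)
for train, test in StratifiedKFold(n_splits=5).split(X, y_class):
    print(f"class-1 fraction  train: {y_class[train].mean():.2f}  test: {y_class[test].mean():.2f}")

# Regression: stratify on quantile bins of the continuous target.
y_reg = rng.normal(size=40)
bins = np.digitize(y_reg, np.quantile(y_reg, [0.25, 0.5, 0.75]))
reg_splits = StratifiedKFold(n_splits=5).split(X, bins)
```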
When does this particular pessimistic bias occur?
- This is a small-sample-size problem: in the described model, as soon as each weekday "bin" contains enough cases that leaving out even an extreme case shifts the training mean by an amount $\ll$ the spread of temperatures for that weekday, the effect on the observed error is negligible (see the simulation sketch after this list).
- A high-dimensional input/feature/training space offers more "possibilities" for a case to be extreme in some direction: in high-dimensional spaces, most points tend to lie at the "outside". This is related to the curse of dimensionality.
- It is also related to model complexity in the sense that high error for edge cases indicates that the model becomes unstable immediately outside the training region.
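To illustrate the first point, a small simulation sketch (assuming, as in the question, a model that predicts each weekday's temperature by that weekday's training mean): the shift of the training mean caused by leaving out the most extreme case shrinks roughly like $1/n$, so for large bins it becomes negligible compared with the spread of temperatures.

```python
import numpy as np

rng = np.random.default_rng(0)
spread = 3.0  # within-weekday standard deviation of temperatures

for n in (5, 20, 100, 500):
    temps = rng.normal(loc=15.0, scale=spread, size=n)   # one weekday "bin"
    i = np.argmax(np.abs(temps - temps.mean()))          # index of the most extreme case
    shift = abs(temps.mean() - np.delete(temps, i).mean())
    print(f"n={n:4d}  mean shift from leaving out the extreme case: {shift:.3f}  (spread = {spread})")
```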