Suppose I have predicted values obtained from several methods such as KNN, a maximum likelihood estimator, $k$-means clustering, etc., say $x_1, x_2, \ldots, x_n$, where column vector $x_i$ holds the predictions from method $i$. I want to combine all these results using least squares, i.e., solving $Xb = y$, where $X = [x_1, x_2, \ldots, x_n]$ and the vector $y$ stores the known values. This least-squares regression then gives me combined predicted values, say $x^*$. Of course, I have a training set to tune the parameters of all methods and to obtain the least-squares coefficients. I wonder whether this combination has any chance of outperforming the best single method. That is, on a testing set I predict with every single method, where the result from the best single method is $x_{tj}$, and then perform least squares to get the combined result $x_t^*$. Can I be sure that $\sum_k (x_{tjk} - x_{tk})^2 \le \sum_k (x_{tk}^* - x_{tk})^2$, where $x_t$ is the known vector?
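For concreteness, here is a minimal sketch of the combination step I mean, in Python/NumPy, with synthetic predictions standing in for the outputs of my real methods:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: m training points, n methods' predictions as the columns of X.
m, n = 100, 3
y = rng.normal(size=m)                      # known values on the training set
X = np.column_stack([y + rng.normal(scale=s, size=m) for s in (0.5, 0.8, 1.2)])

# Ordinary least squares: b = argmin ||X b - y||^2
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Combined prediction x* = X b
x_star = X @ b
print("combination coefficients:", b)
print("combined SSE:", np.sum((x_star - y) ** 2))
print("best single-method SSE:", min(np.sum((X[:, i] - y) ** 2) for i in range(n)))
```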
I have tested this tens of thousands of times using random samples from my real data with different numbers of data points, and only about $2\%$ of the results show that the error between the combined predictions and the known values is smaller than the error between the best single method's predictions and the known values. So it seems that combining the results of all single methods in this way does not help to improve the prediction accuracy. But how can I prove this?
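The comparison I ran looks roughly like the following sketch (again with synthetic data in place of my real samples): fit the coefficients on a random training split, then compare the test SSE of the combination against the test SSE of the best single method.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_trial(m_train=80, m_test=40, noise=(0.5, 0.8, 1.2)):
    """One random split: fit b on the training part, compare SSEs on the test part."""
    m = m_train + m_test
    y = rng.normal(size=m)
    X = np.column_stack([y + rng.normal(scale=s, size=m) for s in noise])
    Xtr, ytr = X[:m_train], y[:m_train]
    Xte, yte = X[m_train:], y[m_train:]

    b, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    sse_comb = np.sum((Xte @ b - yte) ** 2)
    # "Best" single method chosen by its test error, as in my comparison above
    sse_best = min(np.sum((Xte[:, i] - yte) ** 2) for i in range(X.shape[1]))
    return sse_comb < sse_best

trials = 10_000
wins = sum(one_trial() for _ in range(trials))
print("fraction of trials where the combination beats the best single method:",
      wins / trials)
```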
What if I constrain the least-squares coefficients to satisfy $b_i > 0$ and $\sum_i b_i = 1$?
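A sketch of this constrained fit, using SciPy's general-purpose SLSQP solver, with non-negativity $b_i \ge 0$ in place of the strict inequality:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Same synthetic stand-in as before
m, n = 100, 3
y = rng.normal(size=m)
X = np.column_stack([y + rng.normal(scale=s, size=m) for s in (0.5, 0.8, 1.2)])

def sse(b):
    """Objective: sum of squared errors of the combined prediction X b."""
    r = X @ b - y
    return r @ r

# Minimize the SSE subject to b_i >= 0 and sum(b_i) = 1 (a convex combination of methods)
res = minimize(
    sse,
    x0=np.full(n, 1.0 / n),                                        # start at equal weights
    bounds=[(0.0, None)] * n,                                      # b_i >= 0
    constraints=[{"type": "eq", "fun": lambda b: b.sum() - 1.0}],  # sum(b_i) = 1
    method="SLSQP",
)
print("constrained weights:", res.x)
print("constrained combined SSE:", sse(res.x))
```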