I have a $[k \times N]$ matrix of predictors / independent variables and a $[k \times N]$ matrix of predictands / dependent variables. I have uncertainty estimates for each predictor and each predictand. I use orthogonal distance regression (ODR, via scipy's Python interface to ODRPACK) to estimate my predictands one at a time (regressing the full $[k \times N]$ matrix as independent and a single $[1 \times N]$ row as dependent), but my estimate of $\beta$ (the estimated parameter values) suffers from strong collinearities between the input vectors, e.g.
$$y_8 = 1.3 - 0.046 x_1 + 1.45 x_2 - 1.49 x_3 + 2.2 x_4 - 1.4 x_5 + \ldots + 1.1 x_8 + \ldots$$
...which in turn leads to my model generalising poorly to independent testing data. For physical reasons¹, I expect all but one of the coefficients in $\beta$ to be close to zero, and my initial guess for $\beta$ is
$$ \beta_j = \begin{cases} 0 & j \neq i \\ 1 & j = i \end{cases} $$
and I would expect the estimated $\beta$ to be close to it.
In practice, $k=12$ and $N$ is on the order of $10^3$–$10^5$.
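For reference, the per-channel fit I am running looks roughly like the following minimal sketch (the synthetic `X`, `y` and uncertainty arrays only stand in for the real instrument data; `i` is the channel I expect to dominate):

```python
import numpy as np
import scipy.odr

rng = np.random.default_rng(0)
k, N, i = 12, 1000, 7                         # channels, samples, dominant channel (stand-in values)

# Synthetic stand-in for the real radiometer data: strongly collinear channels.
common = rng.normal(size=N)
X = common + 0.01 * rng.normal(size=(k, N))   # [k x N] predictors
y = X[i] + 0.05 * rng.normal(size=N)          # [1 x N] predictand
sx = np.full_like(X, 0.01)                    # per-element predictor uncertainties
sy = np.full(N, 0.05)                         # per-element predictand uncertainties

def linear_model(beta, x):
    # beta[0]: intercept; beta[1:]: one coefficient per channel; x has shape (k, n)
    return beta[0] + beta[1:] @ x

beta0 = np.zeros(k + 1)
beta0[1 + i] = 1.0                            # initial guess: only channel i contributes

data = scipy.odr.RealData(X, y, sx=sx, sy=sy)
fit = scipy.odr.ODR(data, scipy.odr.Model(linear_model), beta0=beta0).run()
print(fit.beta)      # with strongly collinear channels these coefficients become large and unstable
print(fit.sd_beta)
```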
How can I mitigate the problems caused by strong collinearities between the input vectors? Can I penalise large deviations from the expected coefficients, or otherwise regularise the problem? Alternatively, can I incorporate my errors-in-variables into methods better suited to collinear predictors, such as PLSR? I am looking for a practical solution that is reasonably straightforward to implement in Python with available libraries.
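To make the penalisation idea concrete: if I ignore the uncertainties entirely, shrinking towards my expected $\beta$ is just ridge regression on a shifted target, as in the sketch below (plain least squares, so exactly the errors-in-variables part is missing, which is the part I don't know how to add):

```python
import numpy as np

def ridge_towards_prior(X, y, beta_prior, lam):
    """Minimise ||y - X.T @ b||^2 + lam * ||b - beta_prior||^2 for X of shape (k, N).

    Substituting b = beta_prior + g turns this into ordinary ridge regression
    on the residual target y - X.T @ beta_prior, so the solution shrinks
    towards beta_prior instead of towards zero.  No intercept and no
    uncertainty weighting -- this only illustrates the penalty term.
    """
    k = X.shape[0]
    r = y - X.T @ beta_prior
    g = np.linalg.solve(X @ X.T + lam * np.eye(k), X @ r)
    return beta_prior + g

# Hypothetical usage with the arrays from the sketch above:
# beta_prior = np.zeros(k); beta_prior[i] = 1.0
# beta_hat = ridge_towards_prior(X, y, beta_prior, lam=1.0)
```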
¹ Physically, I am comparing two $k$-channel radiometers with small, poorly known differences between corresponding channels, leading to small but nonzero expected differences in radiance. My regression aims to give a better understanding of those differences. Uncertainty estimates are essential to the project.