I have a two-dimensional dataset which comes from prior statistical analysis. Each point $(x_n,y_n)$ in the dataset has an error estimate $\sigma_{x,n}, \sigma_{y,n}$ for both coordinates. The $x_n$ are proportions and by definition lie in $(0,1)$, and the $y_n$ are growth rates, which tend to be in the range $(-0.02,0.06)$. Both theoretical considerations and eyeballing the plot suggest a linear relationship between the coordinates.
I want to find the line $y = \hat{a}x + \hat{b}$ of best fit for this dataset, with estimates of uncertainty for both $\hat{a}$ and $\hat{b}$. What are the standard (or best?) ways to do this?
**What I tried so far**
Currently I am using least squares weighted by $\frac{1}{\sqrt{\sigma_{x,n}^2 + \sigma_{y,n}^2}}$.
In particular, using numpy's `polyfit` function, I do the following:

```python
import numpy as np

# weight each point by the inverse of its combined uncertainty
w = 1 / np.sqrt(x_err**2 + y_err**2)
[a, b], [[a_v, ab_v], [_, b_v]] = np.polyfit(x_data, y_data, 1, w=w, cov=True)
```
where I interpret the square roots of `a_v` and `b_v` as my errors on $\hat{a}$ and $\hat{b}$. I've seen this sort of technique used before, but I understand that it can be very sensitive to outliers. Alternatively, I've considered getting error estimates through bootstrapping or jackknifing. This eliminates, to some extent, the need to get an error estimate from the fit itself, but it still forces me to choose some method of finding a line of best fit, and it is computationally more intensive (especially if I start resampling the raw data rather than the output of the prior statistical analyses).