I am trying to figure out how many significant figures I should report after doing a linear regression in Excel.
I have a dataset of 740 entries.
I use 75% of them as a training set for the regression and 25% as the testing set. Each set is determined randomly.
I do a first regression and come up with some values for my coefficients. When I use them, I get a pretty good match with my test data: the average residual is 0.14.
However, if I re-do the regression with another random training set, I get slightly different coefficients.
I therefore round my coefficients to the last digit that the two replicates have in common, and use those rounded coefficients to predict the output of my test set. I now get a mean residual of 0.5, which is almost 4 times bigger than when I used the non-rounded coefficients from my first replicate.
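For concreteness, the procedure described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not my actual Excel workbook: the data are synthetic (two made-up predictors with made-up true coefficients), "average residual" is taken to mean the mean absolute residual, and "closest common digit" is interpreted as the largest number of significant figures at which both replicates round to the same value. The helper names (`fit_ols`, `round_sig`, `common_sig_figs`, `random_split`) are hypothetical.

```python
import math
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ...; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mean_abs_residual(coef, X, y):
    """Mean absolute difference between observed and predicted values."""
    pred = coef[0] + X @ coef[1:]
    return float(np.mean(np.abs(y - pred)))

def round_sig(x, sig):
    """Round x to `sig` significant figures."""
    x = float(x)
    if x == 0.0:
        return 0.0
    mag = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - mag)

def common_sig_figs(a, b, max_sig=10):
    """Largest number of significant figures at which a and b round to the
    same value (0 if they disagree even at one figure)."""
    for sig in range(max_sig, 0, -1):
        if round_sig(a, sig) == round_sig(b, sig):
            return sig
    return 0

def random_split(n, train_frac, rng):
    """Random train/test split of the row indices 0..n-1."""
    idx = rng.permutation(n)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]

# Synthetic stand-in for the real dataset (assumption: 2 predictors).
rng = np.random.default_rng(0)
n = 740
X = rng.uniform(0, 10, size=(n, 2))
y = 1.5 + 2.37 * X[:, 0] - 0.84 * X[:, 1] + rng.normal(0, 0.2, size=n)

# Two replicates, each fitted on its own random 75% training set.
train1, test1 = random_split(n, 0.75, rng)
train2, _ = random_split(n, 0.75, rng)
coef1 = fit_ols(X[train1], y[train1])
coef2 = fit_ols(X[train2], y[train2])

# Round each coefficient to the figures shared by both replicates
# (at least one figure is kept so the coefficient is not zeroed out).
rounded = np.array([
    round_sig(c1, max(common_sig_figs(c1, c2), 1))
    for c1, c2 in zip(coef1, coef2)
])

# Compare both sets of coefficients on the first replicate's test set.
mar_full = mean_abs_residual(coef1, X[test1], y[test1])
mar_rounded = mean_abs_residual(rounded, X[test1], y[test1])
print("replicate 1 coefficients:", coef1)
print("replicate 2 coefficients:", coef2)
print("rounded coefficients:    ", rounded)
print("mean |residual|, unrounded:", mar_full)
print("mean |residual|, rounded:  ", mar_rounded)
```

Running the comparison on the same test set for both coefficient sets keeps the two mean residuals directly comparable.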
As you can see from the graph, when I use the unrounded coefficients from each replicate I get much better results than when I use the coefficients rounded to the closest matching digit between the replicates.
I feel like I should report the rounded values, as the results are likely to be the same no matter which portion of the dataset is used as the training set. Yet I find it confusing that someone using those rounded values would get a worse match...
Any recommendations?