You might be better off fitting a generalized linear model (GLM) instead of a "plain" linear model, and analyzing the GLM's residuals. This procedure, and a few good reasons for using it, are laid out in this answer. GLMs have more than one kind of residual, but there is a large literature on analyzing them.
In case you balk at the idea of switching from OLS to ML, or you're hesitant to impose distributional assumptions on the response, consider that regression with OLS is equivalent to a GLM that assumes a normally distributed response and the identity link function.
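To make that equivalence concrete, here is a minimal sketch in Python's statsmodels (my choice of software and simulated data; the answer itself assumes no particular package), fitting the same data with OLS and with a Gaussian GLM using the identity link:

```python
# Sketch: a Gaussian GLM with an identity link reproduces the OLS fit.
# The data below are simulated purely for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=200)

X = sm.add_constant(x)

ols_fit = sm.OLS(y, X).fit()
glm_fit = sm.GLM(y, X, family=sm.families.Gaussian()).fit()  # identity link is the default

print(ols_fit.params)  # coefficients from OLS...
print(glm_fit.params)  # ...match the GLM coefficients up to numerical tolerance

# GLMs expose several kinds of residuals, e.g. deviance and Pearson residuals:
print(glm_fit.resid_deviance[:5])
print(glm_fit.resid_pearson[:5])
```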
Moreover, regression models (generalized or not) describe a conditional mean, but back-transforming their predictions does not in general produce a conditional mean for the un-transformed response. In your case, $\operatorname{E}(\sqrt{y}) \neq \sqrt{\operatorname{E}(y)}$.
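For a quick numerical illustration (my own, not part of the original argument): if $y$ takes the values $0$ and $4$ with equal probability, then
$$\operatorname{E}(\sqrt{y}) = \tfrac{1}{2}\sqrt{0} + \tfrac{1}{2}\sqrt{4} = 1, \qquad \text{while} \qquad \sqrt{\operatorname{E}(y)} = \sqrt{2} \approx 1.41.$$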
(edit/update) Consider a response $y$ and its transformation $y'=\sqrt{y}$. You fit the regression model
$$y'=\beta_0 + \beta x + \varepsilon$$
which, if $\operatorname{E}(\varepsilon|x)=0$ (as we assume for OLS), is equivalent to the model
$$\operatorname{E}(y'|x) = \operatorname{E}(\sqrt{y}|x) = \beta_0 + \beta x$$
The problem is that $\left(\operatorname{E}(\sqrt{y}|x)\right)^2 \neq \operatorname{E}(y|x)$ in general. Fortunately, in this particular case we can move forward without making any additional assumptions by appealing to the formula $\operatorname{V}(Z) = \operatorname{E}(Z^2) - \left(\operatorname{E}(Z)\right)^2 \implies \operatorname{E}(Z^2) = \operatorname{V}(Z) + \left(\operatorname{E}(Z)\right)^2$, so that
$$\operatorname{E}(y|x) = \operatorname{V}(\sqrt{y}|x) + \left(\operatorname{E}(\sqrt{y}|x)\right)^2$$
and therefore, since OLS assumes a constant conditional variance $\operatorname{V}(\sqrt{y}|x) = \sigma^2$, which we estimate with the residual variance $\widehat{\sigma^2}$ of the fitted model,
$$ \widehat{y} = \widehat{\sigma^2} + \left(\widehat{y'}\right)^2 $$
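Here is a minimal sketch of that back-transformation, again in statsmodels with simulated data (both are assumptions of mine, not part of the answer): fit OLS to $\sqrt{y}$, then add the estimated residual variance to the squared predictions.

```python
# Sketch: back-transform predictions from a regression on sqrt(y) as
# yhat = sigma2_hat + yhat_prime**2, per the identity above.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=500)
y_sqrt = 2.0 + 0.3 * x + rng.normal(scale=0.5, size=500)  # linear model on the sqrt scale
y = y_sqrt ** 2                                           # observed response on the original scale

X = sm.add_constant(x)
fit = sm.OLS(np.sqrt(y), X).fit()

yhat_prime = fit.predict(X)   # estimates of E(sqrt(y) | x)
sigma2_hat = fit.scale        # estimated residual variance on the sqrt scale

yhat_naive = yhat_prime ** 2                  # omits the variance term
yhat_adjusted = sigma2_hat + yhat_prime ** 2  # E(y|x) = V(sqrt(y)|x) + E(sqrt(y)|x)^2

print(np.mean(y - yhat_naive))     # positive on average: the naive squaring underestimates E(y|x)
print(np.mean(y - yhat_adjusted))  # close to zero on average
```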
In general, however, you will need to make some additional assumptions. If you assume that $(y'|x) \sim \operatorname{Normal}(\beta_0 + \beta x, \sigma^2)$, which is the distributional assumption implicit in OLS on the transformed response, you can usually derive the distribution of the back-transformed response by a change of variables (applying the Jacobian to the Gaussian density) and then take its expectation. With a log-transformed response, for instance, the original-scale response follows a log-normal distribution, so the correct back-transformation is $\widehat{y} = e^{\widehat{y'} + \frac{\widehat{\sigma^2}}{2}}$. This particular (and very common) case is demonstrated nicely on David Giles' blog.
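And a matching sketch for the log-transformed case (same caveats: statsmodels and simulated data are my assumptions, not the answer's): because $\operatorname{E}(y|x) = e^{\mu + \sigma^2/2}$ when $\log y \,|\, x \sim \operatorname{Normal}(\mu, \sigma^2)$, we add half the residual variance before exponentiating.

```python
# Sketch: log-normal back-transformation yhat = exp(yhat_prime + sigma2_hat / 2).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=500)
log_y = 0.5 + 0.2 * x + rng.normal(scale=0.8, size=500)  # linear model on the log scale
y = np.exp(log_y)

X = sm.add_constant(x)
fit = sm.OLS(np.log(y), X).fit()

yhat_prime = fit.predict(X)   # estimates of E(log(y) | x)
sigma2_hat = fit.scale        # estimated residual variance on the log scale

yhat_naive = np.exp(yhat_prime)                     # underestimates E(y | x)
yhat_adjusted = np.exp(yhat_prime + sigma2_hat / 2)

print(np.mean(y - yhat_naive))     # positive on average
print(np.mean(y - yhat_adjusted))  # close to zero on average
```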