Some background information first:
Given a dependent variable $y_t$, independent variables $X_t$ and a conditional mean model
$$y_t=\beta X_t+\epsilon_t$$
you can use a GARCH model to model the conditional variance of $\epsilon_t$.
Say you have fit a GARCH model and obtained fitted conditional standard deviations $\hat \sigma_t$. If you scale the residuals $\hat \epsilon_t$ by the inverse of the fitted conditional standard deviations $\hat \sigma_t$, you obtain scaled residuals $\hat u_t:=\frac{\hat \epsilon_t}{\hat \sigma_t}$. You would like these to be "nice". At least they should have no ARCH patterns remaining in them. This can be tested by the Li-Mak test, for example.
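For concreteness, here is what this looks like under a GARCH(1,1) specification. The order (1,1) and the symbols $\omega$, $\alpha$, $\beta$ are assumptions made for illustration; the same logic applies to other orders.
$$\epsilon_t=\sigma_t u_t,\qquad \sigma_t^2=\omega+\alpha\epsilon_{t-1}^2+\beta\sigma_{t-1}^2$$
Dividing the first equation by $\sigma_t$ gives $u_t=\frac{\epsilon_t}{\sigma_t}$, which is why $\hat u_t$ should behave like the assumed innovations $u_t$ when the variance model is adequate.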
1: regarding nonstationary residuals
A GARCH model does not produce residuals of its own: there is no GARCH-model residual in the GARCH formula, only lagged (squared) errors $\epsilon_{t-i}^2$ from the conditional mean model, which enter the GARCH model as regressors.
But what exactly do you mean by nonstationarity: a unit root, heteroskedasticity, a level shift?
When you mention nonstationary residuals, do you have in mind $\hat u_t$, $\hat \epsilon_t$, or something else entirely?
Edit: the type of nonstationarity is unit root. I suspect this is due to a poor model for the conditional mean rather than a failure of GARCH. GARCH affects $\hat u_t$ only through the scaling of $\hat \epsilon_t$ by $\frac{1}{\hat \sigma_t}$; this changes the scale of $\hat \epsilon_t$ but cannot introduce a unit root. That is, the unit root must have already been a feature of $\hat \epsilon_t$, and that is a problem of the conditional mean model, not the conditional variance model.
2: regarding heteroskedasticity
More could be said when you clarify what residuals you have in mind.
Edit: the residuals in mind are $\hat u_t$. If $\hat u_t$ are conditionally heteroskedastic but the pattern is not of ARCH nature, then you could augment the standard GARCH model with explanatory variables to explain the remaining heteroskedasticity.
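As a sketch of what such an augmented variance equation could look like, assuming a GARCH(1,1) base and a hypothetical vector of explanatory variables $z_t$ with coefficients $\gamma$ (neither is taken from the question):
$$\sigma_t^2=\omega+\alpha\epsilon_{t-1}^2+\beta\sigma_{t-1}^2+\gamma' z_t$$
The extra term has to be such that $\sigma_t^2$ stays positive for all $t$, for example by using nonnegative regressors with nonnegative coefficients.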
3: regarding non-normality
$\epsilon_t$ can be non-normal; this is no problem. $u_t$ should match the distribution you assume when fitting a GARCH model (you need to assume a distribution to be able to obtain the likelihood function that will be maximized when fitting the GARCH model). If you assume a normal distribution for $u_t$ but can reject normality for $\hat u_t$, then that is a problem. But you do not need to assume normality. A $t$ distribution with 3 or 4 degrees of freedom has been argued to be more relevant than a normal distribution for financial returns, for example.
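A rough way to see the difference is to compare the skewness and excess kurtosis of the scaled residuals with what the assumed distribution implies. The sketch below uses simulated draws as stand-ins for $\hat u_t$ (one normal sample, one $t$ sample with 4 degrees of freedom) and computes the Jarque-Bera statistic for each; the function names, sample size and seed are all assumptions made for illustration, and with real fitted residuals the asymptotic critical value is only an approximation.

```python
import math
import random


def student_t(rng, nu):
    """Draw one Student-t variate as Z / sqrt(chi2_nu / nu)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    return z / math.sqrt(chi2 / nu)


def jarque_bera(x):
    """Return (JB statistic, skewness, excess kurtosis) of a sample."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    exkurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + exkurt ** 2 / 4.0)
    return jb, skew, exkurt


rng = random.Random(0)  # fixed seed, chosen arbitrarily
n = 5000

# Stand-ins for scaled residuals under two different innovation distributions.
u_normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
u_t4 = [student_t(rng, 4) for _ in range(n)]

jb_normal, skew_normal, k_normal = jarque_bera(u_normal)
jb_t4, skew_t4, k_t4 = jarque_bera(u_t4)

# Under normality JB is asymptotically chi-square(2); the 5% critical value is about 5.99.
print("normal draws: JB =", round(jb_normal, 2), "excess kurtosis =", round(k_normal, 2))
print("t(4) draws:   JB =", round(jb_t4, 2), "excess kurtosis =", round(k_t4, 2))
```

If the scaled residuals from a normal-likelihood GARCH fit look like the second sample, switching the assumed innovation distribution to a $t$ distribution is the natural next step.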
4: regarding the claim that residuals are often non-stationary, heteroskedastic and not normal, so the model doesn't explain volatility
Edit (more precise formulation): I am not sure I follow the logical connection here. Since GARCH aims to explain a specific type of conditional heteroskedasticity (not any and all types of CH but autoregressive CH), you should assess it on that basis. If $\hat \epsilon_t$ are autoregressively conditionally heteroskedastic (this can be tested by the ARCH-LM test) but $\hat u_t$ are conditionally homoskedastic (as tested by the Li-Mak test), the GARCH model has done its job.
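The before/after comparison can be sketched on simulated data. The code below generates errors from a GARCH(1,1) process with made-up parameters, scales them by the true conditional standard deviations (standing in for fitted ones), and computes a one-lag ARCH-LM statistic for both series. Everything here is an assumption for illustration: the parameter values, the single lag, and in particular the use of the plain LM statistic on the scaled series, which is only reasonable because $\sigma_t$ is known exactly in a simulation. With estimated $\hat \sigma_t$, the Li-Mak test is the one designed for the scaled residuals.

```python
import math
import random


def simulate_garch11(n, omega, alpha, beta, rng, burn=500):
    """Simulate eps_t = sigma_t * z_t with GARCH(1,1) variance; return (eps, sigma)."""
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    e = math.sqrt(var) * rng.gauss(0.0, 1.0)
    eps, sigma = [], []
    for t in range(n + burn):
        var = omega + alpha * e * e + beta * var  # variance recursion
        s = math.sqrt(var)
        e = s * rng.gauss(0.0, 1.0)
        if t >= burn:  # discard the burn-in draws
            eps.append(e)
            sigma.append(s)
    return eps, sigma


def arch_lm_one_lag(x):
    """ARCH-LM statistic with one lag: n * R^2 from regressing x_t^2 on a
    constant and x_{t-1}^2. With a single regressor, R^2 is the squared
    sample correlation between the two series."""
    sq = [v * v for v in x]
    y, z = sq[1:], sq[:-1]
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    cov = sum((a - my) * (b - mz) for a, b in zip(y, z))
    vy = sum((a - my) ** 2 for a in y)
    vz = sum((b - mz) ** 2 for b in z)
    return n * cov * cov / (vy * vz)


rng = random.Random(1)  # fixed seed, chosen arbitrarily
eps, sigma = simulate_garch11(5000, omega=0.1, alpha=0.2, beta=0.7, rng=rng)
u = [e / s for e, s in zip(eps, sigma)]  # scaled residuals

lm_eps = arch_lm_one_lag(eps)
lm_u = arch_lm_one_lag(u)

# Under the null of no ARCH effects the statistic is asymptotically
# chi-square(1); the 5% critical value is about 3.84.
print("ARCH-LM on eps:", round(lm_eps, 2))
print("ARCH-LM on u:  ", round(lm_u, 2))
```

A large statistic for the raw errors together with a small one for the scaled series is the pattern described above: the ARCH effects were there, and the variance model has absorbed them.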
My experience with GARCH models (admittedly limited) is that they do their job but of course are not a panacea.