My understanding is that when determining the weights / coefficients of an N-feature linear regression model, gradient descent is always used, never gradient ascent. The reason is that we are minimizing a cost, which is equivalent to minimizing the residual sum of squares (RSS). Or have I missed the point?
It depends. You get to either minimize this:
$$ \sum_i \left( y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})\right)^2 $$
for which you could use gradient descent, or you could maximize this
$$ - \sum_i \left( y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})\right)^2 $$
for which you could use gradient ascent. Both problems are completely equivalent: the gradient of one is the negative of the gradient of the other.
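To see the equivalence in code, here is a minimal sketch (synthetic data, an illustrative step size, and NumPy as the implementation choice, none of which are prescribed above): descending on the RSS and ascending on its negative produce identical iterates.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # intercept + 2 features
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.1, size=100)

beta_desc = np.zeros(3)   # for gradient descent on the RSS
beta_asc = np.zeros(3)    # for gradient ascent on the negated RSS
lr = 1e-3                 # illustrative step size

for _ in range(10_000):
    grad_rss = -2 * X.T @ (y - X @ beta_desc)    # gradient of sum_i (y_i - x_i' beta)^2
    beta_desc -= lr * grad_rss                   # descent step on the RSS

    grad_neg_rss = 2 * X.T @ (y - X @ beta_asc)  # gradient of the negated RSS
    beta_asc += lr * grad_neg_rss                # ascent step: exactly the same update

print(np.allclose(beta_desc, beta_asc))  # True: the two procedures coincide
```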
In practice, though, production software does neither of these things unless the model is fit at massive scale. Most of your day-to-day regressions are fit by solving the system of linear equations you get by setting the gradient to zero directly. Classical linear-equation solvers, which work using matrix factorizations, are used for this.
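For concreteness, here is a sketch of the "set the gradient to zero and solve" route (the data, NumPy routines, and variable names are my own illustrative choices): the normal equations solved directly, a QR factorization of the design matrix, and the off-the-shelf SVD-based solver all agree.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.1, size=100)

# Setting the gradient of the RSS to zero gives the normal equations X'X beta = X'y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# A classical solver factorizes X instead of forming X'X, e.g. a QR factorization:
Q, R = np.linalg.qr(X)                  # X = QR with R upper triangular
beta_qr = np.linalg.solve(R, Q.T @ y)   # solve R beta = Q'y

# np.linalg.lstsq (SVD-based) is the off-the-shelf version of the same idea.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_normal, beta_qr), np.allclose(beta_qr, beta_lstsq))  # True True
```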
Moreover, gradient descent may fail if there are local minima.
One of the nice things about linear regression is that there is only one local extremum (which is, consequently, global). The only exceptional case is when your design matrix $X$ is not of full rank, in which case the solutions form an affine subspace.
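A quick illustration of that rank-deficient case (synthetic data with a deliberately duplicated column, purely as an assumption for the example): shifting a least-squares solution along the null space of $X$ changes the coefficients but not the fitted values, so the minimizers form an affine subspace.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
X = np.column_stack([np.ones(100), x, x])   # duplicated column: X is not of full rank
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=100)

beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)  # minimum-norm least-squares solution
print(rank)                                           # 2, not 3

# [0, 1, -1] lies in the null space of X (the two copies of x cancel), so moving the
# solution along it changes the coefficients without changing the fitted values:
# every point of that affine subspace attains the same minimal RSS.
beta_shifted = beta + np.array([0.0, 1.0, -1.0])
print(np.allclose(X @ beta, X @ beta_shifted))        # True
```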