The formula you quote from your notes is not exactly AIC.
AIC is $-2\log\mathcal{L}+2k$.
Here I'll give an outline of an approximate derivation that makes clear enough what's going on.
If you have a model with independent normal errors with constant variance,
$$\mathcal{L}\propto \sigma^{-n} \: e^{-\frac{1}{2\sigma^2}\sum \varepsilon_i^2}$$
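whose maximized value, with the ML estimate $\hat{\sigma}^2=\frac1n\sum \hat{\varepsilon}_i^2$ plugged in (so that $\sum \hat{\varepsilon}_i^2 = n\hat{\sigma}^2$), is
\begin{eqnarray}
\hat{\mathcal{L}} & \propto &(\hat{\sigma}^2)^{-n/2} e^{-\frac12 n\hat{\sigma}^2/\hat{\sigma}^2}\\
& \propto &(\hat{\sigma}^2)^{-n/2} e^{-\frac12 n}\\
& \propto &(\hat{\sigma}^2)^{-n/2}
\end{eqnarray}
(this relies on the estimate of $\sigma^2$ being the ML estimate, which divides by $n$).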
So $-2\log\mathcal{L} +2k = n\log{\hat{\sigma}^2} + 2k$, up to an additive constant, $n(1+\log 2\pi)$, which is the same for every model fitted to the same $n$ observations.
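As a quick numerical check of that identity, here is a minimal sketch using only the Python standard library. The simulated data, the simple one-predictor regression, and the choice $k=3$ (intercept, slope, variance) are my own illustrative assumptions, not part of the original question.

```python
import math
import random

random.seed(0)
n = 200
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5) for xi in x]  # simulated data (assumption)

# Least squares (which is Gaussian ML) for intercept and slope
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
rss = sum(e * e for e in resid)
sigma2_hat = rss / n  # ML estimate: divide by n, not n - 2

# Exact maximized Gaussian log-likelihood
loglik = -0.5 * n * math.log(2 * math.pi * sigma2_hat) - rss / (2 * sigma2_hat)

k = 3  # intercept, slope, variance (counting convention is an assumption)
aic_exact = -2 * loglik + 2 * k
aic_short = n * math.log(sigma2_hat) + 2 * k
constant = n * (1 + math.log(2 * math.pi))

# The two forms differ only by the model-independent constant
print(aic_exact - aic_short, constant)
```

The printed pair should agree to floating-point precision, which is all "up to a constant" means here.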
Now in the ARMA model, if $T$ is really large compared to $p$ and $q$, then the likelihood can be approximated within such a Gaussian framework (e.g. you can write the ARMA approximately as a longer AR and condition on enough initial terms to write that AR as a regression model), so with $T$ in place of $n$:
$AIC \approx T\log{\hat{\sigma}^2} + 2k$
hence
$AIC/T \approx \log{\hat{\sigma}^2} + 2k/T$
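To make the "condition and treat it as a regression" idea concrete, here is a rough sketch (standard-library Python) that simulates an AR(1) series, fits a mean-only model and an AR(1) by conditional least squares, and computes $\log\hat{\sigma}^2 + 2k/T$ for each. The series length, the coefficient 0.6, the burn-in, and the helper name `scaled_aic` are all assumptions made for illustration; a real ARMA fit would normally use the exact likelihood instead.

```python
import math
import random

random.seed(1)
T, phi = 2000, 0.6  # illustrative values (assumption)

# Simulate an AR(1) with unit-variance Gaussian innovations, discarding burn-in
y = [0.0]
for _ in range(T + 100):
    y.append(phi * y[-1] + random.gauss(0.0, 1.0))
y = y[-T:]

def scaled_aic(rss, n_obs, k):
    """log(sigma2_hat) + 2k/n_obs, with sigma2_hat = rss / n_obs (ML estimate)."""
    return math.log(rss / n_obs) + 2 * k / n_obs

# Condition on the first observation so both models use the same targets
target = y[1:]
lagged = y[:-1]
n_obs = len(target)

# AR(0): mean only
mean_t = sum(target) / n_obs
rss0 = sum((v - mean_t) ** 2 for v in target)

# AR(1) with intercept: regress y_t on y_{t-1}
mean_l = sum(lagged) / n_obs
sxx = sum((u - mean_l) ** 2 for u in lagged)
sxy = sum((u - mean_l) * (v - mean_t) for u, v in zip(lagged, target))
phi_hat = sxy / sxx
c_hat = mean_t - phi_hat * mean_l
rss1 = sum((v - c_hat - phi_hat * u) ** 2 for u, v in zip(lagged, target))

crit0 = scaled_aic(rss0, n_obs, k=2)  # mean, variance
crit1 = scaled_aic(rss1, n_obs, k=3)  # intercept, AR coefficient, variance

print(phi_hat, crit0, crit1)
```

With a clearly autocorrelated series like this one, the AR(1) value should come out lower than the mean-only value, as expected.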
Now if you're simply comparing AICs, that division through by $T$ doesn't matter at all, since it doesn't change the ordering of AIC values.
However, if you're using AIC for some other purpose that relies on the actual value of differences in AIC (such as to do multimodel inference as described by Burnham and Anderson), then it matters.
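A small sketch of why the scaling matters for that second use: the ordering is unchanged, but Akaike weights, $w_i \propto e^{-\Delta_i/2}$, are computed from differences, and dividing by $T$ shrinks those differences. The three AIC values and $T=500$ below are made-up numbers for illustration only.

```python
import math

def akaike_weights(values):
    """Akaike weights: exp(-delta/2), normalized, with delta measured from the minimum."""
    best = min(values)
    raw = [math.exp(-0.5 * (v - best)) for v in values]
    total = sum(raw)
    return [r / total for r in raw]

T = 500                          # hypothetical series length
aic = [1012.3, 1015.1, 1020.8]   # hypothetical AIC values for three models
aic_over_t = [a / T for a in aic]

# Ranking is unaffected by dividing through by T
rank_raw = sorted(range(len(aic)), key=lambda i: aic[i])
rank_scaled = sorted(range(len(aic)), key=lambda i: aic_over_t[i])

# Weights are not: the scaled differences are tiny, so the weights flatten out
w_raw = akaike_weights(aic)
w_scaled = akaike_weights(aic_over_t)

print(rank_raw == rank_scaled)
print(w_raw)
print(w_scaled)
```

Fed the scaled values, the weights come out nearly equal across models, so anything built on AIC differences needs the unscaled form (or the differences multiplied back up by $T$).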
Numerous econometrics texts seem to use this AIC/T form. Oddly, some books seem to reference Hurvich and Tsai 1989 or Findley 1985 for that form, but Hurvich & Tsai and Findley seem to be discussing the original form (though I only have an indirect indication of what Findley does right now, so perhaps there is something in Findley on it).
Such scaling might be done for a variety of reasons -- for example, time series, especially high frequency time series, can be very long and ordinary AICs might have a tendency to become unwieldy, especially if $\sigma^2$ is very small. (There are some other possible reasons, but since I really don't know the reason this was done I won't start going down a list of all possible reasons.)
You may like to look at Rob Hyndman's list of Facts and fallacies of the AIC, particularly items 3 to 7. Some of those points might lead you to be at least a little cautious about relying too heavily on the approximation by a Gaussian likelihood, but maybe there's a better justification than I offer here.
I'm not sure there's a good reason to use this approximation to the log-likelihood rather than the actual AIC, since many time series packages these days calculate (and maximize) the actual log-likelihood for ARMA models. There seems little reason not to use the AIC based on that exact log-likelihood.