According to Hamedani, G. G. (1992). Bivariate and multivariate normal characterizations: a brief survey. Communications in Statistics-Theory and Methods, 21(9), 2665-2688), the oldest characterization of the bivariate normal distribution is due to Cramer (1941). It states (using the OP's notation)
The random vector $[Y, A]$ has a bivariate normal distribution if
and only if every linear combination of $Y$ and $A$ has a univariate
normal distribution.
Denote
$$\mathbf z = (z_1, z_2)' \in \mathbb R^2$$
and consider
$$W(\mathbf z) = [Y, A]\cdot \mathbf z = Yz_1 + Az_2 = (A+B)z_1 + Az_2 = (z_1+z_2)A +z_1B$$
By assumption, $A$ and $B$ are normal random variables. Scaling does not affect normality. Therefore, $(z_1+z_2)A$ is a normal random variable, and so is $z_1B$. Moreover, $A$ and $B$ are assumed independent, and therefore so are these scaled versions of them. The sum of two independent normal random variables is a normal random variable. So $W(\mathbf z)$ is a normal random variable, for any $\mathbf z$ (even for $\mathbf z = \mathbf 0$, see comments). But this means that every linear combination of $Y$ and $A$ has a univariate normal distribution, so Cramer's condition is satisfied and $[Y,A]$ has a bivariate normal distribution.
So the joint density will be the bivariate normal density, and the only thing one needs to calculate is the correlation coefficient between $Y$ and $A$, which is trivial.