Yes, if you're drawing one $x_{T+1}$ for every joint posterior sample $(x_{1:T},\theta)$.
To be clear, you are correct if you do the following steps for $i=1,2,\ldots$:
1. Draw $\theta[i], x_{1:T}[i] \sim p(\theta, x_{1:T} \mid y_{1:T})$ (probably with a Gibbs sampler), then
2. Draw $x_{T+1}[i] \sim N(A[i]x_{T}[i],Q[i])$ (from the model's Markov transition), then either
3. (a) Be ready to evaluate the function (in $y_{T+1}$) $\frac{1}{N}\sum_{i=1}^N N(y_{T+1}\mid B[i]x_{T+1}[i],\Sigma[i])$, OR
3. (b) Draw $y_{T+1}[i] \sim N(B[i]x_{T+1}[i],\Sigma[i])$.
Using 3(b) instead of 3(a), and keeping only the collection of $y_{T+1}[i]$s from your samples, gives you a sample from the predictive distribution you want. You can make a histogram, take an average, whatever you like. If this is your goal, read below for how to do it in a better way using Rao-Blackwellization/marginalization.
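The three steps can be sketched as follows. This is a minimal illustration, not your model: the arrays standing in for the Gibbs draws $A[i]$, $Q[i]$, $B[i]$, $\Sigma[i]$, $x_T[i]$ are filled with hypothetical placeholder values, since in practice they come out of step 1's sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and posterior draws; in practice these come
# from a Gibbs sampler targeting p(theta, x_{1:T} | y_{1:T}) (step 1).
N, dx, dy = 1000, 2, 3
A = rng.normal(size=(N, dx, dx)) * 0.1          # placeholder A[i] draws
Q = np.tile(np.eye(dx), (N, 1, 1))              # placeholder Q[i] draws
B = rng.normal(size=(N, dy, dx))                # placeholder B[i] draws
Sigma = np.tile(np.eye(dy), (N, 1, 1))          # placeholder Sigma[i] draws
xT = rng.normal(size=(N, dx))                   # placeholder x_T[i] draws

y_pred = np.empty((N, dy))
for i in range(N):
    # Step 2: propagate the state through the Markov transition.
    x_next = rng.multivariate_normal(A[i] @ xT[i], Q[i])
    # Step 3(b): push the new state through the observation equation.
    y_pred[i] = rng.multivariate_normal(B[i] @ x_next, Sigma[i])

# y_pred now holds N draws from p(y_{T+1} | y_{1:T}); histogram or
# average them however you like.
print(y_pred.mean(axis=0))
```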
However, the last equation in your question appears to be going with 3(a) instead of 3(b). You aren't sampling future $y_{T+1}$s; rather, you are picking a specific $y_{T+1}$ and estimating the height of the density there with Monte Carlo.
If you want to approximate moments of this distribution with the same Monte Carlo approach, invoke the law of total expectation or the law of total variance, and use Monte Carlo on each raw/uncentered moment. Right now you are just approximating the density pointwise, albeit with common random numbers.
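For concreteness, the pointwise density estimate of 3(a) looks like this. Again the posterior draws are hypothetical placeholders standing in for Gibbs output, and `gauss_pdf` is a small helper written out for self-containment:

```python
import numpy as np

def gauss_pdf(y, mean, cov):
    """Multivariate normal density N(y | mean, cov)."""
    d = len(mean)
    diff = y - mean
    quad = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

rng = np.random.default_rng(1)
# Hypothetical posterior draws (placeholders for Gibbs output).
N, dx, dy = 500, 2, 2
B = rng.normal(size=(N, dy, dx))
Sigma = np.tile(np.eye(dy), (N, 1, 1))
x_next = rng.normal(size=(N, dx))   # x_{T+1}[i] draws from step 2

y_point = np.zeros(dy)              # a specific y_{T+1} to evaluate at
# Step 3(a): Monte Carlo estimate of the predictive density at y_point,
# i.e. (1/N) * sum_i N(y_point | B[i] x_{T+1}[i], Sigma[i]).
dens = np.mean([gauss_pdf(y_point, B[i] @ x_next[i], Sigma[i])
                for i in range(N)])
print(dens)
```

This returns one number, the estimated density height at `y_point`, not a sample; to trace out the whole density you would repeat it over a grid of `y_point` values, reusing the same draws.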
Also notice that this is difficult in real time, because obtaining samples from the joint posterior in step (1) is slow, and gets slower with time.
Edit:
Seems like I should also mention Rao-Blackwellization/marginalization here, too. Say you were interested in the mean of $p(y_{T+1} \mid y_{1:T})$, and not points on the density. Then you could use the fact that
$$
E[y_{T+1} \mid y_{1:T}] = E\left[ E\left( y_{T+1} \mid X_T,\theta \right) \mid y_{1:T} \right].
$$
This would go a long way to reducing variance, which translates into faster computation because you wouldn't need to sample for as many iterations.
In your particular case, the inner expectation is
$$
E\left( y_{T+1} \mid X_T,\theta \right) = E\left( B x_{T+1} \mid X_T, \theta \right) = B A X_T.
$$
This is useful when you plug it back into the iterated expectation: you no longer have to simulate $x_{T+1}$ or $y_{T+1}$ at all, and that means less variance:
$$
E[y_{T+1} \mid y_{1:T}] = E\left[ B A X_T \mid y_{1:T} \right] \approx \frac{1}{N}\sum_{i=1}^N B[i] A[i] X_T[i],
$$
where $(\theta[i], X_T[i]) \sim p(\theta, X_{T} \mid y_{1:T})$. These come from sampling the full joint posterior $p(\theta, X_{1:T} \mid y_{1:T})$ and then ignoring the $x_{1:T-1}$ portion of each of your $N$ samples.
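The Rao-Blackwellized estimator is then a one-liner over the posterior draws. As before, the draws of $B[i]$, $A[i]$, $X_T[i]$ below are hypothetical placeholders for what your Gibbs sampler would produce:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical posterior draws of (B, A, x_T); placeholders for Gibbs output.
N, dx, dy = 2000, 2, 3
A = rng.normal(size=(N, dx, dx)) * 0.1
B = rng.normal(size=(N, dy, dx))
xT = rng.normal(size=(N, dx))

# Rao-Blackwellized estimate of E[y_{T+1} | y_{1:T}]:
# average B[i] A[i] x_T[i] directly instead of first simulating
# x_{T+1}[i] and y_{T+1}[i] and averaging those noisier draws.
rb_mean = np.mean([B[i] @ A[i] @ xT[i] for i in range(N)], axis=0)
print(rb_mean)
```

Both this estimator and the naive one (average the simulated $y_{T+1}[i]$s) are unbiased for the predictive mean; the Rao-Blackwellized one has smaller variance because the transition and observation noise have been integrated out analytically.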