I came across Example 10.2.1 in Casella and Berger's *Statistical Inference*, which concerns a random variable $X$ that, $(1-\delta)\times 100\%$ of the time, is equal to a $N(\mu, \sigma^{2})$ random variable, and the remaining $\delta \times 100\%$ of the time is equal to a random variable with unknown distribution but with mean $\theta$ and variance $\tau^{2}$.
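In other words, writing $g$ for the density of the contaminating distribution (assuming it has one), $X$ has the mixture density

$$f_X(x) = (1-\delta)\frac{1}{\sigma}\phi\!\left(\frac{x-\mu}{\sigma}\right) + \delta\, g(x),$$

where $\phi$ is the standard normal density.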
The book claims:
$Var(X) = (1-\delta)\sigma^{2} + \delta \tau^{2} + \delta(1-\delta)(\theta - \mu)^{2}$
I'm having trouble deriving this formula. My starting point is the identity $Var(X) = E[X^{2}] - (E[X])^{2}$, where conditioning on which component is drawn gives $E[X] = (1-\delta)\mu + \delta \theta$.
$E[X^{2}]$, I think, is just a weighted sum of the second moments of the two component random variables, i.e. $(1-\delta)(\sigma^{2} + \mu^{2}) + \delta(\tau^{2} + \theta^{2})$.
But when I compute $E[X^{2}] - (E[X])^{2}$, I don't get the simple expression in the book. Is my approach incorrect?
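Concretely, substituting my two expressions gives

$$Var(X) = (1-\delta)\sigma^{2} + \delta\tau^{2} + (1-\delta)\mu^{2} + \delta\theta^{2} - \left[(1-\delta)\mu + \delta\theta\right]^{2},$$

and I can't see how the last three terms collapse into the single $\delta(1-\delta)(\theta - \mu)^{2}$ term.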
Moreover, how is this different from a traditional linear combination of two random variables? If I wrote $X = (1-\delta)X_{1} + \delta X_{2}$, where $X_{1} \sim N(\mu, \sigma^{2})$ and $X_{2}$ is the contaminating variable, the usual formula would give $Var(X) = (1-\delta)^{2}\sigma^{2} + \delta^{2}\tau^{2} + 2\delta(1-\delta)Cov(X_{1}, X_{2})$, which doesn't match the book's expression, so I don't think that's how it's derived.
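For what it's worth, a quick Monte Carlo sketch (arbitrary parameter values of my own choosing, and a normal contaminant, since only the contaminant's mean and variance should enter the formula) seems to agree with the book's expression:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters (not from the book).
mu, sigma = 0.0, 1.0    # mean and sd of the N(mu, sigma^2) component
theta, tau = 3.0, 2.0   # mean and sd of the contaminating component
delta = 0.1             # contamination probability
n = 10_000_000

# Draw from the mixture: with probability delta take the contaminant.
# A normal contaminant is used here, since the variance formula should
# depend only on the contaminant's mean and variance, not its shape.
contaminated = rng.random(n) < delta
x = np.where(contaminated,
             rng.normal(theta, tau, size=n),
             rng.normal(mu, sigma, size=n))

empirical = x.var()
book = (1 - delta)*sigma**2 + delta*tau**2 + delta*(1 - delta)*(theta - mu)**2
print(empirical, book)  # these agree up to Monte Carlo error
```

So the book's formula itself looks right; I just can't see how to derive it, or why the mixture variance differs from the linear-combination variance.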