I can't be the first person to think about estimating the variance components of mixed models by gradient descent, and then computing the BLUPs at each update. Googling, I find little on the topic, but the gradients seem tractable (if messy). I suppose one would work with the log of the variances so the parameters are unconstrained on the real line. I also suppose one would need to penalize the loss function; otherwise the variances would go to infinity.
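To make the idea concrete, here is a minimal sketch (not a worked-out method, and possibly not what you have in mind) of what I'm picturing: plain gradient descent on the marginal ML negative log-likelihood of a random-intercept model, with the variances log-parameterized and the BLUPs computed from the current variance estimates. The toy data, the step size, and the iteration count are all illustrative and would need tuning.

```python
# Sketch only: gradient descent on log-variances of a random-intercept model,
# with autodiff supplying the (messy) gradients. Assumes a single grouping factor.
import jax
import jax.numpy as jnp
import numpy as np

# --- toy data: one grouping factor, random intercepts (purely illustrative) ---
rng = np.random.default_rng(0)
n_groups, n_per = 30, 10
g = np.repeat(np.arange(n_groups), n_per)                      # group labels
X = np.column_stack([np.ones(n_groups * n_per),
                     rng.normal(size=n_groups * n_per)])       # fixed-effects design
Z = np.eye(n_groups)[g]                                        # random-effects design
beta_true = np.array([1.0, 2.0])
y = (X @ beta_true
     + Z @ (0.8 * rng.normal(size=n_groups))                   # sigma_u = 0.8
     + 0.5 * rng.normal(size=n_groups * n_per))                # sigma_e = 0.5

X, Z, y = jnp.asarray(X), jnp.asarray(Z), jnp.asarray(y)

def neg_loglik(log_vars):
    # Marginal (ML, not REML) negative log-likelihood, beta profiled out by GLS.
    s2_u, s2_e = jnp.exp(log_vars)                             # log-parameterization keeps variances > 0
    V = s2_u * Z @ Z.T + s2_e * jnp.eye(y.shape[0])
    Vinv_X = jnp.linalg.solve(V, X)
    Vinv_y = jnp.linalg.solve(V, y)
    beta = jnp.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)        # GLS estimate of beta
    r = y - X @ beta
    _, logdet = jnp.linalg.slogdet(V)
    return 0.5 * (logdet + r @ jnp.linalg.solve(V, r))

grad_fn = jax.grad(neg_loglik)

log_vars = jnp.zeros(2)                                        # start at sigma_u^2 = sigma_e^2 = 1
for _ in range(500):
    log_vars = log_vars - 0.01 * grad_fn(log_vars)             # plain gradient descent on log-variances

# BLUPs at the converged variances: u_hat = s2_u * Z' V^{-1} (y - X beta_hat)
s2_u, s2_e = jnp.exp(log_vars)
V = s2_u * Z @ Z.T + s2_e * jnp.eye(y.shape[0])
beta_hat = jnp.linalg.solve(X.T @ jnp.linalg.solve(V, X), X.T @ jnp.linalg.solve(V, y))
u_hat = s2_u * Z.T @ jnp.linalg.solve(V, y - X @ beta_hat)
print(jnp.exp(log_vars), beta_hat)
```

Note this version optimizes the marginal likelihood directly, where the log-determinant term already keeps the variances from running off to infinity; whether a separate penalty is still needed under a different loss (e.g., a least-squares loss on the BLUP fit) is part of what I'm asking.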
Can anyone provide any further insight before I go and program it? References? Software? Tips/tricks for optimization? Would minibatch or stochastic gradient descent be problematic?