I'm grappling with this question myself right now. Here's a result that may be helpful. Consider the linear model
$$y = X\beta + \epsilon, \quad \epsilon \sim N(0,\sigma^2 I_n)$$
where $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, $\beta \in \mathbb{R}^p$, and $\beta$ and $\sigma^2$ are the parameters of interest. The joint likelihood is
$$L(\beta,\sigma^2) = (2 \pi \sigma^2)^{-n/2} \exp\left(-\frac{||y-X\beta||^2}{2\sigma^2}\right)$$
Optimizing the joint likelihood yields
$$\hat{\beta} = X^+ y$$
$$\hat{\sigma}^2 = \frac{1}{n}||r||^2$$
where $X^+$ is the Moore-Penrose pseudoinverse of $X$ and $r=y-X\hat{\beta}$ is the residual vector. Note that $\hat{\sigma}^2$ uses the factor $1/n$ instead of the familiar degrees-of-freedom corrected factor $1/(n-p)$; this estimator is biased downward in finite samples.
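For concreteness, here is a minimal numerical sketch of the joint ML estimates (the dimensions, design, and variable names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (sizes and values are arbitrary choices for this sketch).
n, p = 50, 5
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
sigma_true = 2.0
y = X @ beta_true + rng.normal(scale=sigma_true, size=n)

# Joint maximum-likelihood estimates
beta_hat = np.linalg.pinv(X) @ y      # beta_hat = X^+ y
r = y - X @ beta_hat                  # residual vector
sigma2_hat_joint = (r @ r) / n        # joint MLE: divides by n, not n - p

print(beta_hat, sigma2_hat_joint)
```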
Now suppose instead of optimizing over both $\beta$ and $\sigma^2$, we integrate $\beta$ out and estimate $\sigma^2$ from the resulting integrated likelihood:
$$\hat{\sigma}^2 = \operatorname{arg\,max}_{\sigma^2} \int_{\mathbb{R}^p} L(\beta,\sigma^2)\, d\beta$$
Using elementary linear algebra and the Gaussian integral formula (and assuming $X$ has full column rank $p$), you can show that
$$\hat{\sigma}^2 = \frac{1}{n-p} ||r||^2$$
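To sketch the key step: decompose the quadratic around $\hat{\beta}$ and integrate out $\beta$ with the Gaussian integral,
$$||y-X\beta||^2 = ||r||^2 + (\beta-\hat{\beta})^\top X^\top X (\beta-\hat{\beta}),$$
$$\int_{\mathbb{R}^p} L(\beta,\sigma^2)\, d\beta = (2\pi\sigma^2)^{-(n-p)/2}\, |X^\top X|^{-1/2} \exp\left(-\frac{||r||^2}{2\sigma^2}\right),$$
and setting the $\sigma^2$-derivative of the log of this expression to zero gives $\hat{\sigma}^2 = ||r||^2/(n-p)$.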
This estimate carries the degrees-of-freedom correction, which makes it unbiased, and it is generally favored over the joint ML estimate.
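If it helps, here is a quick Monte Carlo sketch of the bias difference between the two estimators (the sample sizes and random design below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Small n so the bias of the 1/n estimator is clearly visible (made-up values).
n, p, sigma_true = 20, 5, 2.0
X = rng.normal(size=(n, p))                # fixed design across replications
beta_true = rng.normal(size=p)
P = np.eye(n) - X @ np.linalg.pinv(X)      # projector onto the residual space

joint, integrated = [], []
for _ in range(20000):
    y = X @ beta_true + rng.normal(scale=sigma_true, size=n)
    rss = y @ P @ y                        # ||r||^2
    joint.append(rss / n)                  # joint ML estimate of sigma^2
    integrated.append(rss / (n - p))       # integrated ML estimate of sigma^2

# E[rss/n] = sigma^2 (n - p)/n = 3.0 here, while E[rss/(n - p)] = sigma^2 = 4.0
print(np.mean(joint), np.mean(integrated))
```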
From this result one might ask whether there is something inherently advantageous about the integrated likelihood, but I am not aware of any general results that answer that question. The consensus seems to be that integrated ML is better at accounting for uncertainty in most estimation problems. In particular, if you are estimating a quantity that depends on other parameter estimates (even implicitly), then integrating over the other parameters will better account for their uncertainties.