I will only answer your first question. I will show that the "finite sample adjustment" is not really an adjustment and that the Ljung-Box statistic is in fact the more natural of the two (more natural than the Box-Pierce statistic).
(For the second and third questions, you could consult Anderson (1942), which is unfortunately quite technical. Probably another user will offer a more intuitive answer.)
Take an ARMA($p$,$q$) model
$$ \phi(B) w_t = \theta(B) a_t $$
where $B$ is the backshift (or lag) operator. Define the $k$-th order autocorrelation of model errors (not residuals) as
$$ r_k := \frac{ \sum_{t=k+1}^n a_t a_{t-k} }{ \sum_{t=1}^n a_t^2 } $$
and collect the first $m$ autocorrelations in one vector $r:=(r_1,\dotsc,r_m)$.
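To make the definition concrete, here is a minimal numpy sketch (the series `a`, sample size `n`, and lag count `m` are made-up illustration values, not from the paper) that computes $r_1,\dotsc,r_m$ directly from the formula above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 10
a = rng.standard_normal(n)  # white-noise "model errors"

# r_k = sum_{t=k+1}^n a_t a_{t-k} / sum_{t=1}^n a_t^2, for k = 1..m
denom = np.sum(a**2)
r = np.array([np.sum(a[k:] * a[:-k]) / denom for k in range(1, m + 1)])
print(r)
```

For genuine white noise, each $r_k$ should be close to zero, with standard deviation on the order of $1/\sqrt{n}$.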
Box & Pierce (1970) claim on p. 1510 that for large $n$,
- $r$ has a multivariate normal distribution,
- $r_i$ and $r_j$ are uncorrelated for $i \neq j$ and
- the variance of $r_k$ is
$$ \text{Var}(r_k) = \frac{n-k}{n(n+2)}. $$
Then it follows that for large $n$ the sum
$$ \sum_{k=1}^m \frac{r_k^2}{\text{Var}(r_k)} = n(n+2) \sum_{k=1}^m \frac{r_k^2}{n-k} $$
is distributed as $\chi_m^2$: each $r_k/\sqrt{\text{Var}(r_k)}$ is approximately standard normal, the terms are independent, and summing $m$ squares of independent standard normal random variables gives a $\chi^2_m$ distribution.
Up to this point we have arrived at the Ljung-Box (rather than the Box-Pierce) test statistic. So apparently there is no "finite sample correction" involved.
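As an illustration, the exact statistic above and its $\chi^2_m$ p-value can be computed in a few lines of numpy/scipy (a hypothetical white-noise example, so the test should not reject):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, m = 200, 10
a = rng.standard_normal(n)  # white-noise errors: the null is true

denom = np.sum(a**2)
r = np.array([np.sum(a[k:] * a[:-k]) / denom for k in range(1, m + 1)])
k = np.arange(1, m + 1)

# Exact (Ljung-Box) form: each r_k^2 is scaled by 1/Var(r_k) = n(n+2)/(n-k)
q_lb = n * (n + 2) * np.sum(r**2 / (n - k))
p_value = chi2.sf(q_lb, df=m)
print(q_lb, p_value)
```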
What happens next is that Box & Pierce (1970) note that
$$ \text{Var}(r_k) \approx \frac{1}{n} $$
since $\frac{n+2}{n-k} \approx 1$ for large $n$, and then also
$$ n \sum_{k=1}^m r_k^2 \sim \chi_m^2. $$
Here is where the Box-Pierce statistic (different from the exact statistic above) is introduced.
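The two statistics are easy to compare on the same data (a made-up small-sample example; deliberately small $n$, where the difference between them matters). Since $\frac{n+2}{n-k} > 1$ for every $k \geq 1$, the Box-Pierce value is always smaller than the exact (Ljung-Box) value:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 10  # small n: the approximation (n+2)/(n-k) = 1 is poor here
a = rng.standard_normal(n)

denom = np.sum(a**2)
r = np.array([np.sum(a[k:] * a[:-k]) / denom for k in range(1, m + 1)])
k = np.arange(1, m + 1)

q_bp = n * np.sum(r**2)                      # Box-Pierce approximation
q_lb = n * (n + 2) * np.sum(r**2 / (n - k))  # exact (Ljung-Box) form
print(q_bp, q_lb)  # q_lb exceeds q_bp, since n(n+2)/(n-k) > n for all k >= 1
```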
This concerns the case where model errors are known, which is not what we encounter in practice. Therefore, Box & Pierce (1970) go on to examine the case with estimated residuals in place of the true model errors.
After some elaboration on the pure autoregressive AR($p$) case, they note on p. 1517 that when errors are unknown and are replaced by residuals, for large $n$ it is sufficient to replace $m$ with $m-p$ ($=m-p-q$ since $q=0$) in the asymptotic distribution and the result will still hold:
$$ n \sum_{k=1}^m \hat r_k^2 \sim \chi_{m-p}^2 $$
where $\hat r_k$ is the sample counterpart of $r_k$.
Further they show that the case of ARMA($p$,$q$) in place of pure AR($p$) does not change the essence, and so for a general ARMA($p$,$q$) model one still has that
$$ n \sum_{k=1}^m \hat r_k^2 \sim \chi_{m-p-q}^2. $$
In these last few expressions, the approximation $\frac{n+2}{n-k} \approx 1$ is used. It does no harm in large samples, but it apparently causes trouble in small samples, which Ljung & Box (1978) note (citing a few studies). Therefore, they suggest dropping the approximation and going back to the exact statistic.
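Putting it all together, here is a hedged end-to-end sketch (my own toy example, not from either paper): simulate an AR(1) process, fit it by simple least squares as a stand-in for a full ARMA fit, and apply the exact statistic to the residuals with the reduced degrees of freedom $m - p - q$:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
n, m, p, q = 300, 10, 1, 0  # AR(1) model, so p + q = 1

# Simulate an AR(1) process w_t = phi * w_{t-1} + a_t
phi = 0.6
w = np.zeros(n)
for t in range(1, n):
    w[t] = phi * w[t - 1] + rng.standard_normal()

# Least-squares estimate of phi, and the resulting residuals
phi_hat = np.sum(w[1:] * w[:-1]) / np.sum(w[:-1] ** 2)
resid = w[1:] - phi_hat * w[:-1]

nr = resid.size
denom = np.sum(resid**2)
r_hat = np.array([np.sum(resid[k:] * resid[:-k]) / denom for k in range(1, m + 1)])
k = np.arange(1, m + 1)

q_lb = nr * (nr + 2) * np.sum(r_hat**2 / (nr - k))
# df is reduced by the number of fitted ARMA parameters, m - p - q
p_value = chi2.sf(q_lb, df=m - p - q)
print(q_lb, p_value)
```

A correctly specified model should leave approximately white-noise residuals, so the p-value will typically be unremarkable here.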
References:

- Anderson, R. L. (1942), "Distribution of the serial correlation coefficient", *The Annals of Mathematical Statistics*, 13(1), 1-13.
- Box, G. E. P. & Pierce, D. A. (1970), "Distribution of residual autocorrelations in autoregressive-integrated moving average time series models", *Journal of the American Statistical Association*, 65(332), 1509-1526.
- Ljung, G. M. & Box, G. E. P. (1978), "On a measure of lack of fit in time series models", *Biometrika*, 65(2), 297-303.