A way to gauge how useful a predictor $x_j$ is within a given model $M$ is to compare the performance of $M$ with and without $x_j$ included (call the latter model $M^{-x_j}$). With multiple predictors, though, we would have to fit $p$ different $M^{-x_j}$ models, one per predictor. The cost of this re-training procedure quickly becomes prohibitively high.
The point of permuting a predictor is to approximate the situation where we use the model $M$ to make a prediction but do not have the information in $x_j$. Scrambling should destroy all (ordering) information in $x_j$, so we land in a situation where $x_j$ is artificially corrupted. We can then compare the performance of $M$ when using the pristine predictor $x_j$ with its performance when using the scrambled version; this approximates what would happen if we had little to no information about $x_j$, without having to retrain a model $M^{-x_j}$.
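As a concrete illustration, here is a minimal sketch of this permute-and-compare procedure using a scikit-learn random forest; the synthetic data, model settings, and $R^2$ metric are my own assumptions purely for demonstration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Toy setup (assumed for illustration): a fitted forest M and a held-out validation set.
X, y = make_regression(n_samples=500, n_features=5, n_informative=3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
M = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

baseline = r2_score(y_val, M.predict(X_val))  # performance with the pristine predictors
rng = np.random.default_rng(0)

for j in range(X_val.shape[1]):
    X_perm = X_val.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])   # scramble only column j
    permuted = r2_score(y_val, M.predict(X_perm))  # performance with x_j corrupted
    print(f"x_{j}: importance estimate = {baseline - permuted:.3f}")
```

The drop `baseline - permuted` is the permutation importance estimate for $x_j$: no retraining happens, only repeated prediction with one column shuffled at a time.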
So to recap and answer your questions above:
- Scrambling corrupts the information in a predictor $x_j$ and thus lets us treat $x_j$ as if its information were missing.
- Trees (the archetypal base learners for random forests) rely strongly on the ordering induced by an explanatory variable $x_j$ when making a prediction. By permuting $x_j$ we feed no (or outright wrong) information about $x_j$ into our random forest model $M$ when making predictions, so we should see a knock-on effect on performance. If we saw no performance difference, that would strongly indicate that $x_j$ is not really used.
- It is an approximation of variable importance. The mental rule-of-thumb is that "the more important a variable is, the more impactful it should be on model performance". Of course this is a working assumption; there are a number of things that can go wrong (see the last discussion below), but it is not unfounded.
Notice that permutation importance does break down in situations where we have correlated predictors and can give spurious results (e.g. see Nicodemus et al. (2010), The behaviour of random forest permutation-based variable importance measures under predictor correlation, for a more in-depth discussion). I would suggest not relying on a single variable importance metric. For example, we can easily compute importance based on relative gains and on the number of times a variable is used for splits, as well as look at SHAP-based variable importances; a sketch of such a comparison follows below. This gives us a more holistic view. To paraphrase a great one: "all importance metrics are wrong but some are useful". A more recent exposition can be found in Please Stop Permuting Features: An Explanation and Alternatives (2019) by Hooker and Mentch (though it is not yet formally peer-reviewed).
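For instance, continuing the toy sketch above (reusing the assumed `M`, `X_val`, `y_val`), one could place the impurity/gain-based, permutation-based, and SHAP-based importances side by side; the optional `shap` package is an extra dependency here:

```python
import numpy as np
import shap
from sklearn.inspection import permutation_importance

# Gain/impurity-based importance built into the fitted forest.
gain_imp = M.feature_importances_

# Permutation importance, averaged over several shuffles to reduce noise.
perm_imp = permutation_importance(M, X_val, y_val, n_repeats=10,
                                  random_state=0).importances_mean

# SHAP-based importance: mean absolute SHAP value per feature.
shap_values = shap.TreeExplainer(M).shap_values(X_val)
shap_imp = np.abs(shap_values).mean(axis=0)

for j in range(X_val.shape[1]):
    print(f"x_{j}: gain={gain_imp[j]:.3f}  perm={perm_imp[j]:.3f}  shap={shap_imp[j]:.3f}")
```

If the three metrics broadly agree on which variables matter, that is reassuring; if they disagree sharply (as can happen with correlated predictors), it is a sign to dig deeper rather than trust any single number.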