I'm having trouble understanding exactly how to obtain the regularization parameter when pruning a decision tree with the minimal cost complexity approach. Assume the cost complexity function is represented as
$$C(T) = R(T) + \alpha|T|,$$
where $\alpha$ is the regularization parameter to be chosen.
Using the entire data set, we apply weakest link cutting to obtain a sequence of $\alpha$'s and the corresponding sub-trees, each of which minimizes the cost for a given $\alpha$.
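To be concrete about this first step, here is a minimal sketch using scikit-learn, which (as I understand it) implements weakest link cutting via `cost_complexity_pruning_path`; the data set is just a placeholder:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# placeholder data set for illustration
X, y = load_breast_cancer(return_X_y=True)

# grow the full tree on the entire sample, then recover the
# sequence of alphas at which a weakest link is cut
tree = DecisionTreeClassifier(random_state=0)
path = tree.cost_complexity_pruning_path(X, y)

# path.ccp_alphas is an increasing sequence; each alpha corresponds
# to the sub-tree that minimizes C(T) = R(T) + alpha*|T|
print(path.ccp_alphas)
```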
Next, we generally use K-fold cross-validation. This is where the pruning approach becomes unclear to me. For K-fold CV we estimate K trees. I would have thought we would use the original $\alpha$'s, obtained from the entire sample, to identify the sequence of optimal sub-trees in each fold. We would then proceed with CV, selecting the $\alpha$ with the smallest average error.
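To make the procedure I have in mind concrete, here is a minimal sketch in scikit-learn: the $\alpha$'s come from the full-data weakest link sequence, and within each fold the tree is pruned at each of those fixed $\alpha$'s via the `ccp_alpha` parameter (again, the data set is just a placeholder):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# placeholder data set for illustration
X, y = load_breast_cancer(return_X_y=True)

# alphas from weakest link cutting on the ENTIRE data set
full_alphas = (
    DecisionTreeClassifier(random_state=0)
    .cost_complexity_pruning_path(X, y)
    .ccp_alphas
)

# for each fixed alpha, grow and prune a tree in every fold,
# then average the held-out accuracy across folds
cv_scores = []
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for alpha in full_alphas:
    fold_scores = []
    for train_idx, test_idx in kf.split(X):
        t = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
        t.fit(X[train_idx], y[train_idx])
        fold_scores.append(t.score(X[test_idx], y[test_idx]))
    cv_scores.append(np.mean(fold_scores))

# pick the alpha with the best average held-out performance
best_alpha = full_alphas[int(np.argmax(cv_scores))]
```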
However, several sources (these lecture notes and Intro to Stats Learning, p. 309) seem to suggest that within each fold a new set of $\alpha$'s is obtained. Let's refer to the set of $\alpha$'s obtained within the $k$th fold as $\alpha^{(k)}$. This does not make sense to me. It is unlikely that each entry of $\alpha$ (i.e. the set of $\alpha$'s obtained from the entire data set) will be equivalent to an entry of $\alpha^{(k)}$, or even that the elements of $\alpha^{(k)}$ will be equivalent to those of $\alpha^{(j)}$. How can we pick the entry of $\alpha$ that minimizes the cost when $\alpha^{(k)}$ potentially shares no entries with $\alpha$?