In my view, there is a lot of silliness in various purported interpretations
of frequentist confidence intervals. One example of that is the
interpretation you quote, "[T]he population mean is not random variable so we can't say 95% probability that CI contains the population mean."
For simplicity, consider the 95% z confidence interval for normal $\mu,$ where $\sigma$ is known: $\bar X \pm 1.96\sigma/\sqrt{n}.$ This comes
from the perfectly reasonable statement
$$0.95 = P\left(-1.96 \le \frac{\bar X - \mu}{\sigma/\sqrt{n}} \le 1.96\right)\\
=P\left(\bar X - 1.96\frac{\sigma}{\sqrt{n}}\le \mu\le \bar X + 1.96\frac{\sigma}{\sqrt{n}}\right).$$
The sentence I quoted from your question ignores that $\bar X$ is a random variable. The 95% CI is therefore a perfectly reasonable statement that the random interval contains (covers) the unknown $\mu$ with probability 95%. The frequentist interpretation of this 'coverage event' is that, over the long run, such an event
will be true 95% of the time.
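That long-run coverage claim is easy to check by simulation. The sketch below uses arbitrary illustrative values (true $\mu = 10,$ known $\sigma = 2,$ $n = 25$) and repeatedly draws $\bar X$ from its sampling distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 10.0, 2.0, 25        # mu is 'unknown' in practice; known here to score coverage
reps = 100_000
half_width = 1.96 * sigma / np.sqrt(n)

covered = 0
for _ in range(reps):
    # Sampling distribution of the mean: Normal(mu, sigma/sqrt(n))
    xbar = rng.normal(mu, sigma / np.sqrt(n))
    covered += (xbar - half_width <= mu <= xbar + half_width)

coverage = covered / reps
print(coverage)   # close to 0.95
```

Each simulated interval either covers $\mu$ or it doesn't; the 95% refers to the proportion of intervals, across replications, that do.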
It is unproductive sophistry to say that once we observe $\bar X,$ the 'probability collapses', so that the event is either true or false--no probability about it.
Traditionally, the compromise with hard-core
frequentists has been to call this a "confidence" interval, not a "probability" interval. So it is OK to say I have 95% "confidence in" the truth of the interval. (It is best not to try to define what "confidence" means.
You may soon get around to admitting it's just a diplomatic synonym of "probability".)
In the same sense, a frequentist would say that "$P(\mathrm{Heads}) = 1/2$"
for a fair coin means that over the long run the coin will show Heads nearly half of the time. Few people (even few hard-core frequentists) say it's meaningless to claim a coin is fair because,
if you ever toss it and look at the result, the 'probability collapses' and you either have a Head or a Tail--no probability about it.
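The same long-run reading can be illustrated in a couple of lines (the seed and number of tosses below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
tosses = rng.integers(0, 2, size=100_000)   # 0 = Tails, 1 = Heads, fair coin
print(tosses.mean())                        # proportion of Heads, near 0.5
```

Any single toss is a Head or a Tail, yet the long-run proportion of Heads is what "$P(\mathrm{Heads}) = 1/2$" describes.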
Note: In a Bayesian setting, normal $\mu$ and binomial $p$ are random variables. One begins with a (more or less informative) prior distribution, looks at data, and finds a posterior distribution for $\mu$ or $p.$ From
the posterior distribution, one can find a 95% Bayesian posterior probability interval for the parameter. However, details of that approach, which may have some philosophical difficulties of its own, are stories for another day.
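As a sketch of that approach in the normal case with known $\sigma$: a normal prior on $\mu$ is conjugate, so the posterior is also normal and the 95% posterior probability interval comes straight from its mean and variance. The prior parameters below (`m0`, `s0`) are illustrative assumptions, chosen weakly informative:

```python
import numpy as np

rng = np.random.default_rng(7)
sigma, n = 2.0, 25
data = rng.normal(10.0, sigma, size=n)      # simulated sample; true mean 10 for illustration

# Conjugate normal prior on mu: mu ~ N(m0, s0^2)  (weakly informative assumption)
m0, s0 = 0.0, 100.0
post_var = 1.0 / (1.0 / s0**2 + n / sigma**2)
post_mean = post_var * (m0 / s0**2 + data.sum() / sigma**2)

# 95% posterior probability interval for mu
lo = post_mean - 1.96 * np.sqrt(post_var)
hi = post_mean + 1.96 * np.sqrt(post_var)
print(lo, hi)
```

With a prior this flat, the posterior interval nearly coincides with the frequentist z interval numerically, but its interpretation differs: here it is a direct probability statement about $\mu$ given the data.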