There is no closed-form equation for the power, because the computation involves the noncentral t distribution.
This web page has relevant theory and formulas in Sect. 2.2 beginning on p143. I will try to show
one example to illustrate the computations involved.
Suppose you are doing a one-sided, pooled 2-sample t test at significance level $\alpha = 0.05.$ Your estimate of the common standard deviation is $\sigma = 4.$
Then, the crucial quantities are the size $\delta$ of the effect, the
number $n$ of observations in each sample, and the power $\pi$ of the test
against the difference of $\delta.$ In principle, if you specify any two
of $\delta, n,$ and $\pi,$ then the third can be obtained. To begin, suppose
$\delta = 5, n = 10,$ and we seek $\pi.$
The critical value $c$ of the test is determined so that $c$ cuts
probability $\alpha$ from the upper tail of Student's t distribution with
degrees of freedom $\nu = 2n - 2.$ That is, we reject $H_0$ if the pooled $T$ statistic satisfies $T > c,$ which happens with probability $\alpha$ when $H_0$ is true.
In particular, for the specific values
mentioned above, we can find $c = 1.734,$ using R statistical software as follows:
qt(.95, 18)
[1] 1.734064
In order to find the power $\pi,$ we need to use the non-central t distribution
with noncentrality parameter $\lambda = \frac{\delta}{\sigma\sqrt{2/n}}.$
Assuming the alternative hypothesis to be true, the $T$ statistic has this noncentral t distribution, and the power is the probability $P(T \ge c),$ which turns out to be $0.851.$
(See Wikipedia for some technical details of the noncentral t distribution.)
n = 10; df = 2*n - 2; cv = qt(.95, df); cv
[1] 1.734064
sg = 4; dlt = 5; lam = dlt/(sg*sqrt(2/n)); lam
[1] 2.795085
pwr = 1 - pt(cv, df, lam); pwr
[1] 0.8514775
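As a sanity check on the computation above, here is a short sketch that compares it with R's built-in `power.t.test` and with a brute-force simulation. The seed and the number of replications are arbitrary choices of mine, and the simulation assumes normal data with means $5$ and $0$ and common standard deviation $4.$

```r
# Same settings as above: one-sided pooled test, alpha = 0.05
n <- 10; sg <- 4; dlt <- 5

# Built-in procedure; it should agree with the value 0.8514775 found above
pwr.builtin <- power.t.test(n = n, delta = dlt, sd = sg, sig.level = 0.05,
                            type = "two.sample",
                            alternative = "one.sided")$power
pwr.builtin

# Simulation: fraction of pooled one-sided t tests that reject H0
set.seed(2024)   # arbitrary seed, only for reproducibility
pv <- replicate(10000,
        t.test(rnorm(n, dlt, sg), rnorm(n, 0, sg),
               var.equal = TRUE, alternative = "greater")$p.value)
pwr.sim <- mean(pv < 0.05)
pwr.sim          # should be near 0.85, up to simulation error
```

With 10,000 replications the simulation error is about $\pm 0.007$ (two standard errors), so only the first two decimal places of the simulated value can be trusted.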
Many statistical software programs have procedures for power and sample size.
The following power curve for the values we used above is from Minitab. The value computed above using R is shown as a dot on the curve. Minitab's result matches our computation.
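A similar curve can be sketched directly in R by evaluating the power over a grid of effect sizes. The range $0 \le \delta \le 8$ and the plotting details are my own choices, not necessarily the ones Minitab uses.

```r
# Power of the one-sided pooled test as a function of the effect size
n <- 10; sg <- 4; alpha <- 0.05
df <- 2*n - 2
cv <- qt(1 - alpha, df)

dlt <- seq(0, 8, by = 0.1)                 # grid of effect sizes (arbitrary range)
lam <- dlt / (sg * sqrt(2/n))              # noncentrality parameter for each one
pwr <- 1 - pt(cv, df, ncp = lam)           # pt is vectorized over ncp

plot(dlt, pwr, type = "l", ylim = c(0, 1),
     xlab = "Difference (delta)", ylab = "Power",
     main = "One-sided pooled t test, n = 10 per group, sigma = 4")
abline(h = alpha, lty = "dotted")          # at delta = 0 the 'power' is alpha
i5 <- which.min(abs(dlt - 5))              # grid point closest to delta = 5
points(dlt[i5], pwr[i5], pch = 19)         # the case computed above
```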

If you want to specify $\delta$ and $\pi,$ then many of these programs will search for the smallest $n$ that gives the requested power. For a fixed total number of observations and equal variances, the most efficient design for a two-sample test is to have the sample sizes equal, and so most
programs give one value of $n$ for each sample.
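Such a search is easy to sketch by hand: increase $n$ until the power first reaches the target. The function names `pooled_power` and `smallest_n` below are mine, introduced only for illustration; actual programs may search differently (R's `power.t.test`, for example, solves for a fractional $n$ numerically).

```r
# Power of the one-sided pooled test with n observations per group
pooled_power <- function(n, dlt, sg, alpha = 0.05) {
  df <- 2*n - 2
  cv <- qt(1 - alpha, df)
  1 - pt(cv, df, ncp = dlt / (sg * sqrt(2/n)))
}

# Smallest n per group whose power is at least 'target'
smallest_n <- function(dlt, sg, target, alpha = 0.05, n.max = 10000) {
  for (n in 2:n.max) {
    if (pooled_power(n, dlt, sg, alpha) >= target) return(n)
  }
  NA_integer_   # target not reached within n.max
}

smallest_n(5, 4, 0.85)          # n = 10 suffices, as computed earlier
n90 <- smallest_n(5, 4, 0.90)   # a more demanding target needs more data
n90; pooled_power(n90, 5, 4)
```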
If you want to do a Welch 2-sample test, then you have to specify the two standard deviations, which are used in a slightly revised formula for $\lambda.$ (A formula on p144 of the link above shows how to handle a Welch test with $n_1 \ne n_2.$
There, $T_\nu(\cdot)$ represents the CDF of a t distribution and $T_\nu(\cdot | \lambda)$ the CDF of a noncentral t distribution.)
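Here is a rough sketch of one common approximation for the Welch case. I am assuming $\lambda = \delta/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ and Satterthwaite degrees of freedom computed from the assumed standard deviations; I have not checked that this matches the linked formula term by term. The function name `welch_power` and the second set of inputs are made up for illustration.

```r
# Approximate power of a one-sided Welch test (a sketch, not an exact result)
welch_power <- function(dlt, sg1, sg2, n1, n2, alpha = 0.05) {
  v1 <- sg1^2 / n1
  v2 <- sg2^2 / n2
  # Satterthwaite degrees of freedom, using the assumed SDs
  df  <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  lam <- dlt / sqrt(v1 + v2)          # revised noncentrality parameter
  cv  <- qt(1 - alpha, df)
  1 - pt(cv, df, ncp = lam)
}

# With equal SDs and equal sample sizes this reduces to the pooled case
welch_power(5, 4, 4, 10, 10)

# Unequal SDs and sample sizes (hypothetical values)
welch_power(5, 3, 6, 8, 15)
```

In the balanced, equal-variance case the degrees of freedom become $2n - 2$ and $\lambda$ is unchanged, so the first call reproduces the pooled result found earlier.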
Power computations for two-sided tests are similar, but there are two terms to compute (one for each tail); often one of the two terms is so small it can be ignored for practical purposes.
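To illustrate with the same numbers as before, here is a sketch of the two-sided computation, where the critical value now cuts $\alpha/2$ from each tail. The variable names are mine.

```r
# Two-sided pooled test, same design as above
n <- 10; sg <- 4; dlt <- 5; alpha <- 0.05
df  <- 2*n - 2
lam <- dlt / (sg * sqrt(2/n))
cv  <- qt(1 - alpha/2, df)          # two-sided critical value

upper <- 1 - pt(cv, df, ncp = lam)  # rejection in the upper tail
lower <- pt(-cv, df, ncp = lam)     # rejection in the 'wrong' tail
pwr2  <- upper + lower
c(upper = upper, lower = lower, total = pwr2)
```

Here the lower-tail term is tiny compared with the upper-tail term, which is why it is often ignored. The total should agree with `power.t.test(..., alternative = "two.sided", strict = TRUE)`, which includes both tails.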