We have to be clearer on what it means to "estimat[e] $y_i = \exp(\beta_0 + \beta_1 x_i)$". Although that looks like a completely formed thought, it isn't quite. What's missing is the specification of the response distribution and/or error term. (Since you are using the log link, I'll assume you are using the Poisson GLiM as your example, since the log is the canonical link for the Poisson GLiM.) Now we have:
$$
y_i \sim \text{Poisson}(\lambda_i), \qquad \log(\lambda_i) = \beta_0 + \beta_1 x_i
$$
Thus, we can find estimated betas by finding the values of the betas that maximize the likelihood:
$$
L(\beta_0, \beta_1) = \prod_{i = 1}^N \frac{\lambda_i^{y_i}}{y_i!}e^{-\lambda_i}, \qquad \lambda_i = \exp(\beta_0 + \beta_1 x_i)
$$
(or that minimize $-2\times\ln(L)$, which equals the deviance up to a constant that does not depend on the betas; this is preferred for computational reasons, but yields the same estimated betas).
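To make the likelihood concrete, here is a minimal Python sketch that evaluates it for two sets of candidate betas. The data, the candidate values, and the function name `poisson_log_likelihood` are all made up for illustration. It works on the log scale, which is the usual way to avoid numerical underflow in the product.

```python
import math

def poisson_log_likelihood(b0, b1, x, y):
    """Sum of log Poisson pmf values with lambda_i = exp(b0 + b1 * x_i)."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi          # linear predictor, equals log(lambda_i)
        lam = math.exp(eta)         # back-transformed mean lambda_i
        # log of lam**yi * exp(-lam) / yi!  (lgamma(yi + 1) = log(yi!))
        total += yi * eta - lam - math.lgamma(yi + 1)
    return total

# Hypothetical toy data, not from the question
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 6, 12]

# Two arbitrary candidate (b0, b1) pairs
candidates = [(1.0, 0.1), (0.2, 0.6)]
for b0, b1 in candidates:
    ll = poisson_log_likelihood(b0, b1, x, y)
    print(f"b0={b0}, b1={b1}: L={math.exp(ll):.3e}, -2 ln L={-2 * ll:.3f}")
```

Whichever candidate has the larger $L$ necessarily has the smaller $-2\ln(L)$, which is why maximizing one and minimizing the other lead to the same betas.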
Since this is confusing, let's walk through it slowly.
Are you transforming the estimated $μ_i$?
Yes... or, sort of. We estimate $\hat\mu_i$ by back-transforming the RHS of the equation, i.e., by exponentiating the linear predictor $\hat\beta_0 + \hat\beta_1 x_i$.
Is $μ_i$ estimated based on: $μ_i=\exp(β_0+β_1X_i)$?
Yes.
...then [isn't] $μ_i=y_i$ because we haven't changed its value in the dataset at all[?]
No. The way this works is that we plug in some candidate values for the betas, run them through the RHS of the equation, and then exponentiate the result. That is the predicted mean of the data at that point in the covariate space (i.e., when $X=x_i$) given the stipulated candidate betas. For the Poisson distribution, the mean is $\lambda$, which is the parameter that governs the behavior of the distribution—once you know that, you know everything you need to know about that particular (conditional) Poisson distribution. For example, you can determine the relative likelihood of any observed datum $y_i$. So we are not assuming $y_i = \mu_i$; we use $\hat\mu_i = \hat\lambda_i$ to make it possible to determine $L(y_i|{\bf X}, \boldsymbol{\hat\beta})$.
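As a rough illustration of that step, the sketch below takes one arbitrary pair of candidate betas, computes the predicted mean at each $x_i$, and then asks how probable each observed $y_i$ is under a Poisson distribution with that mean. The data and the candidate values are hypothetical, and `poisson_pmf` is just a helper written for this example.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam), computed on the log scale."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

# Hypothetical toy data, not from the question
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 6, 12]

# Arbitrary candidate betas (an assumption, not fitted values)
b0, b1 = 0.2, 0.6

# Predicted conditional means: exponentiate the linear predictor
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]

# Probability of each observed count under its own conditional Poisson
probs = [poisson_pmf(yi, mi) for yi, mi in zip(y, mu_hat)]

for xi, yi, mi, pi in zip(x, y, mu_hat, probs):
    print(f"x={xi}  y={yi}  mu_hat={mi:.3f}  P(y | mu_hat)={pi:.4f}")
```

Note that the `mu_hat` values are not equal to the `y` values; the observed counts are left untouched and are only used to score how well the candidate means describe them.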
Now, none of that implies that any given set of candidate beta values we have stipulated is the best one. We will have to search. But as we search, at each point we are able to evaluate the fit, that is, the likelihood of the stipulated beta values given the data.
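One way to carry out that search is Newton-Raphson on the log likelihood; this is a sketch of the idea under my own assumptions, not a description of what any particular package does internally (standard GLM software typically uses iteratively reweighted least squares, which amounts to the same thing for a canonical link). The data, starting values, and function names are hypothetical.

```python
import math

def score_and_information(b0, b1, x, y):
    """Gradient of the log likelihood and the (symmetric) information matrix."""
    g0 = g1 = 0.0
    i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        mu = math.exp(b0 + b1 * xi)   # predicted mean for this candidate
        g0 += yi - mu                 # d logL / d b0
        g1 += xi * (yi - mu)          # d logL / d b1
        i00 += mu                     # entries of sum(mu * [1, x][1, x]^T)
        i01 += xi * mu
        i11 += xi * xi * mu
    return (g0, g1), (i00, i01, i11)

def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson search over (b0, b1) for a log-link Poisson model."""
    b0 = math.log(sum(y) / len(y))    # start at the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = score_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system information * delta = score
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (-i01 * g0 + i00 * g1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical toy data, not from the question
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 6, 12]

b0_hat, b1_hat = fit_poisson(x, y)
print(f"b0_hat={b0_hat:.4f}, b1_hat={b1_hat:.4f}")
```

Each iteration evaluates the current candidate betas against the data and then moves to a better candidate; the search stops when the step becomes negligible, at which point the gradient of the log likelihood is essentially zero.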
For more on these topics, it may help you to read my answer here: Difference between logit and probit models.