Firstly, least squares (or sum of squared errors) is a possible loss function to use to fit your coefficients. There's nothing technically wrong with it.
However, there are a number of reasons why MLE is a more attractive option. In addition to those in the comments, here are two more:
Computational efficiency
Because the logistic regression model is a member of the exponential family, we can use Fisher's scoring algorithm to solve efficiently for $\beta$. In my experience, this algorithm converges in only a few steps; solving the least squares problem numerically will likely take longer.
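To make the Fisher scoring point concrete, here is a minimal NumPy sketch (the function name and setup are mine, not from any particular library). For the canonical logit link, Fisher scoring coincides with Newton-Raphson, i.e. the IRLS iteration that GLM routines use.

```python
import numpy as np

def fisher_scoring_logistic(X, y, max_iter=25, tol=1e-10):
    """Fit a logistic regression by Fisher scoring (illustrative sketch).

    X is assumed to already include an intercept column.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                     # weights on the diagonal of the Fisher information
        score = X.T @ (y - p)                 # gradient of the log-likelihood
        info = X.T @ (X * w[:, None])         # expected Fisher information matrix
        step = np.linalg.solve(info, score)   # Newton/Fisher scoring step
        beta = beta + step
        if np.max(np.abs(step)) < tol:        # typically converges in a handful of iterations
            break
    return beta
```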
Lest this get lost, per @vbox's comment:
learning parameters for any machine learning model (such as logistic regression) is much easier if the cost function is convex. And, it's not too difficult to show that, for logistic regression, the cost function for the sum of squared errors is not convex, while the cost function for the log-likelihood is.
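You can see this numerically with a small one-parameter toy example (illustrative only; the simulation setup is my own). On a grid of slope values, the negative log-likelihood has nonnegative second differences while the squared-error loss does not:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# simulate from a logistic model with true slope 3 (no intercept, to keep it 1-D)
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-3.0 * x))).astype(float)

betas = np.linspace(-10.0, 10.0, 401)
sse, nll = [], []
for b in betas:
    p = 1.0 / (1.0 + np.exp(-b * x))
    p = np.clip(p, 1e-12, 1 - 1e-12)                              # guard against log(0)
    sse.append(np.sum((y - p) ** 2))                              # sum of squared errors
    nll.append(-np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))  # negative log-likelihood

# In this simulated example the nll curve is convex in b (second differences
# stay essentially nonnegative), while the sse curve flattens out and is not.
print("min 2nd diff, nll:", np.min(np.diff(nll, 2)))
print("min 2nd diff, sse:", np.min(np.diff(sse, 2)))
```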
MLE has very nice properties
Solutions using MLE have nice properties such as:
- consistency: meaning that with more data, our estimate of $\beta$ converges to the true value.
- asymptotic normality: meaning that with more data, our estimate of $\beta$ is approximately normally distributed, with variance that decreases at rate $O(\frac{1}{n})$.
- functional invariance: a nice property to have when dealing with multiple parameters (nuisance parameters) and calculating the profile likelihood.
Among others.
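As a quick, purely illustrative simulation of the first two properties: refit the model on growing samples and the estimates tighten around the true coefficients, with standard errors shrinking roughly like $1/\sqrt{n}$ (here I use statsmodels for the MLE fit; the true coefficients below are made up for the simulation).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
true_beta = np.array([-0.5, 2.0])           # intercept and slope used to simulate data

for n in (100, 1_000, 10_000):
    x = rng.normal(size=n)
    X = sm.add_constant(x)                  # design matrix with an intercept column
    p = 1.0 / (1.0 + np.exp(-X @ true_beta))
    y = (rng.uniform(size=n) < p).astype(float)
    fit = sm.Logit(y, X).fit(disp=0)        # MLE fit
    print(n, fit.params, fit.bse)           # estimates approach true_beta; std. errors shrink
```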
However, using least squares does have some benefits
Least squares tends to be more robust to outliers: an outlier's contribution to the squared-error loss is bounded by 1 (because $(1-0)^2 = 1$), whereas under a negative log-likelihood loss function its contribution can be arbitrarily large.
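A toy calculation (not from the original answer) makes the contrast concrete for a single observation with $y = 1$ whose fitted probability is badly wrong:

```python
import numpy as np

# Loss contribution of one observation with y = 1: the squared error is capped
# at 1, while the log loss grows without bound as p -> 0.
for p in (0.5, 0.1, 0.01, 1e-6):
    squared_error = (1.0 - p) ** 2
    log_loss = -np.log(p)
    print(f"p = {p:<8g}  squared error = {squared_error:.4f}  log loss = {log_loss:.2f}")
```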
For more information check this or this out.
Edited
My interpretation of the OP's question is: why do we use MLE instead of a squared loss function to determine $\beta$ in a logistic regression model of the form
$$\operatorname{logit}(P(Y=1 \mid X)) = x\beta$$
where $P(Y=1 \mid X) = f(x;\beta) = \frac{e^{x\beta}}{1 + e^{x\beta}} = \frac{1}{1 + e^{-x\beta}}$.
So the loss function looks like:
$$\sum_{i} \left(y_i - f(x_i;\beta)\right)^2 = \sum_{i} \left(y_i - \frac{1}{1 + e^{-x_i\beta}}\right)^2$$
where the $y_i$ take values 0 or 1.
When I talk about computational efficiency, I mean finding the $\beta$ that minimizes the above, versus running Fisher scoring on the likelihood function.
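For reference, here is a sketch of what "finding the $\beta$ that minimizes the above" might look like with a generic optimizer (the setup and names are illustrative; a dedicated GLM routine would handle the MLE side via Fisher scoring):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(size=500)
X = np.column_stack([np.ones_like(x), x])             # intercept + slope
p_true = 1.0 / (1.0 + np.exp(-(1.0 + 2.0 * x)))       # true beta = (1, 2), chosen for illustration
y = (rng.uniform(size=500) < p_true).astype(float)

def sse_loss(beta):
    """Sum of squared errors between y and the fitted probabilities."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return np.sum((y - p) ** 2)

# Generic quasi-Newton minimization of the squared-error loss.  Because this
# surface is not convex, the result can depend on the starting point, and it
# typically needs more iterations than the handful Fisher scoring takes.
res = minimize(sse_loss, x0=np.zeros(2), method="BFGS")
print(res.x, res.nit)
```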