This is not quite what you suggest, but what is sometimes done is to estimate a so-called "linear probability model". That entails ignoring the binary nature of the dependent variable and running OLS on the $y_i$ directly anyway.
That has some undesirable effects, mainly that the fitted values of such a linear regression can leave the unit interval and hence produce predicted probabilities below 0 or above 1.
On the other hand, the estimated partial effects (which for OLS are of course nothing but the slope coefficients) are often very similar to those obtained from a probit model estimated by ML, at least in the range where the regressors have their main support.
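To spell out what is being compared, here is a sketch in notation that is not part of the text above: $x_i$ is assumed to be the regressor vector, $\beta$ and $\gamma$ the coefficient vectors of the two models, and $\Phi$ and $\phi$ the standard normal cdf and density.

$$
\text{linear probability model:}\quad P(y_i = 1 \mid x_i) = x_i'\beta, \qquad \frac{\partial P(y_i = 1 \mid x_i)}{\partial x_{ij}} = \beta_j
$$

$$
\text{probit:}\quad P(y_i = 1 \mid x_i) = \Phi(x_i'\gamma), \qquad \frac{\partial P(y_i = 1 \mid x_i)}{\partial x_{ij}} = \phi(x_i'\gamma)\,\gamma_j
$$

The probit partial effect varies with $x_i$, whereas the linear one is constant. The comparison is therefore usually between $\beta_j$ and the probit effect evaluated at, or averaged over, typical values of $x_i$.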
Hence, from a practical perspective, the differences can often be modest.
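A minimal simulation sketch of both points, using only the Python standard library. Everything in it is an assumption made for illustration: the data-generating process (a probit with one standard normal regressor, intercept 0.2, slope 0.8), the sample size, and the helper names `simulate`, `ols` and `probit_ml`. A normal regressor is a favorable case for the linear probability model, so the agreement here may be closer than in real data.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate(n=5000, a=0.2, b=0.8, seed=0):
    # Hypothetical probit DGP: y = 1 if a + b*x + e > 0, with x, e ~ N(0, 1).
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if a + b * x + rng.gauss(0.0, 1.0) > 0 else 0 for x in xs]
    return xs, ys


def ols(xs, ys):
    # Simple regression of y on a constant and x (the linear probability model).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def probit_ml(xs, ys, iters=25):
    # Probit ML via Fisher scoring for the two parameters (intercept, slope).
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            z = a + b * x
            p = min(max(norm_cdf(z), 1e-10), 1.0 - 1e-10)
            d = norm_pdf(z)
            s = d * (y - p) / (p * (1.0 - p))  # score contribution
            w = d * d / (p * (1.0 - p))        # expected information weight
            g0 += s
            g1 += s * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Scoring step: solve the 2x2 system H * delta = g by hand.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


xs, ys = simulate()
a_lpm, b_lpm = ols(xs, ys)
a_pr, b_pr = probit_ml(xs, ys)

# Average partial effect of x in the probit: mean of phi(a + b*x) * b.
ape_probit = b_pr * sum(norm_pdf(a_pr + b_pr * x) for x in xs) / len(xs)

# Count linear-model fitted values that are not valid probabilities.
fitted = [a_lpm + b_lpm * x for x in xs]
n_outside = sum(1 for f in fitted if f < 0.0 or f > 1.0)

print("LPM slope:                 ", round(b_lpm, 4))
print("Probit avg. partial effect:", round(ape_probit, 4))
print("LPM fitted values outside [0, 1]:", n_outside, "of", len(xs))
```

In this setup the OLS slope and the probit average partial effect should come out close to each other, while some fitted values from the linear model fall outside the unit interval. In practice one would use an established routine for the probit rather than hand-written scoring.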