An MA(1) model can, for example, take the form:
$$Y_t = \beta\epsilon_{t-1} + \epsilon_t.$$
Now, to estimate $\beta$ we need to recover $\epsilon_t$. It can be obtained from the AR($\infty$) representation:
$$\epsilon_t = Y_t+ \sum_{i=1}^k(-\beta)^iY_{t-i}.$$
Ideally $k$ would be infinite, but that is not possible, so some $k$ needs to be selected that the data can support. Using least squares, the problem is:
$$\min_\beta \left( \sum_t \left(Y_t+\sum_{i=1}^k(-\beta)^iY_{t-i}\right)^2\right).$$
So we effectively fit an AR($k$) model whose coefficients are non-linear functions of the single parameter $\beta$.
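The truncated criterion above can be sketched numerically. Here is a minimal illustration, assuming a simulated MA(1) series with true $\beta = 0.5$, a sample size of 500, and `scipy.optimize.minimize_scalar` for the one-dimensional minimization (the parameter values and sample size are my own choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

# Simulate an MA(1) process Y_t = beta*eps_{t-1} + eps_t
# (beta = 0.5 and n = 500 are assumed purely for illustration)
beta_true = 0.5
n = 500
eps = rng.standard_normal(n + 1)
y = eps[1:] + beta_true * eps[:-1]

def objective(beta, y, k):
    """Truncated criterion: sum over t of (Y_t + sum_{i=1}^k (-beta)^i Y_{t-i})^2."""
    resid = y[k:].copy()               # residuals for t = k, ..., n-1
    for i in range(1, k + 1):
        resid += (-beta) ** i * y[k - i:len(y) - i]
    return np.sum(resid ** 2)

k = 10
res = minimize_scalar(objective, bounds=(-0.99, 0.99),
                      args=(y, k), method="bounded")
print(f"estimate of beta with k={k}: {res.x:.3f}")  # should be close to 0.5
```

Note that the optimization is over the single scalar $\beta$ even though the implied AR($k$) model has $k$ coefficients, which is exactly the non-linear-parameter structure mentioned above.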
Question: As we know, when estimating AR models the sample used in estimation has to be reduced according to the number of lags. Thus, for an MA model, does the estimation result not depend heavily on which $k$ is selected? For large $k$ we get better statistical properties, but fewer observations for the estimation... is this correct? If it is, why do most statistical packages not report which $k$ was selected?
EDIT: Since there seems to be some confusion regarding what is being asked: look at the least-squares minimization problem for the MA coefficient. Now assume $k=1$; this would be one way to estimate the coefficient, but then we have an AR(1) model, which is not a very good approximation of the MA(1) model. Next assume $k = N-1$. Now we are getting closer to the true MA(1) model... but there is only one data point left for estimation, since all the lags eat the data. It seems that some choice has to be made between these two extremes, making the estimate non-unique. Unless I am wrong and we can max out $k$ without paying a penalty (in which case there would be a unique solution).
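The trade-off described in this edit can be seen in a small simulation. Assuming a hypothetical series of length $N=200$ with true $\beta=0.5$ (my own choices, not from any package's defaults), the truncated estimator behaves very differently across $k$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
beta_true, n = 0.5, 200              # assumed values for illustration
eps = rng.standard_normal(n + 1)
y = eps[1:] + beta_true * eps[:-1]   # MA(1): Y_t = beta*eps_{t-1} + eps_t

def objective(beta, y, k):
    # S(beta) = sum over t of (Y_t + sum_{i=1}^k (-beta)^i Y_{t-i})^2
    resid = y[k:].copy()
    for i in range(1, k + 1):
        resid += (-beta) ** i * y[k - i:len(y) - i]
    return np.sum(resid ** 2)

estimates = {}
for k in (1, 5, 20, n - 1):
    res = minimize_scalar(objective, bounds=(-0.99, 0.99),
                          args=(y, k), method="bounded")
    estimates[k] = res.x
    print(f"k={k:>3}: beta_hat = {res.x:.3f}")
```

For $k=1$ the criterion is just an AR(1) least-squares fit, so the estimate is pulled toward the lag-1 regression coefficient rather than $\beta$; for moderate $k$ the truncation bias (of order $\beta^k$) is negligible and the estimate is stable; for $k=N-1$ only a single squared residual remains and the minimizer is essentially arbitrary, which is the degenerate extreme described above.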