Do not worry about whether the disturbances are 'observable' or 'unobservable'.
It comes down to how we assume the data are generated. If we model a series with an MA(1) process, we are assuming that the series is generated by a disturbance term ($\epsilon_{t}$) plus a lagged, scaled value of that disturbance ($\theta \epsilon_{t-1}$). This is the definition of an MA(1) series: the disturbances are what drive the process $X$. Because we assume the disturbances create $X$ in this specific way, we use the first formula.
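As a rough illustration of this data generating view, here is a minimal sketch that builds an MA(1) series from its disturbances. The function name `simulate_ma1`, the choice of standard normal disturbances, and the value $\theta = 0.5$ are assumptions made for the example, not part of the question.

```python
import random

def simulate_ma1(theta, n, seed=0):
    """Return n values of X_t = eps_t + theta * eps_{t-1}.

    Assumption for this sketch: the disturbances eps_t are i.i.d. standard normal.
    """
    rng = random.Random(seed)
    # Draw n + 1 disturbances so that the first X has a lagged disturbance available.
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    # The disturbances come first; X is computed from them, not the other way round.
    return [eps[t] + theta * eps[t - 1] for t in range(1, n + 1)]

x = simulate_ma1(theta=0.5, n=5)
print(x)
```

The point of the sketch is the direction of causation: the disturbances are drawn first and the series is derived from them.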
Your second formula would not work as a data generating process because every disturbance term can be eliminated by substituting in values of $X$. Once all disturbances are eliminated, the series is predetermined, which is probably not how you want to model your series. To see this, start from the second formula:
$X_{t} = \eta_{t} + \theta \eta_{t-1}$
where
$\eta_{t} = X_{t} - X_{t-1}$
Substituting:
$X_{t} = X_{t} - X_{t-1} + \theta (X_{t-1} - X_{t-2})$
rearranging:
$X_{t-1} = \theta (X_{t-1} - X_{t-2})$
and finally
$X_{t-1} = -\frac{\theta}{1 - \theta} X_{t-2}$
for all $t$ (assuming $\theta \neq 1$)
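To make the "predetermined" point concrete, the following sketch iterates the recursion implied by the rearranged equation $X_{t-1} = \theta (X_{t-1} - X_{t-2})$ and checks that the resulting path satisfies the second formula exactly, with no randomness anywhere. The function name `deterministic_path`, the starting value, and $\theta = 0.5$ are hypothetical choices for illustration.

```python
def deterministic_path(theta, x0, n):
    """Iterate X_t = c * X_{t-1}, where c solves X_t = theta * (X_t - X_{t-1})."""
    if theta == 1:
        raise ValueError("theta = 1 leaves the coefficient undefined")
    # From (1 - theta) * X_t = -theta * X_{t-1}.
    c = -theta / (1.0 - theta)
    path = [x0]
    for _ in range(n - 1):
        path.append(c * path[-1])
    return path

theta = 0.5
x = deterministic_path(theta, x0=1.0, n=6)

# With eta_t = X_t - X_{t-1}, the second formula holds identically along the path.
for t in range(2, len(x)):
    eta_t = x[t] - x[t - 1]
    eta_prev = x[t - 1] - x[t - 2]
    assert abs(x[t] - (eta_t + theta * eta_prev)) < 1e-12

print(x)
```

Once the starting value is fixed, every later value follows from it, so the series contains no new disturbance at any step.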