An estimator, in the most general sense, is a (deterministic) mathematical function of the potential data. That is, if the data from an experiment are represented as an ordered $n$-tuple $(x_1, x_2, \ldots, x_n)$, then an estimator is a function
$$t:\mathbb{R}^n \to \mathbb{R}$$
which produces a number $t(x_1, x_2, \ldots, x_n)$ (the estimate) for each possible element $(x_1, x_2, \ldots, x_n)$ of $\mathbb{R}^n$ (the data).
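To make the point concrete, here is a minimal sketch in Python (not part of the original argument): an estimator is nothing more than an ordinary deterministic function of an $n$-tuple of numbers. The choice of the sample median here is arbitrary and purely illustrative.

```python
# A minimal sketch: an estimator is just a deterministic function of the data.
# (Illustrative only; the sample median is an arbitrary choice of estimator t.)
from typing import Sequence

def t(data: Sequence[float]) -> float:
    """Map an n-tuple (x_1, ..., x_n) to a single number: here, the sample median."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else 0.5 * (xs[mid - 1] + xs[mid])

estimate = t((3.1, 2.7, 5.0, 4.2))  # the same inputs always give the same output
print(estimate)  # 3.65
```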
When the data are modeled as the outcome of a (vector-valued) random variable $(X_1, X_2, \ldots, X_n)$, then
$$T = t(X_1, X_2, \ldots, X_n)$$
is itself a random variable (by virtue of the randomness of its argument), provided $t$ is measurable (an important technical condition that, for conceptual purposes, may be ignored). (For more about measurability, see the "afterword" in the preceding link.)
The standard error of estimate of $t$ is the standard deviation (SD) of $T$.
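One way to see $T$ as a random variable is by simulation: feeding random data into the fixed function $t$ yields a random output, and the empirical SD of that output approximates the standard error. The sketch below is illustrative only; the normal population, its parameters, and the sample size are all assumptions, and it reuses the hypothetical median estimator from above.

```python
# Monte Carlo sketch (assumed normal population, assumed n = 25; illustrative only).
# Feeding random data into the fixed function t makes T = t(X_1, ..., X_n) random;
# the SD of the simulated values of T approximates the standard error of t.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 25, 100_000
samples = rng.normal(loc=10.0, scale=2.0, size=(reps, n))  # reps independent datasets
T = np.median(samples, axis=1)   # apply t (here, the sample median) to each dataset
print(T.std(ddof=1))             # empirical standard error of this estimator
```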
A far-reaching example is to take a random sample $(X_1, X_2, \ldots, X_n)$ (independently, with replacement) from a population whose mean $\mu$ and standard deviation $\sigma$ are unknown. In order to estimate $\mu$ one might use the sample mean,
$$t(x_1, x_2, \ldots, x_n) = \frac{1}{n}(x_1 + x_2 + \cdots + x_n).$$
Probability theory, using the assumptions about how the sample was drawn, tells us that $T$ estimates $\mu$ without bias, in the sense that
$$\mathbb{E}(T) = \mu$$
and it provides measures of how closely $T$ tends to estimate $\mu$, such as
$$\text{SD}(T) = \frac{\sigma}{\sqrt{n}}.$$
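A quick Monte Carlo check reproduces both facts for the sample mean: the simulated average of $T$ is close to $\mu$ and its SD is close to $\sigma/\sqrt{n}$. The population, $\mu$, $\sigma$, and $n$ below are again illustrative assumptions, not part of the original argument.

```python
# Check E(T) = mu and SD(T) = sigma / sqrt(n) for the sample mean (illustrative values).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 2.0, 25, 200_000
samples = rng.normal(mu, sigma, size=(reps, n))
T = samples.mean(axis=1)      # the sample mean of each simulated dataset

print(T.mean())               # ~ 10.0 (= mu)
print(T.std(ddof=1))          # ~ 0.4  (= sigma / sqrt(n) = 2 / 5)
```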
This value is the standard error of the [estimate of the] mean, where the term "mean" describes $T$ (the mean of the sample), not $\mu$ (the mean of the population)! The fact that it is also a standard deviation (of the random variable $T$) may lead to misunderstandings among those who assume that "standard deviation" applies only to the population.