The question suggests some basic confusion between the equation and the solution
The Equation
Let ${\varphi} > 1$.
Consider the following (infinite) system of equations---one equation for each $t\in \mathbb{Z}$:
$$
X_{t}=\varphi X_{t-1}+e_{t}, \mbox{ where } e_t \sim WN(0,\sigma), \;\; t \in \mathbb{Z}. \quad (*)
$$
Definition
Given $e_t \sim WN(0,\sigma)$, a sequence of random variables $\{ X_t \}_{t\in \mathbb{Z}}$ is said to be a solution of $(*)$ if, for each $t$,
$$
X_{t}=\varphi X_{t-1}+e_{t},
$$
with probability 1.
The Solution
Define
$$
X_t= - \sum_{k=1}^\infty {\varphi}^{-k}e_{t+k},
$$
for each $t$.
$X_t$ is well-defined: The sequence of partial sums
$$
X_{t,m} = - \sum_{k=1}^m {\varphi}^{-k}e_{t+k}, \;\; m \geq 1
$$
is a Cauchy sequence in the Hilbert space $L^2$, and therefore converges in $L^2$. $L^2$ convergence implies convergence in probability (although not necessarily almost surely). By definition, for each $t$, $X_t$ is the $L^2$/probability-limit of $(X_{t,m})$ as $m \rightarrow \infty$.
$\{ X_t \}$ is, trivially, weakly stationary. (Any MA$(\infty)$ series with absolutely summable coefficients is weakly stationary.)
$\{ X_t \}_{t\in \mathbb{Z}}$ is a solution of $(*)$, as can be verified directly by substitution into $(*)$.
This is a special case of how one would obtain a solution to an ARMA model:
first guess/derive an MA$(\infty)$ expression, show that it is well-defined, then verify it's an actual solution.
$\;$
...But the $\epsilon_t$ is not independent from $X_{t}$...
This impression perhaps results from confusing the equation and the solution.
Consider the actual solution:
$$
\varphi X_{t-1} + e_t = \varphi \cdot \left( - \sum_{k=1}^\infty {\varphi}^{-k}e_{t+k-1}
\right) + e_t,
$$
the right-hand side is exactly $- \sum_{k=1}^\infty {\varphi}^{-k}e_{t+k}$, which is $X_t$ (we just verified Point #3 above). Notice how $e_t$ cancels and actually doesn't show up in $X_t$.
$\;$
...where this...derivation originally comes from...
I believe Mann and Wald (1943) already considered non-causal AR(1) case, among other examples. Perhaps one can find references even earlier. Certainly by the time of Box and Jenkins this is well-known.
Further Comment
The non-causal solution is typically excluded from the stationary AR(1) model because:
It is un-physical.
Assume that $(e_t)$ is, say, Gaussian white noise. Then, for every non-causal solution, there exists a causal solution that is observationally equivalent, i.e. the two solutions would be equal as probability measures on $\mathbb{R}^{\mathbb{Z}}$. In other words, a stationary AR(1) model that includes both causal and non-causal cases is un-indentified. Even if the non-causal solution is physical, one cannot distinguish it from a causal counterpart from data. For example, if innovation variance $\sigma^2 =1$, then the causal counterpart is causal solution to AR(1) equation with coefficient $\frac{1}{\varphi}$ and $\sigma^2 =\frac{1}{\varphi^2}$.