Notation: outcome $Y$, binary treatment $A$, and covariates $L$. In Hernán and Robins's (2020) causal inference textbook:
To obtain a doubly robust estimate of the average causal effect, first estimate the IP (inverse probability) weight $W = 1/f(A \mid L)$. Then fit an outcome regression model, a generalized linear model with a canonical link, for $E[Y \mid A = a, L = l, R]$ that adds the covariate $R$, where $R = W$ if $A = 1$ and $R = -W$ if $A = 0$. Finally, use the predicted values from the outcome model to obtain the standardized mean outcomes under $A = 1$ and $A = 0$. The difference of the standardized mean outcomes is now doubly robust.
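A minimal sketch of this recipe, assuming a continuous outcome (so the canonical-link GLM is ordinary least squares with identity link), a single covariate $L$, and a logistic propensity model fit by Newton-Raphson. The function names `fit_logistic` and `dr_standardized_effect` and the simulated data are hypothetical, for illustration only. When predicting under $A = a$, the sketch sets $R$ to the value it would take under that treatment, $1/\hat{\pi}(L)$ for $A = 1$ and $-1/(1-\hat{\pi}(L))$ for $A = 0$; that is my reading of "use the predicted values", not a quote from the book.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                           # score of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])    # observed information
        beta += np.linalg.solve(hess, grad)
    return beta

def dr_standardized_effect(y, a, l):
    """Hypothetical helper: doubly robust ATE via an outcome model augmented with R."""
    n = len(y)
    ones = np.ones(n)
    # Step 1: propensity score pi(L) = P(A=1 | L); W = 1/f(A|L).
    X_ps = np.column_stack([ones, l])
    pi_hat = 1.0 / (1.0 + np.exp(-X_ps @ fit_logistic(X_ps, a)))
    # Step 2: R = W if A=1 and R = -W if A=0.
    r = np.where(a == 1, 1.0 / pi_hat, -1.0 / (1.0 - pi_hat))
    # Step 3: identity-link (least-squares) outcome model for E[Y | A, L, R].
    X_out = np.column_stack([ones, a, l, r])
    beta, *_ = np.linalg.lstsq(X_out, y, rcond=None)
    # Step 4: predict for everyone under A=1 and A=0, with R set to match A.
    X1 = np.column_stack([ones, ones, l, 1.0 / pi_hat])
    X0 = np.column_stack([ones, np.zeros(n), l, -1.0 / (1.0 - pi_hat)])
    # Difference of the standardized mean outcomes.
    return np.mean(X1 @ beta) - np.mean(X0 @ beta)

# Simulated data (assumption): true average causal effect is 2.
rng = np.random.default_rng(0)
n = 5000
l = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * l)))
y = 1.0 + 2.0 * a + 1.5 * l + rng.normal(size=n)
print(dr_standardized_effect(y, a, l))
```

With both working models correctly specified in this simulation, the printed estimate should land close to the simulated effect of 2.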
The question is why we define the clever covariate as $R_i = \frac{A_i}{\hat{\pi}(L_i)} - \frac{1-A_i}{1-\hat{\pi}(L_i)}$, such that we obtain the following doubly robust estimator of $E[Y^1 - Y^0]$:
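For reference, this one-line definition is the same covariate as in the textbook recipe. Writing $\hat{\pi}(L)$ for the estimated propensity score $\Pr(A = 1 \mid L)$, we have $\hat{f}(1 \mid L) = \hat{\pi}(L)$ and $\hat{f}(0 \mid L) = 1 - \hat{\pi}(L)$, so with $W_i = 1/\hat{f}(A_i \mid L_i)$:

$$R_i = \begin{cases} W_i = \dfrac{1}{\hat{\pi}(L_i)}, & A_i = 1, \\[6pt] -W_i = -\dfrac{1}{1 - \hat{\pi}(L_i)}, & A_i = 0, \end{cases} \qquad\text{i.e.}\qquad R_i = \frac{A_i}{\hat{\pi}(L_i)} - \frac{1 - A_i}{1 - \hat{\pi}(L_i)}.$$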
$$\frac{1}{n} \sum_{i=1}^{n} \left[ \frac{A_i Y_i - [A_i - \hat{\pi}(L_i)]\, m_1(L_i)}{\hat{\pi}(L_i)} - \frac{(1-A_i) Y_i + [A_i - \hat{\pi}(L_i)]\, m_0(L_i)}{1 - \hat{\pi}(L_i)} \right],$$
where $m_1(L) = \hat{E}(Y \mid A=1, L)$, $m_0(L) = \hat{E}(Y \mid A=0, L)$, and $\hat{\pi}(L)$ is an estimate of the propensity score $\pi(L) = \Pr(A=1 \mid L)$. This looks like solving two least-squares problems, regressing $Y$ on $A$ and $L$ with and without the clever covariate. Is there any reference on this? Thanks!
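A numerical sketch of the connection being asked about, under assumptions chosen only for illustration: a linear outcome model fit by ordinary least squares with an intercept, $A$, $L$ and $R$ as regressors (deliberately omitting an $L^2$ term that is present in the simulated outcome), and a propensity score treated as already known rather than estimated. The helper name `aipw` is hypothetical. Plugging the augmented model's predictions into the formula above as $m_1, m_0$ reproduces the plain standardized difference, because the least-squares normal equation for the coefficient of $R$ forces $\sum_i R_i (Y_i - \hat{Y}_i) = 0$, which is exactly the sum of the correction terms in the formula. This is a check on one simulated data set, not a proof or a reference.

```python
import numpy as np

def aipw(y, a, pi_hat, m1, m0):
    """Hypothetical helper: the displayed estimator of E[Y^1 - Y^0], term by term."""
    treated = (a * y - (a - pi_hat) * m1) / pi_hat
    control = ((1 - a) * y + (a - pi_hat) * m0) / (1 - pi_hat)
    return np.mean(treated - control)

rng = np.random.default_rng(0)
n = 2000
l = rng.normal(size=n)
pi_hat = 1.0 / (1.0 + np.exp(-0.5 * l))   # assumption: propensity taken as given
a = rng.binomial(1, pi_hat)
# The true outcome has an l**2 term that the fitted model below leaves out.
y = 1.0 + 2.0 * a + 1.5 * l + 0.5 * l**2 + rng.normal(size=n)

ones = np.ones(n)
r = a / pi_hat - (1 - a) / (1 - pi_hat)             # clever covariate R_i
X = np.column_stack([ones, a, l, r])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)        # least squares WITH R

# Predictions under A=1 and A=0, with R set to its value under that treatment.
m1 = np.column_stack([ones, ones, l, 1.0 / pi_hat]) @ beta
m0 = np.column_stack([ones, np.zeros(n), l, -1.0 / (1.0 - pi_hat)]) @ beta

standardized = np.mean(m1 - m0)
print(standardized, aipw(y, a, pi_hat, m1, m0))     # the two numbers agree
```

If the outcome model is instead fit without $R$, the two quantities generally differ; only the formula version then carries the correction terms.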