In the notation I will use, $p$ is the number of design variables (including the constant term) and $n$ the number of observations, with $n\geq2p+1$ (had this last condition not been met, the package would have returned an error rather than a fit, so I assume it holds). I will denote by $\hat{\boldsymbol\beta}_{FLTS}$ the vector of coefficients estimated by FLTS (`ltsReg`) and by $\hat{\boldsymbol\beta}_{MM}$ the coefficients estimated by MM (`lmrob`). I will also write:
$$r^2_i(\hat{\boldsymbol\beta})=(y_i-\boldsymbol x_i^\top\hat{\boldsymbol\beta})^2$$
(these are the squared residuals, not the standardized ones!)
The `rlm` function fits an M estimate of regression and, like @Frank Harrell's proposal made in the comments to your question, it is not robust to outliers in the design space. Ordinary regression has a breakdown point (the proportion of your data that needs to be replaced by outliers to pull the fitted coefficients to arbitrary values) of essentially $1/n$, meaning that a single outlier (regardless of $n$!) suffices to render the fit meaningless. For regression M estimates (e.g. Huber M regression) the breakdown point is essentially $1/(p+1)$. This is somewhat higher, but in practice still uncomfortably close to 0 (because $p$ will often be large). The only conclusion that can be drawn from `rlm` finding a different fit than the other two methods is that it has been swayed by design outliers, and that there must be more than $p+1$ of these in your data set.
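To see how little it takes, here is a tiny simulated illustration (stand-in data, not yours) of the $1/n$ breakdown point of ordinary least squares: a single gross outlier in the design space drags the OLS slope far from the truth, while a high-breakdown fit such as `lmrob` is essentially unaffected.

```r
## Simulated illustration of breakdown; requires the 'robustbase' package.
library(robustbase)

set.seed(42)
x <- c(rnorm(29), 50)                   # one wild design point
y <- c(2 * x[1:29] + rnorm(29), -100)   # paired with a wild response

coef(lm(y ~ x))["x"]     # dragged far away from the true slope of 2
coef(lmrob(y ~ x))["x"]  # stays close to 2
```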
In contrast, the other two algorithms are much more robust: their breakdown point is just below $1/2$ and, more importantly, does not shrink as $p$ gets large. When fitting a linear model using a robust method, you assume that at least $h=\lfloor(n+p+1)/2\rfloor+1$ observations in your data are uncontaminated. The task of these two algorithms is to find those observations and fit them as well as possible. More precisely, if we denote:
\begin{align}
H_{FLTS} &= \{i:r^2_i(\hat{\boldsymbol\beta}_{FLTS})\leq q_{h/n}(r^2_i(\hat{\boldsymbol\beta}_{FLTS}))\} \\
H_{MM} &= \{i:r^2_i(\hat{\boldsymbol\beta}_{MM})\leq q_{h/n}(r^2_i(\hat{\boldsymbol\beta}_{MM}))\}
\end{align}
(where $q_{h/n}(r^2_i(\hat{\boldsymbol\beta}_{MM}))$ is the $h/n$ quantile of the
vector $r^2_i(\hat{\boldsymbol\beta}_{MM})$)
then $\hat{\boldsymbol\beta}_{MM}$ ($\hat{\boldsymbol\beta}_{FLTS}$) tries to fit the observations with indices in $H_{MM}$ ($H_{FLTS}$).
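For concreteness, these index sets can be computed along the following lines (a sketch on simulated stand-in data; all object names are illustrative, and I assume `robustbase` for both fits):

```r
## Sketch: find the h observations each robust fit treats as clean.
## Simulated data stand in for the real design; requires 'robustbase'.
library(robustbase)

set.seed(1)
n <- 100
x <- matrix(rnorm(2 * n), n, 2)    # two predictors
y <- drop(x %*% c(1, 2)) + rnorm(n)
y[1:10] <- y[1:10] + 20            # plant a few gross outliers

p <- ncol(x) + 1                   # design variables, incl. the constant
h <- floor((n + p + 1) / 2) + 1    # assumed number of clean observations

fit_flts <- ltsReg(y ~ x)          # FLTS
fit_mm   <- lmrob(y ~ x)           # MM

r2_flts <- residuals(fit_flts)^2   # squared (unstandardized) residuals
r2_mm   <- residuals(fit_mm)^2

## observations whose squared residual lies below the h/n quantile:
H_flts <- which(r2_flts <= quantile(r2_flts, h / n))
H_mm   <- which(r2_mm   <= quantile(r2_mm,   h / n))
```

On data like these, the planted outliers land outside both index sets.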
The fact that there are large differences between $\hat{\boldsymbol\beta}_{FLTS}$ and $\hat{\boldsymbol\beta}_{MM}$ indicates that the two algorithms do not identify the same set of observations as outliers, which means that at least one of them is swayed by the outliers. In this case, using the (adjusted) $R^2$ or any single statistic from either of the two fits to decide which one to use, though intuitive, is a terrible idea: contaminated fits typically have smaller residuals than clean ones (but since knowledge of this fact is the reason one uses robust statistics in the first place, I assume that the OP is well aware of it and that I don't need to expand on this).
The two robust fits give conflicting results, and the question is: which one is correct? One way to settle this is to consider the set:
$$H^+=H_{MM}\cap H_{FLTS}$$
Because $h\geq\lfloor n/2\rfloor$, the two sets of size $h$ must overlap in at least $2h-n$ elements, so $\#\{H^+\}\geq p$. Furthermore, if either of $H_{MM}$ or $H_{FLTS}$ is free of outliers, so is $H^+$. The solution I propose exploits this fact. Compute:
$$D(H^+,\hat{\boldsymbol\beta}_{FLTS},\hat{\boldsymbol\beta}_{MM})=\sum_{i\in H^+}\left(r^2_i(\hat{\boldsymbol\beta}_{FLTS})-r^2_i(\hat{\boldsymbol\beta}_{MM})\right)$$
For example, if $D(H^+,\hat{\boldsymbol\beta}_{FLTS},\hat{\boldsymbol\beta}_{MM})<0$, then $\hat{\boldsymbol\beta}_{FLTS}$ fits the good observations better than $\hat{\boldsymbol\beta}_{MM}$, and so I would trust $\hat{\boldsymbol\beta}_{FLTS}$ more. And vice versa.
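The mechanics of this comparison are simple. A base-R sketch, assuming the squared-residual vectors and the index sets $H_{FLTS}$, $H_{MM}$ defined above have already been computed (the toy values below are only stand-ins, and all object names are illustrative):

```r
## Toy stand-in values; in practice these come from the two robust fits.
r2_flts <- c(0.1, 0.4, 0.2, 9.0, 0.3)   # squared residuals under FLTS
r2_mm   <- c(0.3, 0.5, 0.6, 0.2, 8.0)   # squared residuals under MM
H_flts  <- c(1, 2, 3, 5)                # observations FLTS deems clean
H_mm    <- c(1, 2, 3, 4)                # observations MM deems clean

H_plus <- intersect(H_flts, H_mm)       # jointly clean observations

## D < 0: FLTS fits the jointly clean points better; D > 0: MM does.
D <- sum(r2_flts[H_plus] - r2_mm[H_plus])
if (D < 0) "trust FLTS" else "trust MM"   # here D = -0.7, so FLTS wins
```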