I am looking to implement robust regression in R for a large data set (n ≈ 500,000). The two options that come up are lmrob and rlm. When I used lmrob, it gave me the following error:
Error in lmrob.S(x, y, control = control) :
Fast S large n strategy failed. Use control parameter 'fast.s.large.n = Inf'.
I changed the parameter fast.s.large.n to Inf, but after running for 4-5 hours, it ended with a warning:
Warning messages:
1: In lmrob.S(x, y, control = control) :
S refinements did not converge (to refine.tol=1e-07) in 200 (= k.max) steps
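For reference, here is a minimal sketch of the kind of call involved, on simulated data because the real data are not shown. The data frame `d`, its columns, and the value `k.max = 500` are assumptions made up for illustration; `fast.s.large.n`, `k.max`, and `refine.tol` are the `lmrob.control()` arguments named in the messages above.

```r
library(robustbase)

# Simulated stand-in for the real data (hypothetical, for illustration only).
set.seed(42)
n <- 5000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n)
d$y[1:50] <- d$y[1:50] + 30   # a few gross outliers in the response

# Disable the large-n strategy, as the error message suggests, and allow
# more S refinement steps than the default k.max = 200 (500 is an assumed value).
ctrl <- lmrob.control(fast.s.large.n = Inf, k.max = 500, refine.tol = 1e-7)

fit <- lmrob(y ~ x1 + x2, data = d, control = ctrl)
coef(fit)
fit$converged   # TRUE if the final MM iterations converged
```

Raising `k.max` may or may not remove the warning on the real data; it is only one setting that could be tried.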
I switched to rlm. Unfortunately, it gave an error as well:
Error in rlm.default(x, y, weights, method = method, wt.method = wt.method, :
'x' is singular: singular fits are not implemented in 'rlm'
I followed the suggestion on this forum to leave some categorical variables as characters in the data frame rather than converting them to factors. I re-ran rlm and it completed in less than a minute.
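One way to check whether the character/factor choice changes the model is to compare the design matrices directly. The sketch below uses only base R and a small made-up data frame (all names are hypothetical). It also shows one possible cause of the singularity, a factor carrying an unused level, which is an assumption on my part and not something I have confirmed in my data.

```r
# Small made-up data set with one categorical predictor.
set.seed(1)
d_chr <- data.frame(y = rnorm(12), x = rnorm(12),
                    g = rep(c("a", "b", "c"), 4),
                    stringsAsFactors = FALSE)
d_fac <- transform(d_chr, g = factor(g))

# model.matrix() converts character columns to factors internally,
# so both versions should give the same dummy coding.
X_chr <- model.matrix(y ~ x + g, data = d_chr)
X_fac <- model.matrix(y ~ x + g, data = d_fac)
same_design <- isTRUE(all.equal(X_chr, X_fac, check.attributes = FALSE))

# A factor with a level that never occurs yields an all-zero column,
# which makes the design matrix rank deficient.
d_unused <- transform(d_chr, g = factor(g, levels = c("a", "b", "c", "d")))
X_unused <- model.matrix(y ~ x + g, data = d_unused)

same_design
c(rank = qr(X_fac)$rank, ncol = ncol(X_fac))
c(rank = qr(X_unused)$rank, ncol = ncol(X_unused))
```

If the real factors have unused levels (for example after subsetting), `droplevels()` on the data frame would be one thing to try before fitting.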
I am not sure whether the rlm results I obtained can be trusted. First, rlm took a fraction of the time that lmrob took. Second, the categorical variables were left as characters instead of factors.
Are lmrob and rlm really this drastically different, given that both serve the same purpose of robust regression?
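To make the comparison concrete, here is a rough sketch that fits both functions to the same simulated data. As far as I understand, rlm defaults to a Huber M-estimator while lmrob defaults to an MM-estimator, and rlm also accepts `method = "MM"`. The data and variable names below are invented for illustration and do not come from my real problem.

```r
library(MASS)
library(robustbase)

# Simulated data with some vertical outliers (hypothetical example).
set.seed(7)
n <- 1000
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)
d$y[1:40] <- d$y[1:40] + 25

fit_rlm_m  <- rlm(y ~ x, data = d)                  # default: Huber M-estimator
fit_rlm_mm <- rlm(y ~ x, data = d, method = "MM")   # MM-estimator in MASS
fit_lmrob  <- lmrob(y ~ x, data = d)                # default: MM-estimator

# Side-by-side coefficients from the three fits.
coefs <- cbind(rlm_M    = coef(fit_rlm_m),
               rlm_MM   = coef(fit_rlm_mm),
               lmrob_MM = coef(fit_lmrob))
round(coefs, 3)
```

On data like this the slopes should be close to each other; larger differences would be expected mainly when there are high-leverage outliers in the predictors, which this sketch does not simulate.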