
I am looking to fit a robust regression in R on a large data set (n ≈ 500,000). The two options that come up most often are lmrob (package robustbase) and rlm (package MASS). When I used lmrob, it gave me the following error:

Error in lmrob.S(x, y, control = control) : 
  Fast S large n strategy failed. Use control parameter 'fast.s.large.n = Inf'.

I set the control parameter fast.s.large.n to Inf, but after running for 4 to 5 hours it ended with the following warning:
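For reference, the setting that the error message asks for is passed to lmrob() through lmrob.control(). Below is a minimal sketch on simulated data; the data frame dat and its columns x1, x2, y are hypothetical stand-ins for the real data, and n is kept small so it runs quickly.

```r
library(robustbase)

set.seed(1)
# Hypothetical stand-in for the real data (much smaller n)
n <- 2000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

# fast.s.large.n = Inf means the "large n" strategy (which works on groups
# of the data) is never used, so the standard fast S algorithm runs on all n rows
ctrl <- lmrob.control(fast.s.large.n = Inf)
fit_lmrob <- lmrob(y ~ x1 + x2, data = dat, control = ctrl)
coef(fit_lmrob)
```

On 500,000 rows this is the configuration that ran for several hours, so the sketch shows only the call, not a way to make it faster.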

Warning messages:
1: In lmrob.S(x, y, control = control) :
  S refinements did not converge (to refine.tol=1e-07) in 200 (= k.max) steps
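The warning names two lmrob.control() arguments: k.max (the maximal number of refinement steps for the best candidates in the fast S algorithm, default 200) and refine.tol (their convergence tolerance, default 1e-7). A minimal sketch, on hypothetical simulated data, of raising k.max and then checking the convergence flag of the final M-step; whether a larger k.max helps on the real data is untested, and it can only add run time.

```r
library(robustbase)

set.seed(2)
# Hypothetical simulated data standing in for the real data set
n <- 2000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

# Allow up to 1000 refinement steps instead of the default 200,
# keeping the default refinement tolerance
ctrl <- lmrob.control(fast.s.large.n = Inf, k.max = 1000, refine.tol = 1e-7)
fit <- lmrob(y ~ x1 + x2, data = dat, control = ctrl)

# TRUE if the IRWLS iterations of the final M-step converged
fit$converged
```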

I switched to rlm. Unfortunately, it gave an error as well:

Error in rlm.default(x, y, weights, method = method, wt.method = wt.method,  : 
  'x' is singular: singular fits are not implemented in 'rlm'
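The rlm error means the model matrix does not have full column rank. A minimal sketch of how one might find the offending columns, using hypothetical data in which x3 is an exact linear combination of x1 and x2: qr() reports the rank, and lm(), which tolerates singular fits, marks aliased coefficients as NA.

```r
set.seed(3)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Hypothetical redundant predictor: exactly x1 + x2
dat$x3 <- dat$x1 + dat$x2
dat$y <- 1 + dat$x1 - dat$x2 + rnorm(n)

X <- model.matrix(y ~ x1 + x2 + x3, data = dat)
qr_X <- qr(X)
c(rank = qr_X$rank, columns = ncol(X))  # rank < columns means X is singular

# lm() drops aliased terms and reports their coefficients as NA
fit_ls <- lm(y ~ x1 + x2 + x3, data = dat)
aliased <- names(coef(fit_ls))[is.na(coef(fit_ls))]
aliased

# rlm() refuses the same design matrix
res <- tryCatch(MASS::rlm(y ~ x1 + x2 + x3, data = dat), error = conditionMessage)
res
```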

I followed a suggestion on this forum to leave some categorical variables as characters in the data frame rather than converting them to factors. I re-ran rlm and it completed in less than a minute.
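One way to check whether leaving the variables as characters changed the model is to compare the design matrices that model.matrix() builds from each version; model.matrix() turns a character column into a factor of its observed values. The sketch below uses hypothetical toy data. It also illustrates one possible (unconfirmed for the data in this question) way a factor can make the matrix singular when model.matrix() is called directly: an unused level, for example one left over after subsetting, gets its own all-zero column.

```r
set.seed(4)
n <- 300
g_chr <- sample(c("a", "b", "c"), n, replace = TRUE)
x <- rnorm(n)

# Same variable stored as character and as factor; the factor carries a
# hypothetical unused level "d"
d_chr <- data.frame(x = x, g = g_chr, stringsAsFactors = FALSE)
d_fac <- data.frame(x = x, g = factor(g_chr, levels = c("a", "b", "c", "d")))

X_chr <- model.matrix(~ x + g, data = d_chr)
X_fac <- model.matrix(~ x + g, data = d_fac)

colnames(X_chr)  # character column becomes a factor of the observed values
colnames(X_fac)  # the unused level "d" still gets an (all-zero) column
c(chr = qr(X_chr)$rank, fac = qr(X_fac)$rank, fac_cols = ncol(X_fac))

# After dropping unused levels the two design matrices coincide
X_drop <- model.matrix(~ x + g, data = droplevels(d_fac))
all.equal(X_chr, X_drop, check.attributes = FALSE)
```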

I am not sure whether the rlm results can be trusted, for two reasons. First, rlm took a fraction of the time that lmrob took. Second, the categorical variables were left as characters instead of factors.

Are lmrob and rlm really this different, given that both are meant for robust regression?
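By default the two functions do not fit the same estimator: rlm() fits a Huber M-estimator started from least squares, while lmrob() fits an MM-estimator whose initial S-estimator is found by random subsampling. rlm() can also fit an MM-estimator via method = "MM". A minimal sketch comparing the fits on hypothetical simulated data with 10% bad leverage points; it only shows whether the estimators agree on such data, not which one suits the real data.

```r
library(MASS)        # rlm()
library(robustbase)  # lmrob()

set.seed(5)
n <- 1000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
# Hypothetical contamination: 100 bad leverage points
bad <- 1:100
x[bad] <- rnorm(100, mean = 5)
y[bad] <- rnorm(100, mean = -10)
dat <- data.frame(x = x, y = y)

fits <- list(
  ls       = lm(y ~ x, data = dat),
  rlm_M    = rlm(y ~ x, data = dat),                # default: Huber M, LS start
  rlm_MM   = rlm(y ~ x, data = dat, method = "MM"), # MM-estimator
  lmrob_MM = lmrob(y ~ x, data = dat)               # MM-estimator (default)
)
coefs <- t(sapply(fits, coef))
coefs
```

Comparing the rlm_M row with the two MM rows shows how much the choice of estimator, rather than the choice of function, matters on contaminated data.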

SanMelkote
