I have a large, sparse dgCMatrix in R:
- ~200,000 rows
- ~150,000 columns
- ~1,000,000,000 non-zero entries
R code to generate the matrix:
library(Matrix)

nrows <- 2e5
ncols <- 1.5e5
nnz <- 1e9
set.seed(42)
i <- sample(1:nrows, nnz, replace = TRUE)
j <- sample(1:ncols, nnz, replace = TRUE)
x <- sparseMatrix(i = i, j = j, x = 1, dims = c(nrows, ncols))
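(Note: because i and j are drawn with replacement, sparseMatrix() sums duplicated (i, j) pairs by default, so the stored values are small counts and the number of stored non-zeros ends up a bit under 1e9. A quick sanity check, assuming the object builds in memory:

nnzero(x)  # stored non-zero entries, slightly below 1e9 due to collisions
class(x)   # "dgCMatrix"
)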
I also have a response vector y of length 200,000, generated from a sparse coefficient vector cf:
cf <- rnorm(ncols)
cf[sample(1:ncols, ncols/2)] <- 0
y <- as.numeric(x %*% cf) + rnorm(nrows) * 100
I'd like to find a set of weights w that minimizes the MAE (mean absolute error): mean(abs(y - x %*% w))
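In other words, as an R function (a minimal sketch; mae is just a helper name I'm using here, and w would be a length-150,000 coefficient vector):

# MAE of a candidate weight vector w (no intercept, matching the objective above)
mae <- function(w) mean(abs(y - as.numeric(x %*% w)))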
I can find weights that minimize RMSE (root mean squared error) using glmnet:
library(glmnet)
model <- cv.glmnet(x, y, family = "gaussian", nfolds = 5)
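For reference, I can score that fit under the MAE objective like this (a sketch; lambda.min is just one reasonable choice of penalty):

# Predictions at the CV-selected penalty, and the MAE they achieve
pred <- as.numeric(predict(model, newx = x, s = "lambda.min"))
mean(abs(y - pred))  # baseline MAE from the squared-error fit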
So far, though, I haven't found a comparable package for minimizing MAE/LAD/L1 error. The closest thing I've come across is the flare package, but it doesn't support sparse input.
Does anyone know of an R package (or have good tricks) for solving a large-scale, sparse MAE/LAD/L1 regression problem?