
I need to do a high-dimensional biological data analysis. My data has hundreds of thousands of dimensions. I am looking for an implementation of multinomial logistic regression that will scale well to data of this size.

Ideally, it should also allow me to do ridge and lasso regression. Which software should I be using?

Andy
    @Andy, can you clarify your question a little bit? Do you have hundreds of thousands of predictors for each response? Or hundreds of thousands of responses? Or perhaps, a data matrix with hundreds of thousands of entries? Or maybe you meant "hundreds ***or*** thousands"? The answer to this will provide some guidance. Also, how many class categories do you have? What sort of computing resources are available? Are your features sparse or dense? – cardinal Mar 01 '11 at 01:52
  • @Andy, for ***binary*** logistic regression, Paul Komarek built a package as his dissertation research at Carnegie Mellon. It has some optimizations for sparse features. I don't believe it does multinomial logistic regression, but I could be mistaken. He actually uses a ridge version of logistic regression, but with a fixed ridge parameter; his claim is that this is good enough for most (if not all) problems. The fitting method uses conjugate gradients, if I recall correctly. I'm not sure it's amenable to parallelization, though. – cardinal Mar 01 '11 at 01:55
  • @cardinal: Thanks for your answer. Good questions! I have 12 classes, and the number of predictors can go from 80k to 200k depending on the source of the data. And yes, this is a highly sparse data set. Regarding computing resources, I think I might be able to get access to a cluster (~10 nodes or so), but if push comes to shove, I can try to get more computing resources. – Andy Mar 01 '11 at 02:39
  • @Andy, how much information is in those features? Can't be much. Are the features continuous-valued or binary? There might be a quick-and-dirty approach where, e.g., you train 12 binary logistic regressions separately and then use their predictions in a final step to get a proper distribution for the twelve-class case (see the sketch after these comments). For example, train a 12-feature multinomial logistic regression once you're done training each of the individual ones. Obviously, that's suboptimal, but depending on the application, it might be both computationally feasible and good enough. – cardinal Mar 01 '11 at 02:56
  • @Andy, I just checked and **[Komarek](http://komarix.org/ac/lr)** does provide some very simple suggestions for multi-class extensions. They're simpler than what I was suggesting. One reason mine might be better is that if you fit 12 separate binary logistic regressions, the associated parameters aren't necessarily on similar scales. By having a final step that is a full multinomial logistic regression, the parameters in the final fit can compensate for those scale differences that a simple renormalization wouldn't. Besides, simple renorming wouldn't change your predicted class anyway. – cardinal Mar 01 '11 at 03:04
  • @cardinal: Actually, I do not have a sense of how much information is likely to be there in the features, since I have yet to get access to the data :-) The features are all nominal, with values between 0 and 1. But yes, I think I will do some feature selection. I looked at Komarek's software, and it looks promising. Two things are not clear to me, though: (1) How do I get the regression weights? It's not clear from his webpage or the software documentation. (2) It seems like he is using ridge regression; I need to use lasso as well. – Andy Mar 01 '11 at 04:38
  • @Andy, how many observations do you have? For a lasso version of logistic regression, there is **glmnet**, but I've found it flaky even on small problems and I doubt it will scale to the size you are interested in. Still, it's probably about the only package that even has a chance; the others tend to use iterative versions of the LARS algorithm. I think your first step should be to try some methods of feature filtering to get down to a more manageable size and see how badly that does (a sketch appears after these comments). Are you at liberty to discuss more details of the application? – cardinal Mar 01 '11 at 12:45
  • @cardinal: The number of observations can range from 200 to 20k depending on the type of data. – Andy Mar 01 '11 at 14:34
  • @Andy, so even in the best case you have four times as many predictors as observations? And, in the worst case, 1000 times as many?? In linear regression, the lasso will select at most min(number of observations, number of predictors) nonzero coefficients, and I'd expect this to carry over to the logistic regression case as well. What is the ultimate goal of this study? Are you analyzing SNPs or something? – cardinal Mar 01 '11 at 14:47
  • @cardinal: I have data from different experiments, and I need to study the relative importance of the values/weights of the predictors obtained from the various experiments; different predictors are used across different experiments. One example: if I have data about a gene's function (the response variable) from experiments EX1 and EX2 (both experiments are likely to predict identical gene function), can I say anything about the relative importance of the predictors? I am sorry if this seems fuzzy. I don't have much background in statistics, and am trying to learn things as I go along. – Andy Mar 01 '11 at 14:55
  • @cardinal: OK, so I got my first set of data, and with feature reduction (by a factor of 3) I was able to obtain pretty much the same cross-validation values as without it. It leads me to (carefully) conjecture that I might be able to do significant feature reduction after all... – Andy Mar 01 '11 at 23:06
  • Would be interested in hearing how you found the Bayesian regression software BMR/BBR, Andy. – Will Beauchamp Sep 17 '13 at 19:49
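
For reference, here is a minimal sketch of the two-stage idea from the comments above, written against R's glmnet (the package mentioned in this thread); the objects `x` (a sparse feature matrix) and `y` (a factor with 12 class levels) are placeholders, and the whole thing is an illustration rather than a tuned pipeline:

```r
library(glmnet)  # penalized GLMs; accepts sparse 'Matrix' input
library(Matrix)

# x: sparse dgCMatrix of features (n x p); y: factor with 12 levels
classes <- levels(y)

# Stage 1: one penalized binary logistic regression per class
# (one-vs-rest). alpha = 1 is the lasso penalty; alpha = 0 is ridge.
stage1 <- lapply(classes, function(k) {
  cv.glmnet(x, as.numeric(y == k), family = "binomial", alpha = 1)
})

# Collect the 12 per-class predicted probabilities as new features.
z <- sapply(stage1, function(fit) {
  as.numeric(predict(fit, newx = x, s = "lambda.min", type = "response"))
})

# Stage 2: a small multinomial logistic regression on the 12 predictions,
# which puts the per-class scores on a common scale.
stage2 <- cv.glmnet(z, y, family = "multinomial", alpha = 0)
```

In practice the stage-2 inputs should come from held-out (cross-fitted) predictions rather than in-sample ones, or the second stage will be fit to overconfident probabilities.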
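
And a sketch of the crude feature filtering discussed above, again assuming a sparse `x`; the keep-a-third rule just mirrors the factor-of-3 reduction Andy reports, and the variance criterion is only one of many possible filters:

```r
library(Matrix)

# Cheap unsupervised filter: per-column variance, computed without
# densifying the sparse matrix (E[x^2] - E[x]^2).
m1 <- colMeans(x)
m2 <- colMeans(x^2)
col_var <- m2 - m1^2

# Keep the top third of features by variance.
keep <- order(col_var, decreasing = TRUE)[seq_len(ceiling(ncol(x) / 3))]
x_filtered <- x[, keep]
```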

1 Answer


I've had good experiences with Madigan and Lewis's BMR and BBR packages for multiple-category dependent variables, lasso or ridge priors on parameters, and high-dimensional input data. Not quite as high as yours, but it might still be worth a look. Instructions are here: http://bayesianregression.com/bmr.html
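
If glmnet (mentioned in the comments above) turns out to be workable at this scale, the multinomial lasso/ridge fit itself is short; a minimal sketch, with `x` and `y` as in the comments and no claim that the defaults are right for your data:

```r
library(glmnet)

# Multinomial logistic regression with a lasso penalty (alpha = 1);
# set alpha = 0 for ridge, or a value in between for the elastic net.
fit <- cv.glmnet(x, y, family = "multinomial", alpha = 1)

# Per-class coefficient vectors at the cross-validated lambda --
# these are the regression weights Andy asked about.
weights <- coef(fit, s = "lambda.min")

# Predicted class probabilities for (new) data.
p <- predict(fit, newx = x, s = "lambda.min", type = "response")
```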

conjugateprior