
I need to run a regression on a large amount of data, where each row has around 1000 features. Will the outcome be the same or better if I run 4 separate regressions on 250 features each, and then run one final regression whose 4 features are the outputs of those underlying regressions?

I can't run one regression on all features because the coefficient-learning algorithm uses too much memory.

General known solution:

$ Y' = \operatorname{sigmoid}(X\beta) $

My take:

$ X' = \begin{bmatrix} Y_1' = \operatorname{sigmoid}(X_{1\text{-}250}\,\beta_1) \\ Y_2' = \operatorname{sigmoid}(X_{251\text{-}500}\,\beta_2) \\ Y_3' = \operatorname{sigmoid}(X_{501\text{-}750}\,\beta_3) \\ Y_4' = \operatorname{sigmoid}(X_{751\text{-}1000}\,\beta_4) \end{bmatrix}, \qquad Y'' = \operatorname{sigmoid}(X'\beta) $

I'm asking about differences/relationships between $Y'$ and $Y''$.

There are around 100,000,000 rows (observations) and 1000 features ($Length[X]$).

Sorry for the bad formatting in the equations; I'm not a MathJax master.
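
For concreteness, here is a minimal sketch of the block-then-stack scheme above. It assumes scikit-learn's `LogisticRegression`, four contiguous blocks of 250 columns, and placeholder in-memory data purely for illustration (in practice each block would be loaded and fit separately):

```python
# Stage 1: fit one logistic regression per block of 250 features.
# Stage 2: fit a meta logistic regression on the four block-level probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 1000))   # placeholder for the real data
y = rng.integers(0, 2, size=10_000)   # placeholder binary target

blocks = [slice(i, i + 250) for i in range(0, 1000, 250)]

# Stage 1: one regression per block
block_models = [LogisticRegression(max_iter=1000).fit(X[:, b], y) for b in blocks]

# X': predicted probabilities from each block model become the new features
X_prime = np.column_stack(
    [m.predict_proba(X[:, b])[:, 1] for m, b in zip(block_models, blocks)]
)

# Stage 2: meta-regression on the four block outputs gives Y''
meta = LogisticRegression().fit(X_prime, y)
Y_double_prime = meta.predict_proba(X_prime)[:, 1]
```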

Sycorax
Svisstack
  • How many observations (rows) do you have? How many "cases" (Y = 1) do you have? – D L Dahly Dec 22 '13 at 16:37
  • @D L Dahly: around 100 * 10^6 rows. Cases with Y = 1: around 50%, in some data it can be 50.5%; the classes are not skewed. – Svisstack Dec 22 '13 at 16:41
  • Clearly you can't list all 1000 features, but can you give us some idea of the context and why you have so many? – Peter Flom Dec 22 '13 at 18:09
  • How many observations are there? If there are fewer observations than cases, then it is impossible to find a solution using each case. It may be best to attempt variable selection rather than trying to use each piece of information. – Drew75 Dec 22 '13 at 21:54
  • 1
    [Potentially relevant](http://stats.stackexchange.com/questions/23481/are-there-algorithms-for-computing-running-linear-or-logistic-regression-param) – Glen_b Dec 22 '13 at 22:53
  • @Peter Flom: It's just a big problem to solve; I don't want to discuss the problem or alternative solutions because it's complicated and out of the scope of this question. – Svisstack Dec 23 '13 at 00:10
  • @Drew75: Of course. I am doing variable selection via a genetic algorithm, but that does not solve this problem. The regression on that number of features must be done regardless, whether with higher- or lower-quality features. – Svisstack Dec 23 '13 at 00:12
  • 2
    It is _very_ unlikely that such an approach will be as good as fitting all 1000 features using penalized maximum likelihood estimation with a quadratic ($L_2$) penalty. – Frank Harrell Dec 23 '13 at 13:28
  • How much memory is too much memory? I'm not that familiar with big-data computation but it seems to me that problems like this can be solved by just paying someone (like Amazon) to run it for you. – shadowtalker Aug 03 '14 at 19:44

1 Answer

Glen_b pointed to another thread, suggesting that the online methods discussed there help with a very large number of observations. I am adding two references specifically on online logistic regression:

Suhrid Balakrishnan and David Madigan, "Algorithms for Sparse Linear Classifiers in the Massive Data Setting", JMLR 9(Feb):313–337, 2008.

J. Shi, W. Yin, and S. Osher, "A New Regularization Path for Logistic Regression via Linearized Bregman", Rice CAAM technical report TR12-24, 2012.
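
For illustration only, here is a minimal sketch of out-of-core (online) logistic regression using scikit-learn's `SGDClassifier` with `partial_fit`; this is not the algorithm from either paper above, and the chunked data source is a placeholder assumption:

```python
# Stream the data in chunks and update the coefficients with partial_fit,
# so only one chunk ever needs to be in memory at a time.
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="log_loss", penalty="l2", alpha=1e-4)  # "log" in older scikit-learn
classes = np.array([0, 1])

def iter_chunks(n_chunks=100, rows=10_000, n_features=1000):
    """Placeholder generator standing in for reading chunks from disk."""
    rng = np.random.default_rng(0)
    for _ in range(n_chunks):
        X = rng.normal(size=(rows, n_features))
        y = rng.integers(0, 2, size=rows)
        yield X, y

for X_chunk, y_chunk in iter_chunks():
    clf.partial_fit(X_chunk, y_chunk, classes=classes)
```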

AlexGenkin
  • I don't consider that a good solution, because things get complicated when I need to add, for example, some regularization terms. – Svisstack Dec 28 '13 at 14:44
  • How about "An Interior-Point Method for Large-Scale L1-Regularized Logistic Regression": http://web.stanford.edu/~boyd/papers/pdf/l1_logistic_reg.pdf. The abstract reports runtimes of "a few minutes" on a normal computer with a 1,000,000 x 1,000,000 model matrix. – shadowtalker Aug 03 '14 at 19:46