
I need to run a regression on a large amount of data, where each row has around 1000 features. Will the outcome be the same or better if I run 4 separate regressions on 250 features each, and then run one final regression whose 4 features are the outputs of those underlying regressions?

I can't run one regression on all features because the coefficient-learning algorithm uses too much memory.

General known solution:

$ Y' = \operatorname{sigmoid}(X\beta) $

My take:

$ X' = \begin{bmatrix} Y_1' = \operatorname{sigmoid}(X_{1\text{-}250}\,\beta_1) \\ Y_2' = \operatorname{sigmoid}(X_{251\text{-}500}\,\beta_2) \\ Y_3' = \operatorname{sigmoid}(X_{501\text{-}750}\,\beta_3) \\ Y_4' = \operatorname{sigmoid}(X_{751\text{-}1000}\,\beta_4) \end{bmatrix}, \qquad Y'' = \operatorname{sigmoid}(X'\beta) $

I'm asking about differences/relationships between $Y'$ and $Y''$.

There are around 100,000,000 rows (observations) and 1000 features ($Length[X]$).

Sorry for the bad formatting in the equations; I'm not a MathJax master.
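
For concreteness, here is a minimal sketch of the block-then-stack scheme above. It assumes scikit-learn's `LogisticRegression`, four contiguous blocks of 250 columns, and placeholder in-memory data purely for illustration (in practice each block would be loaded and fit separately):

```python
# Stage 1: fit one logistic regression per block of 250 features.
# Stage 2: fit a meta logistic regression on the four block-level probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 1000))   # placeholder for the real data
y = rng.integers(0, 2, size=10_000)   # placeholder binary target

blocks = [slice(i, i + 250) for i in range(0, 1000, 250)]

# Stage 1: one regression per block
block_models = [LogisticRegression(max_iter=1000).fit(X[:, b], y) for b in blocks]

# X': predicted probabilities from each block model become the new features
X_prime = np.column_stack(
    [m.predict_proba(X[:, b])[:, 1] for m, b in zip(block_models, blocks)]
)

# Stage 2: meta-regression on the four block outputs gives Y''
meta = LogisticRegression().fit(X_prime, y)
Y_double_prime = meta.predict_proba(X_prime)[:, 1]
```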

Sycorax
Svisstack
  • How many observations (rows) do you have? How many "cases" (Y = 1) do you have? – D L Dahly Dec 22 '13 at 16:37
  • @D L Dahly: around 100 * 10^6 rows. Cases with Y = 1: around 50%, in some data it can be 50.5%; the classes are not skewed. – Svisstack Dec 22 '13 at 16:41
  • Clearly you can't list all 1000 features, but can you give us some idea of the context and why you have so many? – Peter Flom Dec 22 '13 at 18:09
  • How many observations are there? If there are fewer observations than cases, then it is impossible to find a solution using each case. It may be best to attempt variable selection rather than trying to use each piece of information. – Drew75 Dec 22 '13 at 21:54
  • 1
    [Potentially relevant](http://stats.stackexchange.com/questions/23481/are-there-algorithms-for-computing-running-linear-or-logistic-regression-param) – Glen_b Dec 22 '13 at 22:53
  • @Peter Flom: It's just a big problem to solve; I don't want to discuss the problem or alternative solutions because it's complicated and out of the scope of this question. – Svisstack Dec 23 '13 at 00:10
  • @Drew75: Of course. I am doing variable selection via a genetic algorithm, but that does not solve this problem. The regression on that number of features must be done regardless, whether with higher- or lower-quality features. – Svisstack Dec 23 '13 at 00:12
  • 2
    It is _very_ unlikely that such an approach will be as good as fitting all 1000 features using penalized maximum likelihood estimation with a quadratic ($L_2$) penalty. – Frank Harrell Dec 23 '13 at 13:28
  • How much memory is too much memory? I'm not that familiar with big-data computation but it seems to me that problems like this can be solved by just paying someone (like Amazon) to run it for you. – shadowtalker Aug 03 '14 at 19:44

1 Answer

Glen_b pointed to another thread, suggesting that the online methods discussed there help with a very large number of observations. I am adding two references specifically on online logistic regression:

Suhrid Balakrishnan and David Madigan, "Algorithms for Sparse Linear Classifiers in the Massive Data Setting", JMLR 9(Feb):313–337, 2008.

J. Shi, W. Yin, and S. Osher, "A New Regularization Path for Logistic Regression via Linearized Bregman", Rice CAAM technical report TR12-24, 2012.
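
For illustration only, here is a minimal sketch of out-of-core (online) logistic regression using scikit-learn's `SGDClassifier` with `partial_fit`; this is not the algorithm from either paper above, and the chunked data source is a placeholder assumption:

```python
# Stream the data in chunks and update the coefficients with partial_fit,
# so only one chunk ever needs to be in memory at a time.
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="log_loss", penalty="l2", alpha=1e-4)  # "log" in older scikit-learn
classes = np.array([0, 1])

def iter_chunks(n_chunks=100, rows=10_000, n_features=1000):
    """Placeholder generator standing in for reading chunks from disk."""
    rng = np.random.default_rng(0)
    for _ in range(n_chunks):
        X = rng.normal(size=(rows, n_features))
        y = rng.integers(0, 2, size=rows)
        yield X, y

for X_chunk, y_chunk in iter_chunks():
    clf.partial_fit(X_chunk, y_chunk, classes=classes)
```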

AlexGenkin
  • I don't consider that a good solution, because things get complicated when I need to add, for example, some regularization terms. – Svisstack Dec 28 '13 at 14:44
  • How about "An Interior-Point Method for Large-Scale L1-Regularized Logistic Regression": http://web.stanford.edu/~boyd/papers/pdf/l1_logistic_reg.pdf. The abstract reports runtimes of "a few minutes" on a normal computer with a 1,000,000 x 1,000,000 model matrix. – shadowtalker Aug 03 '14 at 19:46