I need to fit a regression on a large amount of data; each row has around 1000 features. Will the outcome be the same or better if I fit 4 separate regressions on 250 features each, and then fit one more regression whose 4 features are the outputs of those underlying regressions?
I can't fit one regression on all the features because the coefficient-learning algorithm uses too much memory.
General known solution:
$ Y' = \operatorname{sigmoid}(X\beta) $
My take:
$$ X' = \begin{bmatrix} Y_1' = \operatorname{sigmoid}(X_{1-250}\beta_1) \\ Y_2' = \operatorname{sigmoid}(X_{251-500}\beta_2) \\ Y_3' = \operatorname{sigmoid}(X_{501-750}\beta_3) \\ Y_4' = \operatorname{sigmoid}(X_{751-1000}\beta_4) \end{bmatrix} $$
$$ Y'' = \operatorname{sigmoid}(X'\beta) $$
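To make the setup concrete, here is a minimal sketch of the two approaches on toy data. It is only an illustration under assumptions that are not part of my real problem: the data is synthetic, the sizes are tiny stand-ins (2000 rows, 8 features, 4 blocks of 2), and the helper `fit_logistic` is a hypothetical plain gradient-descent fitter with no intercept, matching the formulas above rather than my actual learning algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Hypothetical helper: batch gradient descent on the mean log-loss.
    No intercept, to match Y' = sigmoid(X * beta)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / len(y)
        beta -= lr * grad
    return beta

def log_loss(y, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy stand-ins for 100 000 000 rows, 1000 features, 4 blocks of 250 (assumed sizes).
rng = np.random.default_rng(0)
n_rows, n_features, n_blocks = 2000, 8, 4
X = rng.normal(size=(n_rows, n_features))
true_beta = rng.normal(size=n_features)
y = (rng.random(n_rows) < sigmoid(X @ true_beta)).astype(float)

# Y': one regression on all features at once.
beta_full = fit_logistic(X, y)
y_prime = sigmoid(X @ beta_full)

# X': one column per block, each the output of a regression on that block only.
blocks = np.array_split(np.arange(n_features), n_blocks)
X_prime = np.column_stack(
    [sigmoid(X[:, idx] @ fit_logistic(X[:, idx], y)) for idx in blocks]
)

# Y'': a regression whose 4 features are the block outputs.
beta_top = fit_logistic(X_prime, y)
y_double_prime = sigmoid(X_prime @ beta_top)

print("log-loss of Y' :", log_loss(y, y_prime))
print("log-loss of Y'':", log_loss(y, y_double_prime))
```

Comparing the two printed log-losses on a toy case like this shows the kind of difference I mean, but it says nothing general, which is why I'm asking about the relationship in principle.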
I'm asking about differences/relationships between $Y'$ and $Y''$.
There are around 100 000 000 rows (observations) and 1000 features ($Length[X]$).
Sorry for bad formatting in equations, I'm not a MathJax master.