I've been trying to build a multivariate regression model that predicts a value from my training data. I've put my data into a matrix X of size m x n, where m is the number of instances and n is the number of features/predictors. My label vector is then m x 1. This is my code to estimate the theta values (the parameters):
theta_matrix = pinv(X'*X)*X'*y_label;
Now I want to split the data into training and test sets, and from my research I've found that 10-fold cross-validation can be a good option. If I do that, wouldn't I end up with 10 sets of parameters theta? Which one should I choose then?
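To make the concern concrete, this is roughly how I understand 10-fold cross-validation would look. It is only a minimal sketch on synthetic data: the sizes, the made-up "true" coefficients, and the names `theta_folds`, `mse_folds` and `cv_mse` are placeholders I introduced, and it uses only base functions rather than any toolbox.

```matlab
% Synthetic stand-in data (assumption: NOT my real data set)
m = 200; n = 3;                        % hypothetical number of instances / features
X = randn(m, n);
y_label = X * [2; -1; 0.5] + 0.1 * randn(m, 1);

k = 10;                                % number of folds
idx  = randperm(m);                    % shuffle the instances
fold = mod(0:m-1, k) + 1;              % fold id (1..k) for each shuffled position

theta_folds = zeros(n, k);             % one column of parameters per fold
mse_folds   = zeros(1, k);             % held-out error per fold

for f = 1:k
    test_rows  = idx(fold == f);       % instances held out in this fold
    train_rows = idx(fold ~= f);       % instances used for fitting

    Xtr = X(train_rows, :);
    ytr = y_label(train_rows);

    % Same estimator as above, fitted on the training folds only
    theta = pinv(Xtr' * Xtr) * Xtr' * ytr;
    theta_folds(:, f) = theta;

    % Mean squared error on the held-out fold
    err = X(test_rows, :) * theta - y_label(test_rows);
    mse_folds(f) = mean(err .^ 2);
end

cv_mse = mean(mse_folds);              % average held-out error over the 10 folds
```

Each column of `theta_folds` is a different parameter vector, which is exactly where my question comes from.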
As for feature selection, I've found that stepwise selection can be a good choice, but I don't think it takes into account that features can be correlated. Are there any alternatives?