I have 10 response variables, and I used 10 weighted elastic net models to find which of the 31 predictors in my system best explain the responses.
I obtain an R-squared for each model, and most of the models have a high R-squared.
Next, I put a threshold on my weights and kept only the predictors that are most likely to explain the responses.
I ran the analysis again, and now I get R-squared values above one. I don't understand why it should give me an R-squared larger than one.
Here is the formula for how I calculate the R-squared:
var(x * beta) / var(response)
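My understanding is that for an ordinary least-squares fit with an intercept this ratio cannot exceed one, because the residuals are then uncorrelated with the fitted values and the variances decompose as

$$\operatorname{Var}(y) = \operatorname{Var}(X\hat\beta) + \operatorname{Var}(y - X\hat\beta),$$

but I am not sure how this carries over to my setup.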
I run the following over the 10 responses.
With all predictors:
res <- cv.glmnet(x = regressors, y = responses[i, ],
                 lambda = c(0.01, 0.05, 0.1, 0.5, 1, 1.5, 2, 10, 20, 100),
                 nfolds = 10, family = "gaussian", standardize = TRUE,
                 type.measure = "mse", intercept = FALSE,
                 penalty.factor = w, grouped = FALSE)
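From each fit I then compute the R-squared along these lines (a minimal sketch, assuming the coefficients are taken at lambda.min; the exact lambda I select does not matter for the question):

# coefficients at the selected lambda; drop the intercept slot, which
# is zero anyway because I fit with intercept = FALSE
beta <- as.vector(coef(res, s = "lambda.min"))[-1]

# fitted values, with no intercept term
fitted <- as.vector(regressors %*% beta)

# R-squared as the variance ratio from the formula above
r2 <- var(fitted) / var(responses[i, ])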
Here are my R-squared values for all responses:
[1] 0.2143036 0.8983216 0.1033970 0.4073570 0.7410773 0.9009351 0.3518317 0.8386557 0.1640106 0.4902337 0.9408415 0.7705011 0.8918895 0.0604311
[15] 0.8324915 0.3142945 0.7603050 0.5791587 0.5458866 0.4644528 0.9424381 0.2226040 0.9106043 0.5826858 0.9370337 0.2573282 0.3955305 0.5008677
[29] 0.8530356 0.9427917 0.3889714
When I reduce the number of predictors:
res <- cv.glmnet(x = regressors[, names(which(w != 1))], y = responses[i, ],
                 lambda = c(0.01, 0.05, 0.1, 0.5, 1, 1.5, 2, 10, 20, 100),
                 nfolds = 10, family = "gaussian", standardize = TRUE,
                 type.measure = "mse", intercept = FALSE,
                 penalty.factor = w[names(which(w != 1))], grouped = FALSE)
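The R-squared is computed the same way as before, only on the reduced design matrix (again a sketch; keep is a name I introduce here for the selected columns):

# the columns whose penalty weight passed the threshold
keep <- names(which(w != 1))

# coefficients of the reduced fit, intercept slot dropped as before
beta <- as.vector(coef(res, s = "lambda.min"))[-1]

# fitted values must use the same column subset as the fit
fitted <- as.vector(regressors[, keep] %*% beta)

r2 <- var(fitted) / var(responses[i, ])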
The R-squared values for all responses are now:
[1] 0.23608323 0.71910789 0.04624468 0.36666693 13.04262441 0.79911136 0.34117305 16.05521440 0.24017898 0.64007613 0.73259379 0.52822347
[13] 0.36245020 1.02954292 0.62319234 1.21837174 0.48313160 0.70221289 7.40865390 2.18222146 0.41393762 1.33439668 0.72242256 0.59092254
[25] 0.62969173 0.54824267 0.46230243 0.61607441 0.44151865 0.74692996 1.21428429
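For what it is worth, I can reproduce a variance ratio above one with made-up data once intercept = FALSE is set (a self-contained toy example, not my actual data):

library(glmnet)

set.seed(1)
n <- 100
# two predictors with clearly nonzero means, and a response that is
# almost constant around 10
x <- cbind(rnorm(n, mean = 2), rnorm(n, mean = 2))
y <- 10 + rnorm(n)

# with no intercept, x %*% beta has to reproduce the mean of y, which
# inflates var(x %*% beta) far beyond var(y)
fit <- glmnet(x, y, lambda = 0.01, intercept = FALSE)
beta <- as.vector(coef(fit))[-1]
var(as.vector(x %*% beta)) / var(y)  # comes out well above 1

Is this the same mechanism that produces my values above one?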