I am running 10-fold cross-validation with the CVlm() function from the DAAG package. This is part of the output:

Predicted    2.47e-04 2.26e-04 -0.000359  0.000335
cvpred      -7.88e-05 3.72e-06 -0.000597  0.000322    
y            1.47e-03 2.21e-03 -0.004676 -0.001969
CV residual  1.55e-03 2.21e-03 -0.004078 -0.002291

Sum of squares = 0    Mean square = 0    n = 48 

Overall (Sum over all 48 folds) 
      ms 
9.81e-06 

What is the difference between “Predicted” and “cvpred”? If I change the seed in CVlm(), cvpred changes but Predicted stays the same. Can someone tell me how Predicted is calculated?

Gala
  • "Predicted" is predicted value using all observations. – user314946 Mar 19 '21 at 14:29
  • I’m voting to close this question because: (1) some relevant information appears to be missing; (2) it can no longer be reproduced; (3) the OP has not visited the site in years; & (4) this borders on a software question. – gung - Reinstate Monica Mar 19 '21 at 16:41

1 Answer

In general, you get answers to questions like this by reading the documentation (?CVlm). I can't quite replicate your results; I suspect there is some information missing from the question, and the function may have evolved since 2013.

Here is what I get:

library(DAAG)
CVlm(data=houseprices, form.lm=formula(sale.price~area), m=3)
# Analysis of Variance Table
# 
# Response: sale.price
#           Df Sum Sq Mean Sq F value Pr(>F)  
# area       1  18566   18566       8  0.014 *
# Residuals 13  30179    2321                 
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# 
# fold 1 
# Observations in test set: 5 
#                10    14    15    21     22
# area        905.0 963.0 821.0 771.0 1006.0
# cvpred      243.6 255.2 226.9 216.9  263.8
# sale.price  215.0 185.0 212.0 260.0  293.0
# CV residual -28.6 -70.2 -14.9  43.1   29.2
# 
# Sum of squares = 8684    Mean square = 1737    n = 5 
# 
# fold 2 
# Observations in test set: 5 
#                 11   12  16     17     19
# area        802.00 1366 714 1018.0 790.00
# cvpred      216.81  388 190  282.5 213.16
# sale.price  215.00  274 220  276.0 221.50
# CV residual  -1.81 -114  30   -6.5   8.34
# 
# Sum of squares = 14083    Mean square = 2817    n = 5 
# 
# fold 3 
# Observations in test set: 5 
#                 9   13    18    20   23
# area        694.0  716 887.0 696.0 1191
# cvpred      216.3  218 234.5 216.5  263
# sale.price  192.0  113 260.0 255.0  375
# CV residual -24.3 -106  25.5  38.5  112
# 
# Sum of squares = 26421    Mean square = 5284    n = 5 
# 
# Overall (Sum over all 5 folds) 
#   ms 
# 3279 

I'm guessing you have copied part of the output from one of the folds. It may help you to read an overview of how cross validation works (e.g., on CV see: Cross-Validation in plain english?).

Triangulating from the documentation, what you pasted into the question, and what I get here, I think I can make a pretty good guess at what these are. First, a model is fit using all the data; from that model, there is a standard predicted value for each point in the dataset. Then the data are split into folds, and each fold is held out once: the model is refit to the remaining folds and used to predict the data in the held-out fold. Your Predicted is the predicted value from the original full-data model, and your cvpred is the predicted value from the model that was fit while that datum was held out. That would also explain why changing the seed changes cvpred but not Predicted: the seed only affects how observations are assigned to folds.
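To make the distinction concrete, here is a minimal base R sketch of what I am guessing the two rows correspond to. It is not the code inside CVlm(); the built-in `cars` data, the fold assignment via `sample()`, and all variable names are my own assumptions, chosen so the example runs without DAAG. The residual is computed as observed minus cvpred, which agrees with the sign of the CV residual row in the output above.

```r
# Illustrative sketch only; this is not how CVlm() is implemented internally.
# `cars` stands in for the real data (an assumption).
set.seed(1)                        # the seed only affects the fold assignment
dat <- cars
k   <- 3                           # number of folds

# "Predicted": fitted values from a single model fit to ALL observations.
full_fit  <- lm(dist ~ speed, data = dat)
predicted <- unname(fitted(full_fit))

# "cvpred": each observation is predicted by a model that never saw it.
fold   <- sample(rep(seq_len(k), length.out = nrow(dat)))
cvpred <- numeric(nrow(dat))
for (i in seq_len(k)) {
  held_out <- fold == i
  fit_i    <- lm(dist ~ speed, data = dat[!held_out, ])
  cvpred[held_out] <- predict(fit_i, newdata = dat[held_out, ])
}

# CV residual = observed - cvpred; overall ms = mean of the squared residuals.
cv_residual <- dat$dist - cvpred
overall_ms  <- mean(cv_residual^2)

head(data.frame(fold, predicted, cvpred, y = dat$dist, cv_residual))
```

Rerunning this with a different seed reshuffles `fold`, so `cvpred` changes while `predicted` does not. The overall mean square in the houseprices output above is consistent with this reading: (8684 + 14083 + 26421) / 15 ≈ 3279.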

gung - Reinstate Monica