Compute standard deviation of accuracy

Question

edit - more information about what the code given should represent

The following pseudocode outlines the problem as I have it:

for each random seed in S
    randomise the data
    for k in 1 to 5
        create test / training data
        fit the model to the training data
        generate score

Therefore I will have $S \times 5$ individual accuracy scores. My final score is the average of these, and I would like to know its standard deviation.
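The pseudocode above can be sketched in R roughly as follows. This is a minimal sketch rather than the question's own code. It simulates data the same way as the example below and assigns each row to one of $K$ folds, so each $k$ uses a different test set. It then collects the $S \times K$ fold accuracies in a matrix `scores` (a name introduced here) and reports their mean and standard deviation.

```r
# Repeated K-fold cross-validation following the pseudocode:
# for each seed, shuffle rows into K folds, fit on K - 1 folds,
# and score the held-out fold.
set.seed(2019)
n <- 100
original_data <- data.frame(
  x = c(rnorm(n, 0.457, 0.01), rnorm(n, 0.508, 0.11)),
  y = c(rep(0, n), rep(1, n))
)

S <- 3  # number of random seeds
K <- 5  # number of folds

scores <- matrix(NA_real_, nrow = S, ncol = K)
for (s in 1:S) {
  set.seed(s)
  # assign every row to one of K folds of equal size, in random order
  fold <- sample(rep(1:K, length.out = nrow(original_data)))
  for (k in 1:K) {
    trainset <- original_data[fold != k, ]
    testset  <- original_data[fold == k, ]
    m <- glm(y ~ x, data = trainset, family = "binomial")
    p <- predict(m, newdata = testset, type = "response")
    # accuracy of this fold: share of correct 0/1 predictions
    scores[s, k] <- mean((p >= 0.5) == testset$y)
  }
}

mean(scores)  # the averaged accuracy (the end score)
sd(scores)    # spread of the S * K individual fold scores
```

`sd(scores)` describes how much individual fold scores vary, not the uncertainty of their average. Dividing it by $\sqrt{S K}$ would likely understate that uncertainty, because the fold scores reuse the same 200 rows and are therefore correlated.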

original post

The following code represents my problem:

```r
# S is the total number of random seeds to use
S = 3
# the size of each category, so the original data will have 2n rows
n = 100
# number of "folds" to use
K = 5
# sample data
set.seed(2019)
original_data = data.frame(
  x = c(rnorm(n, 0.457, 0.01), rnorm(n, 0.508, 0.11)),
  y = c(rep(0, n), rep(1, n))
)
# will be a data frame to store the results
results = NULL
iteration = 1
for(s in 1:S){
  set.seed(s)
  rnd = sample(1:(2*n))
  # get randomised data
  td = original_data[rnd,]
  for(k in 1:K){
    # get test and training data
    # (this 140/60 split does not depend on k, so all K iterations
    # for a given seed fit and score the same split)
    trainset = td[1:140,]
    testset  = td[-(1:140),]
    # fit model and get scores
    m = glm(y ~ x, data = trainset, family = "binomial")
    # get probabilities and predicted values
    model_probabilities = predict(m, newdata = testset,
                                  type = "response")
    model_predictions   = 1 * (model_probabilities >= 0.5)
    # store results
    results = rbind(results, data.frame(
      seed = s, k = k, iteration = iteration,
      probability  = model_probabilities,
      prediction   = model_predictions,
      observed     = testset$y
    ))
    iteration = iteration + 1
  }
}

# table of predicted and observed
t = table(results$prediction, results$observed)
# convert into percentages
t = 100 * round(prop.table(t), 3)
# compute the accuracy (sum of the diagonal, i.e. correct predictions)
accuracy = t[1,1] + t[2,2]
accuracy
```

This produces the following output:

```
> accuracy
[1] 51.1
> dim(results)
[1] 900   6
```

I want to know how to calculate the standard deviation for this accuracy measure.

edit - choice of $n$

I'm still interested in an answer to this question; I'm not sure whether additional information is required.

Initially I thought that I should just use:

$$ \sqrt{ \frac{p(1-p)}{n} } $$

where $n$ is the number of rows in the test set.

However, this doesn't seem to take into account that the accuracy score is averaged across many iterations, and I can't find any literature on this.
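For a sense of scale, here is a minimal sketch that plugs the reported accuracy $p = 0.511$ into the formula above with two candidate values of $n$: the 60 rows of a single test set and the 900 pooled predictions. The helper name `wald_se` is hypothetical. Which $n$ is appropriate depends on which predictions can be treated as independent; the arithmetic alone does not settle that.

```r
# Wald standard error of a proportion: sqrt(p * (1 - p) / n)
wald_se <- function(p, n) sqrt(p * (1 - p) / n)

p <- 0.511                   # accuracy reported in the question (51.1%)
se_test <- wald_se(p, 60)    # n = rows in a single test set
se_all  <- wald_se(p, 900)   # n = all pooled predictions

# in percentage points: about 6.45 and 1.67
round(100 * c(test_set = se_test, pooled = se_all), 2)
```

Because $900 = 15 \times 60$, the pooled value is smaller by a factor of $\sqrt{15} \approx 3.87$.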

edit - still unanswered

Tim · Answer 1 · 2019-09-22T20:02:49.667

Your procedure is overly complicated; just use the bootstrap. With the bootstrap, you would randomly draw samples of size $n$, with replacement, from your dataset of size $n$. At each iteration you would repeat the whole procedure, including fitting your model, making predictions, and calculating accuracy. You would repeat this many times (hundreds or more) and then simply calculate the standard deviation of the estimated accuracies.

If you used samples smaller than $n$, the estimate would not reflect the actual variability of the data: it would overestimate the standard deviation (smaller samples vary more). If you use a small number of iterations, your estimate of the standard deviation would itself be imprecise.
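A minimal sketch of the bootstrap procedure described above, in R, with data simulated as in the question. Two choices are assumptions not stated in the answer: each fitted model is scored on the out-of-bag rows (those not drawn into the resample), and `B = 500` replicates are used. The function name `boot_accuracy` is hypothetical.

```r
# Simulated data in the same form as the question's example
set.seed(2019)
n <- 100
original_data <- data.frame(
  x = c(rnorm(n, 0.457, 0.01), rnorm(n, 0.508, 0.11)),
  y = c(rep(0, n), rep(1, n))
)

# One bootstrap replicate: resample N rows with replacement, fit the
# model on them, and score it on the rows that were not drawn
# (out-of-bag scoring is an assumption made for this sketch)
boot_accuracy <- function(data) {
  N   <- nrow(data)
  idx <- sample(N, N, replace = TRUE)
  oob <- setdiff(seq_len(N), idx)
  m <- glm(y ~ x, data = data[idx, ], family = "binomial")
  p <- predict(m, newdata = data[oob, ], type = "response")
  mean((p >= 0.5) == data$y[oob])
}

B <- 500  # number of bootstrap replicates ("hundreds or more")
set.seed(1)
accs <- replicate(B, boot_accuracy(original_data))

mean(accs)  # bootstrap estimate of the accuracy
sd(accs)    # bootstrap standard deviation of the accuracy
```

Scoring on out-of-bag rows avoids testing a model on duplicated copies of its own training rows, which a plain train/test split of a bootstrap resample could do.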

score 0 · Answer 2 · answered Apr 29 '19 at 14:32


It is not quite clear what situation your code represents. You generate 200 random numbers (probabilities), 100 from each of two different distributions, in 50 blocks, where you change the RNG seed every 5 blocks, such that the total number of rows in the final result is 10,000.

The Wald formula for the SD of a proportion is OK to use, but $n$ must be the number of independently occurring events. Only you know which events are independent in your situation. If you really did "flip a coin" 10,000 times, then it would be appropriate to use this number.

As an aside, this Wald formula has been somewhat discredited, and it is now recommended to use either the Wilson formula or the Agresti-Coull formula. See Brown et al. (1999).
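For comparison, the three intervals mentioned above can be computed side by side. This sketch uses the standard textbook formulas for a 95% interval; `binom_intervals` is a hypothetical helper name, and the count of 460 correct predictions out of 900 is only an illustrative back-calculation from the reported 51.1%.

```r
# Approximate confidence intervals for a binomial proportion,
# given x successes (correct predictions) out of n trials
binom_intervals <- function(x, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  p <- x / n

  # Wald: p +/- z * sqrt(p (1 - p) / n)
  wald <- p + c(-1, 1) * z * sqrt(p * (1 - p) / n)

  # Wilson score interval
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half   <- z / (1 + z^2 / n) * sqrt(p * (1 - p) / n + z^2 / (4 * n^2))
  wilson <- centre + c(-1, 1) * half

  # Agresti-Coull: add z^2/2 successes and z^2/2 failures, then use Wald
  n_t <- n + z^2
  p_t <- (x + z^2 / 2) / n_t
  agresti_coull <- p_t + c(-1, 1) * z * sqrt(p_t * (1 - p_t) / n_t)

  rbind(wald = wald, wilson = wilson, agresti_coull = agresti_coull)
}

ci <- binom_intervals(460, 900)
ci
```

In base R, `prop.test(x, n, correct = FALSE)$conf.int` returns the same Wilson interval, which makes a convenient cross-check. With $n$ this large the three intervals are nearly identical; they differ mainly for small $n$ or for $p$ near 0 or 1.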

Mihael

I've rewritten the code; it's a bit longer but hopefully clearer – baxx Apr 29 '19 at 15:09
It looks like you are estimating the model prediction accuracy using a Monte-Carlo simulation. In this case, it would be OK to use each model prediction as an independent event, and so to use the above formulae. – Mihael Apr 29 '19 at 15:23
Does that mean that I would use $n = 900$ in $\sqrt{pq/n}$ (where $q=1-p$)? – baxx Apr 29 '19 at 15:31

@baxx If you simulate 60 values in 15 blocks, then yes, $n = 60\times15 = 900$. However, the value of changing the random seed every 5 blocks is not clear to me, or even the value of blocking at all. I think the results would be practically equivalent if you simulated 900 values at once without blocking. But that is another question... – Mihael Apr 29 '19 at 15:42
Is my question clear? I feel that the code is fairly clear; given that, could you please edit your answer to reflect it? – baxx Apr 29 '19 at 16:14
Are you going to edit this answer? – baxx May 03 '19 at 20:17
