Edit: more information about what the code should represent
The following pseudocode outlines my problem:
```
for each random seed in S
    randomise the data
    for k in 1 to 5
        create test / training data
        fit the model to the training data
        generate score
```
Therefore I will have $S \times 5$ individual accuracy scores. My final score is their average, and I would like to know its standard deviation.
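A minimal R sketch of the pseudocode above, assuming the same simulated data as the code in the original post and a logistic regression with a 0.5 threshold. Unlike that code, it assigns each shuffled row to one of `K` folds, so every `k` gets a different test set. The function name `cv_accuracies` is hypothetical; it returns one accuracy per (seed, fold) pair, i.e. $S \times K$ scores.

```r
# Hypothetical helper: repeated K-fold cross-validation with a logistic model.
# Returns one accuracy per (seed, fold) pair, i.e. S * K values.
cv_accuracies = function(data, S = 3, K = 5, threshold = 0.5) {
  acc = numeric(0)
  for (s in 1:S) {
    set.seed(s)
    # randomise the data
    td = data[sample(nrow(data)), ]
    # label rows 1..K cyclically; because td is shuffled, this gives
    # K random folds of (nearly) equal size
    folds = rep(1:K, length.out = nrow(td))
    for (k in 1:K) {
      # fold k is the test set, the remaining folds form the training set
      trainset = td[folds != k, ]
      testset = td[folds == k, ]
      m = glm(y ~ x, data = trainset, family = "binomial")
      probs = predict(m, newdata = testset, type = "response")
      preds = as.numeric(probs >= threshold)
      # score: proportion of test rows classified correctly
      acc = c(acc, mean(preds == testset$y))
    }
  }
  acc
}

# same simulated data as in the original post
n = 100
set.seed(2019)
original_data = data.frame(
  x = c(rnorm(n, 0.457, 0.01), rnorm(n, 0.508, 0.11)),
  y = c(rep(0, n), rep(1, n))
)

scores = cv_accuracies(original_data, S = 3, K = 5)
length(scores)  # S * K individual accuracy scores
mean(scores)    # final score: the average of those S * K scores
```

With 200 rows and `K = 5`, each test fold here has 40 rows rather than the 60 used in the original code.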
Original post
The following R code represents my problem:
```r
# S is the total number of random seeds to use
S = 3
# the size of each category, so the original data will have 2n rows
n = 100
# number of "folds" to use
K = 5

# sample data: x values for category 0 and category 1, y is the category label
set.seed(2019)
original_data = data.frame(
  x = c(rnorm(n, 0.457, 0.01), rnorm(n, 0.508, 0.11)),
  y = c(rep(0, n), rep(1, n))
)

# will be a data frame to store the results
results = NULL
iteration = 1

for (s in 1:S) {
  set.seed(s)
  rnd = sample(1:(2 * n))
  # get randomised data
  td = original_data[rnd, ]

  for (k in 1:K) {
    # get test and training data
    # note: as written, every k uses the same split (rows 1-140 train,
    # rows 141-200 test), so the K iterations for a seed give identical predictions
    trainset = td[1:140, ]
    testset = td[-(1:140), ]

    # fit the model
    m = glm(y ~ x, data = trainset, family = "binomial")

    # get probabilities and predicted values (threshold 0.5)
    model_probabilities = predict(m, newdata = testset, type = "response")
    model_predictions = 1 * (model_probabilities >= 0.5)

    # store results
    results = rbind(results, data.frame(
      seed = s, k = k, iteration = iteration,
      probability = model_probabilities,
      prediction = model_predictions,
      observed = testset$y
    ))
    iteration = iteration + 1
  }
}

# table of predicted (rows) and observed (columns)
t = table(results$prediction, results$observed)
# convert into percentages
t = 100 * round(prop.table(t), 3)
# compute the accuracy: the sum of the diagonal
accuracy = t[1, 1] + t[2, 2]
accuracy
```
This gives the following output:
```
> accuracy
[1] 51.1
> dim(results)
[1] 900 6
```
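Because `results` stores the iteration number next to every prediction, one accuracy per iteration can be recovered from it. The sketch below assumes a data frame with the same `iteration`, `prediction` and `observed` columns; the small frame `toy_results` is made-up toy input, not output of the code above. When every iteration has the same number of test rows, the mean of the per-iteration accuracies equals the pooled accuracy from the 2×2 table (before the rounding applied by `round(prop.table(t), 3)`).

```r
# Toy stand-in for `results` (hypothetical values): 2 iterations, 4 test rows each
toy_results = data.frame(
  iteration  = rep(1:2, each = 4),
  prediction = c(1, 0, 1, 1,  0, 0, 1, 0),
  observed   = c(1, 0, 0, 1,  0, 1, 0, 1)
)

# one accuracy per iteration: proportion of rows where prediction == observed
per_iter = aggregate(correct ~ iteration,
                     data = transform(toy_results,
                                      correct = as.numeric(prediction == observed)),
                     FUN = mean)
per_iter                                               # 0.75 and 0.25
mean(per_iter$correct)                                 # average of per-iteration accuracies
mean(toy_results$prediction == toy_results$observed)   # pooled accuracy, same value here
sd(per_iter$correct)                                   # spread across iterations
```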
I want to know how to calculate the standard deviation of this accuracy measure.
Edit: choice of $n$
I'm still interested in an answer to this question; I'm not sure whether additional information is required.
Initially I thought that I should just use
$$ \sqrt{ \frac{p(1-p)}{n} } $$
where $p$ is the accuracy expressed as a proportion and $n$ is the number of rows in the test set.
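As a worked illustration of the formula above (not a claim about which $n$ is correct), here it is in R with $p = 0.511$, the reported accuracy, evaluated at two candidate values of $n$: the 60 rows of a single test set and the 900 rows of `results`. The function name `binomial_se` is hypothetical.

```r
# Hypothetical helper: standard error of a proportion, sqrt(p * (1 - p) / n)
binomial_se = function(p, n) sqrt(p * (1 - p) / n)

p = 0.511                 # reported accuracy (51.1%) as a proportion
binomial_se(p, n = 60)    # one test set: 200 - 140 rows, about 0.0645
binomial_se(p, n = 900)   # all rows of `results`: 3 seeds * 5 * 60, about 0.0167
```

The two values differ by a factor of $\sqrt{15}$. Plugging in $n = 900$ treats every row as an independent test case, which the repeated splits do not provide: the same 200 observations reappear across seeds, and within a seed the code above repeats the same 60 test rows for every `k`.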
This doesn't seem to take into account that the accuracy score is averaged across many iterations, and I can't find any literature on this.
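For comparison, the sketch below computes the two quantities that averaging over iterations naturally suggests: the standard deviation of the individual scores, and the naive standard error of their mean, $\mathrm{sd}/\sqrt{m}$ with $m = S \times K$. The vector `scores` is hypothetical toy input. The $\sqrt{m}$ step assumes the $m$ scores are independent; scores from cross-validation share training rows, so that assumption does not hold here, and the result should be read as an illustration of the arithmetic rather than a valid standard error.

```r
# Toy vector of per-iteration accuracies (hypothetical values, m = 4)
scores = c(0.50, 0.55, 0.45, 0.50)

m = length(scores)
mean(scores)           # overall accuracy: 0.5
sd(scores)             # standard deviation of the individual scores
sd(scores) / sqrt(m)   # naive standard error of the mean; assumes independent scores
```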