
Below is my test data:

import pandas as pd
import qgrid

predictions = [-5, 5, 2]
actual = [1, 2, 5]

predictions[:3]  # quick check of the values (notebook cell output)

dataset = pd.DataFrame()
dataset['Predicted'] = predictions
dataset['Actual'] = actual

qgrid.show_grid(dataset, show_toolbar=True)

renders an interactive grid of the two columns (screenshot omitted).
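For reference (in place of the screenshot), printing the frame gives:

print(dataset)
#    Predicted  Actual
# 0         -5       1
# 1          5       2
# 2          2       5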

I'm attempting to find the optimal error threshold, i.e. the threshold that yields the highest prediction accuracy.

For example, to measure the percent change between the actual and predicted values I use:

dataset['percent-change'] = ((dataset['Actual'] - dataset['Predicted']) / dataset['Predicted'] * 100)

dataset

which renders the grid with the new percent-change column (screenshot omitted).
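For the sample data above this evaluates to (note the division is by the predicted value, so a negative prediction flips the sign):

print(dataset)
#    Predicted  Actual  percent-change
# 0         -5       1          -120.0
# 1          5       2           -60.0
# 2          2       5           150.0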

To evaluate the predictions at different thresholds I use:

a = []  # accuracy at each threshold
p = []  # threshold (allowed percent error) at each step
number_correct_at_threshold = []
number_incorrect_at_threshold = []

allowed_percent_error = 0
step_size = 100

for r in range(0, 3):
    # Flag a prediction as correct (1) when its percent change is within the current threshold
    dataset['is-prediction-correct'] = (dataset['percent-change'] <= allowed_percent_error) * 1
    number_correct_at_threshold.append(len(dataset[dataset['is-prediction-correct'] == 1]))
    number_incorrect_at_threshold.append(len(dataset[dataset['is-prediction-correct'] == 0]))
    dataset['ts'] = dataset.index

    # Accuracy = percentage of predictions flagged as correct at this threshold
    accuracy = round(len(dataset[dataset['is-prediction-correct'] == 1]) / len(dataset) * 100, 3)

    a.append(accuracy)
    p.append(allowed_percent_error)

    print(r)
    allowed_percent_error = allowed_percent_error + step_size

dataset_results = pd.DataFrame()
dataset_results['Accuracy'] = a
dataset_results['Threshold'] = p
dataset_results['# Correct Predictions'] = number_correct_at_threshold
dataset_results['# Incorrect Predictions'] = number_incorrect_at_threshold

qgrid.show_grid(dataset_results,show_toolbar=True)

which renders the results grid (screenshot omitted).
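As an aside, the same sweep can be written more compactly. A minimal sketch using the same column names (the explicit threshold list below is my reconstruction of the values the loop steps through):

thresholds = [0, 100, 200]  # start at 0, step by 100, three iterations as above

rows = []
for t in thresholds:
    correct = int((dataset['percent-change'] <= t).sum())
    rows.append({'Threshold': t,
                 'Accuracy': round(correct / len(dataset) * 100, 3),
                 '# Correct Predictions': correct,
                 '# Incorrect Predictions': len(dataset) - correct})

dataset_results = pd.DataFrame(rows)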

In addition, I plan to sum the distance from each prediction to its actual value at each threshold, to gain more information about how well the algorithm is performing.
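A minimal sketch of that idea, assuming "distance" means the absolute difference between prediction and actual (my interpretation):

# Sum of absolute distances between predictions and actual values
dataset['abs-error'] = (dataset['Actual'] - dataset['Predicted']).abs()
total_distance = dataset['abs-error'].sum()
print(total_distance)  # |1-(-5)| + |2-5| + |5-2| = 6 + 3 + 3 = 12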

Are there other measures I can use to assess the accuracy of regression predictions?

  • Every prediction is at least a little bit wrong (which I would expect), so I do not follow what you're doing. – Dave Oct 14 '20 at 12:53
  • @Dave I'm trying to understand how varying a "little bit wrong" impacts the results. As the range of allowed error in predictions increases the prediction accuracy increases. – blue-sky Oct 14 '20 at 12:56
  • Is there some reason that you want to define a sharp cutoff for how much of a percentage error means an "incorrect" prediction? Usually with continuous output variables a measure like mean-square error (or sometimes mean absolute error) is used to "measure the accuracy of regression predictions." Even for classification-type problems, continuous measures of model performance (like the Brier score, the equivalent of mean-square error) are superior to all-or-none "accuracy" measures. – EdM Oct 14 '20 at 19:24
  • @EdM the sharp cutoff is defined in order to generate a set of threshold values; each threshold value maps to an overall prediction accuracy, and an optimum threshold is then selected based on the acceptable % error of the model. Each threshold value affects how accurate the model is at predicting values. Are you suggesting to use the MSE at each threshold instead of measuring the accuracy at each threshold? – blue-sky Oct 14 '20 at 19:32
  • I'm suggesting to use the MSE _itself_ as the measure of "overall prediction accuracy." That's standard statistical practice in regression. Or if you think that percentage/fractional error is a better measure for your application, do the analysis on a log scale of the outcome variable. Then the MSE on the log-transformed outcomes is the mean-square _fractional_ error. Any arbitrary cutoff or thresholding of a continuous variable is likely to get you into trouble; see [this post](https://stats.stackexchange.com/a/230756/28500) in addition to many others on this site. – EdM Oct 14 '20 at 20:04
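For reference, the MSE/MAE that EdM recommends can be computed directly on the same data; a minimal sketch assuming scikit-learn is installed (the log-scale variant he mentions would additionally require strictly positive values, which this toy data does not satisfy):

from sklearn.metrics import mean_squared_error, mean_absolute_error

mse = mean_squared_error(dataset['Actual'], dataset['Predicted'])
mae = mean_absolute_error(dataset['Actual'], dataset['Predicted'])
print(mse, mae)  # MSE = (36 + 9 + 9) / 3 = 18.0, MAE = (6 + 3 + 3) / 3 = 4.0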
