According to Wikipedia, the dprime score (aka 'sensitivity index') can be expressed as

$$ d' = Z(\text{hit rate}) - Z(\text{false alarm rate})$$

Here the hit rate (aka recall or sensitivity) and the false alarm rate (equal to 1 - specificity) are point estimates; that is, each is a single scalar between 0 and 1.

Z represents the inverse of the cumulative distribution function of the Gaussian distribution. I don't understand how this is supposed to be computed from a single value; it seems to me that a distribution of hit rates and a distribution of false alarm rates would be needed.

In the Wikipedia article, d-prime is also expressed as the square root of two times Z applied to the area under the receiver operating characteristic curve (AUC):

$$ d' = \sqrt{2} Z(AUC)$$

Q: Is the use of the Z-score in this notation meant to represent prediction and ground truth values that have been Z-normalized?

Is the code below a valid implementation of dprime?

    import math

    import numpy as np
    from scipy.stats import norm
    from sklearn.metrics import roc_auc_score

    # Z is the inverse CDF (quantile function) of the standard normal
    Z = norm.ppf

    y_true = np.array([0, 0, 1, 1])
    y_pred = np.array([0, 0, 1, 1])

    # d' = sqrt(2) * Z(AUC)
    dprime = math.sqrt(2) * Z(roc_auc_score(y_true, y_pred))
    print(dprime)

This prints inf, as the classifier has made no errors: the AUC is 1, and Z(1) is infinite.

andandandand

1 Answer

The Wikipedia page says:

... where function Z(p), p ∈ [0,1], is the inverse of the cumulative distribution function of the Gaussian distribution.

So the input to $Z$ is a single value in $[0,1]$, such as the hit rate, the false alarm rate, or the AUC; no distribution of rates is needed. The page doesn't say so explicitly, but $Z$ should be the inverse CDF of the standard Gaussian (mean 0, variance 1).
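As a rough illustration of $Z$ acting on plain scalars, here is a minimal sketch using `scipy.stats.norm.ppf` as the standard normal inverse CDF. The helper name `dprime_from_rates` and the example rates 0.8 and 0.2 are made up for illustration; they do not come from the question.

```python
import math

from scipy.stats import norm


def dprime_from_rates(hit_rate, false_alarm_rate):
    """Hypothetical helper: d' = Z(hit rate) - Z(false alarm rate).

    Z is taken to be the inverse CDF of the standard normal
    (mean 0, variance 1), i.e. scipy's norm.ppf.
    """
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)


# Each input is a single probability in [0, 1], not a distribution.
example_dprime = dprime_from_rates(0.8, 0.2)
print(example_dprime)

# Z undoes the standard normal CDF: cdf(ppf(p)) recovers p.
roundtrip = norm.cdf(norm.ppf(0.8))
print(roundtrip)

# At the boundary p = 1 the quantile is infinite, which is why a
# perfect AUC of 1.0 yields d' = inf in the question's code.
z_at_one = norm.ppf(1.0)
print(z_at_one)
```

In this reading, $Z$ has nothing to do with z-normalizing predictions or labels; it only maps a probability to the corresponding quantile of the standard normal.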

gunes