Probability in Multivariate normal distribution (for template attacks)

Question

I am trying to replicate a cryptanalysis technique called a template attack. I won't go into the details of the attack itself, but the theory behind it involves the multivariate normal distribution (more details here).

So, I tried to write a small piece of R code to perform the analysis, but I got stuck on the multivariate part of the solution: I have not managed to compute the pdf correctly for an example dataset. The code is below.

Basically, in the code, I build the covariance matrix from 5 random variables with 50 values each. Then I compute the probability for a new random variable. My main question is: why am I getting infinite (sometimes) or very high probabilities? I would like to get probabilities between 0 and 1. Is my code correct? Which probability should I get if I compute the pdf using one of the random variables used to build the covariance matrix? I guess it should be 1, right?

A couple of things I've noticed: if I increase the value of the variable deviation, I get "better" probabilities (that is, probabilities that will be between 0 and 1). My real data will have at least 100 variables with more than 1000 values each, and the variation of the data is very small (so I'll likely get only infinite values for the probability).

By the way, I know that a pdf can have values greater than 1, but I was expecting that only in rare cases, not always.

Code:

# Dataset setup
sample_size <-5;
deviation <- 0.005; 
traces <- 50;

# Traces vectors
bv <- matrix (rep(0,(sample_size*traces)), sample_size, traces);
for (i in 1:(sample_size)) {
    bv[i,] <- runif((traces), (i-deviation), (i+deviation));
}

# Covariance matrix computation
m <- matrix (rep(0, (sample_size*sample_size)), sample_size, sample_size);
for (i in 1:(sample_size)) {
    for (j in 1:(sample_size)) {
        m[i, j] <- cov(bv[i,], bv[j,]);
    }
}


# Testing sample vector
vt <- rep(0,sample_size)
for (i in 1:(sample_size)) {
    # Set testing vector as one of the trace vectors
    #vt[i] <- bv[i,1] - mean(bv[i,]);
    # Set a random testing vector
    vt[i] <- runif(1, i-0.00200, i+0.00200) - mean(bv[i,]);
}

# Compute the pdf
exp((-1/2)*t(as.matrix(vt))%*%solve(m)%*%as.matrix(vt) - 0.5*sample_size*log(44/7) - 0.5*log(det(m)))

score 1 · Answer 1 · edited Aug 03 '17 at 20:11

The line

exp((-1/2)*t(as.matrix(vt))%*%solve(m)%*%as.matrix(vt) -
0.5*sample_size*log(44/7) - 0.5*log(det(m)))

is pretty much correct. Instead of using 44/7, though, you should use 2 * pi. There's no reason to approximate it like that in R.

I haven't read through the rest of your code, but I can tell you that there is no need for densities to be bounded between $0$ and $1$. That's a misunderstanding. You are correct that probabilities have to lie in this interval, but densities are not probabilities. You integrate densities to get probabilities. So if a density $f(x)$ is peaked and exceeds $1$, it can only do so over a very small region, because any probability satisfies $P(A) = \int_{A} f(x)\,dx \le 1$.
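For reference, here is a minimal sketch of the same formula with 2 * pi, kept on the log scale. The function name log_mvn_density and the use of determinant(..., logarithm = TRUE) are choices made for this illustration and are not part of the question's code. Working with the log-determinant is one way to avoid det(m) underflowing to zero when there are many variables, which is a plausible source of the infinite values.

```r
# Log-density of a zero-mean multivariate normal evaluated at x.
# `log_mvn_density` is a hypothetical helper name used only in this sketch.
log_mvn_density <- function(x, sigma) {
  k <- length(x)
  # Quadratic form x' sigma^{-1} x, solving a linear system instead of inverting sigma.
  quad <- sum(x * solve(sigma, x))
  # log(det(sigma)) computed directly on the log scale, so it cannot underflow.
  log_det <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)
  -0.5 * quad - 0.5 * k * log(2 * pi) - 0.5 * log_det
}

# With 100 tiny variances the plain determinant underflows, the log version does not.
sigma_small <- diag(1e-6, 100)
det(sigma_small)                                   # 0 in double precision
log_mvn_density(rep(0, 100), sigma_small)          # finite log-density
```

With the question's objects this would be called as exp(log_mvn_density(vt, m)), since vt is already centered; for the real data it is probably safer to stay on the log scale and compare log-densities directly.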

score 1 · Answer 2 · answered Aug 03 '17 at 19:42

Your samples are drawn independently from a uniform distribution defined on an interval whose length is twice deviation. Let $d$ denote the length of the interval. The variance of the uniform distribution is equal to $\frac{1}{12}d^2$. Since your deviation is small, its square is even smaller, and we expect the variance to be pretty small. Since your random variables are drawn independently, we expect their covariances to be pretty small, too. We therefore expect your covariance matrix to be not too different from a matrix of zeros.

Therefore, the numerical inverse may be very large and, due to limited numerical precision, also unstable. The determinant can also be expected to be pretty close to zero. In the sample I drew when reproducing your code, it was about $10^{-26}$.
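As a rough check of that order of magnitude, assume the sample covariance matrix is close to its population value, that is, diagonal with the uniform variance on the diagonal. Writing $\Sigma$ for the covariance matrix (m in the code; the symbol is introduced here for illustration) and using deviation = 0.005, so that $d = 0.01$:

$$\sigma^2 = \frac{d^2}{12} = \frac{0.01^2}{12} \approx 8.3 \times 10^{-6}, \qquad \det(\Sigma) \approx (\sigma^2)^5 \approx 4 \times 10^{-26}.$$

Under the same assumption, the fitted normal density at its mean is $(2\pi)^{-5/2} \det(\Sigma)^{-1/2} \approx 5 \times 10^{10}$, so values far above 1 are what this setup should be expected to produce.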

Finally, you're computing the pdf. In contrast to a cdf, values of a pdf need not lie in $[0, 1]$. If you want to evaluate a multivariate-normal cdf with your sampled moments, the 'mvtnorm' package could be interesting.

If you check the determinant by a more stable method, it's not very far from the correct value. — eric_kernfeld, Aug 03 '17 at 19:46
I'll look into the mvtnorm package. Thank you. For my real data I'll probably have to work with the Cholesky decomposition. There are some papers dealing with this problem already, but I felt that information about the probability was missing. — Caiosan, Aug 03 '17 at 20:44

score 1 · Answer 3 · edited Sep 20 '19 at 19:13

(...) My main question is: why am I getting infinite (sometimes) or very high probabilities?

You shouldn't be getting infinite probability densities. You are running into floating-point precision issues (see Goldberg, 1991, What Every Computer Scientist Should Know About Floating-Point Arithmetic). Rather than implementing the probability density function by hand, you should check already-available implementations such as the mvtnorm package (this would also prevent problems like the one pointed out by Taylor).
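As a sketch of what that could look like, the snippet below rebuilds data shaped like the question's and evaluates the density with mvtnorm::dmvnorm, asking for the log-density through its log argument. The seed and the choice of the first trace as the evaluation point are arbitrary assumptions for illustration, and the call is guarded in case the package is not installed.

```r
# Rebuild data shaped like the question's: 5 variables (rows), 50 traces (columns).
set.seed(1)  # arbitrary seed, only for reproducibility
sample_size <- 5
deviation <- 0.005
traces <- 50
bv <- t(sapply(seq_len(sample_size),
               function(i) runif(traces, i - deviation, i + deviation)))

mu <- rowMeans(bv)   # per-variable means
m <- cov(t(bv))      # cov() expects observations in rows, hence t()
x <- bv[, 1]         # evaluate the density at the first trace

if (requireNamespace("mvtnorm", quietly = TRUE)) {
  # log = TRUE returns the log-density, which stays finite even for sharp peaks.
  log_dens <- mvtnorm::dmvnorm(x, mean = mu, sigma = m, log = TRUE)
  print(log_dens)
}
```

Note that cov(t(bv)) produces the same matrix as the double loop in the question, and that the mean is passed explicitly instead of being subtracted from the test vector by hand.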

I would like to get probabilities between 0 and 1. (...)

But you won't. The multivariate normal distribution is a continuous distribution, so it does not have a probability mass function; it has a probability density function. Densities are not numbers between $0$ and $1$: they are non-negative numbers and can be anything greater than or equal to zero. You simply can't have probabilities for individual continuous values, because they would all be equal to zero. To learn more, check the Can a probability distribution value exceeding 1 be OK? thread.

A couple of things I've noticed: if I increase the value of the variable deviation, I get "better" probabilities (that is, probabilities that will be between 0 and 1).

They are not in any way "better", they are smaller. As you could learn from the linked thread, probability densities are probabilities per foot, so if you make the scale of your data larger, the same total probability is spread over a wider range and the densities get smaller; if you make the scale smaller, you get larger densities.
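The scaling can be made explicit with the usual change-of-variables rule. As a sketch, for a random variable $X$ with density $f_X$ and a constant $c > 0$ (both introduced here for illustration), the rescaled variable $Y = cX$ has density

$$f_Y(y) = \frac{1}{c}\, f_X\!\left(\frac{y}{c}\right).$$

Stretching the data by a factor of 10 therefore divides every density value by 10, while the area under the curve stays equal to 1.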

By the way, I know that a pdf can have values greater than 1, but I was expecting that only in rare cases, not always.

I already commented on this, but to say it once more: please take a look at the following plot, which shows a (univariate) normal probability density with a very small standard deviation. As you can see, most of the probability densities (y-axis) are much greater than one. This is how probability densities work.
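The same behaviour can be reproduced numerically. In this small sketch the standard deviation of 0.01 is an arbitrary value chosen for illustration; the density peaks far above 1, yet the probability of an interval around the mean remains at most 1.

```r
# Univariate normal with a very small standard deviation (0.01 is an arbitrary choice).
sd_small <- 0.01
x <- seq(-0.05, 0.05, length.out = 201)
dens <- dnorm(x, mean = 0, sd = sd_small)

max(dens)   # peak density, roughly 39.9, far above 1

# Probability of the whole +/- 5 sd interval: close to, but never above, 1.
prob <- pnorm(0.05, mean = 0, sd = sd_small) - pnorm(-0.05, mean = 0, sd = sd_small)
prob

# plot(x, dens, type = "l") draws the corresponding curve.
```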

What made me entirely confused was that some papers about template attacks address the pdf as a probability and even show a graph for the attack where the probability is between 0 and 1. As I said in the post, I knew that a pdf can go above 1, but the lack of detail in the papers about the probability values is what bothers me. They treat the case of numeric overflow, but don't talk about how pdf values should be treated. By the way, if you execute the example code above, you won't see any problem with numeric representation. You can even run it with sample_size equal to 2. — Caiosan, Aug 03 '17 at 20:34
@Caiosan Maybe the papers were simply using examples with a standard deviation that made the probability densities fit the [0, 1] interval (e.g. they used the standard normal, i.e. mean = 0, sd = 1). As for numeric precision issues -- you obviously *have* them if you get infinite densities. — Tim, Aug 03 '17 at 20:43

score 0 · Answer 4 · answered Aug 03 '17 at 19:45

You're not getting a probability, you're getting a probability density. The convention is that the integral of the probability density over the whole space equals 1. For your estimated normal distribution, a cube of half-width $4\sigma$ can easily capture almost all of the probability mass ($\sigma$ is the standard deviation of any of the random variables you start with). This result depends on the dimension, but I calculated it for dimension 5. For you, $4\sigma$ is roughly 0.01, so the volume of the cube is $0.02^5 = 3.2 \times 10^{-9}$. The mean value of the density over the cube has to be large enough to cancel that effect.
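To put a number on that last sentence, write $V$ for the volume of the cube and $\bar f$ for the average density over it (both symbols are introduced here for illustration), and take the probability inside the cube to be close to 1:

$$\bar f = \frac{P(\text{cube})}{V} \approx \frac{1}{3.2 \times 10^{-9}} \approx 3 \times 10^{8}.$$

So densities in the hundreds of millions are the expected scale for this example, rather than necessarily a sign of a bug.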
