
I want to calculate the variance of the vector [0, 3, 4] with NumPy in Python. My code is:

import numpy as np

test = np.array([0, 3, 4])
print('test var', test.var(axis=0))

The output is

test var 2.8888888888888893

Why? I thought it should be 4.333333333333334.

The cov function yields the correct result:

print("Covariance matrix of test:\n", np.cov(test))

Output

Covariance matrix of test:
 4.333333333333334

On the other hand, if I have a 2-dimensional array like this:

import numpy as np

k1 = 0.1
N = 100
x1 = np.random.rand(N)
nor = np.random.normal(0, 0.5, size=N)
x3 = k1 * nor + (1 - k1) * x1
X = np.vstack((x1, x3)).T  # shape (N, 2): one column per variable
print('X var', X.var(axis=0))
C = np.cov(X.T)  # np.cov expects one row per variable
c00 = C[0, 0]
c11 = C[1, 1]
print('c00 ', c00)
print('c11 ', c11)

Output

X var [0.0861854  0.06790817]
c00  0.0870559565349231
c11  0.06859411169279941

Here var gives the same result as the diagonal of C, but for the vector the two results differ. What's going on here?

Suvi
  • $2.8888/4.3333 = 2/3$ demonstrates your question is answered at https://stats.stackexchange.com/questions/3931. – whuber Jan 24 '22 at 15:18

1 Answer


For the vector case, try the following code. You may also want to read the documentation.

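import numpy as np

test = np.array([0, 3, 4])
print('test var', test.var(axis=0))          # default ddof=0: divides by N
print('test var', test.var(axis=0, ddof=1))  # Note ddof=1 here: divides by N - 1
print("Covariance matrix of test:\n", np.cov(test))  # np.cov divides by N - 1 by default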

Documentation for np.var: check out the ddof parameter.

Documentation for np.cov: check out the bias and ddof parameters.
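The effect of ddof can also be checked by hand. The following is a minimal sketch, not NumPy's internal implementation; the names ss, pop_var and sample_var are illustrative only. It divides the sum of squared deviations from the mean by N - ddof:

```python
import numpy as np

test = np.array([0, 3, 4])
n = test.size

# Sum of squared deviations from the mean (mean is 7/3, so this is 26/3)
ss = ((test - test.mean()) ** 2).sum()

pop_var = ss / n           # ddof=0, the default of np.var
sample_var = ss / (n - 1)  # ddof=1, the default behaviour of np.cov

print('by hand, ddof=0:', pop_var, ' np.var:', test.var())
print('by hand, ddof=1:', sample_var, ' np.var(ddof=1):', test.var(ddof=1))
print('np.cov:', np.cov(test))
```

The first value is 26/3 divided by 3 (about 2.889) and the second is 26/3 divided by 2 (about 4.333), which are the two numbers in the question.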

For the 2D case, the results are not the same either; check all the digits :)
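The same ddof mismatch explains the 2D case. Below is a sketch that rebuilds the array from the question; the seeded generator np.random.default_rng(0) is an assumption added for reproducibility, so the numbers will differ from the output in the question. Once both functions use the same ddof, the variances should agree with the diagonal of the covariance matrix up to floating-point rounding:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, not part of the original code
k1 = 0.1
N = 100
x1 = rng.random(N)
nor = rng.normal(0, 0.5, size=N)
x3 = k1 * nor + (1 - k1) * x1
X = np.vstack((x1, x3)).T  # shape (N, 2)

var_ddof0 = X.var(axis=0)          # divides by N
var_ddof1 = X.var(axis=0, ddof=1)  # divides by N - 1

cov_diag = np.diag(np.cov(X.T))                # np.cov default: divides by N - 1
cov_diag_ddof0 = np.diag(np.cov(X.T, ddof=0))  # divides by N

print('var ddof=1 matches cov diagonal:       ', np.allclose(var_ddof1, cov_diag))
print('var ddof=0 matches cov(ddof=0) diagonal:', np.allclose(var_ddof0, cov_diag_ddof0))
print('ratio var(ddof=0) / cov diagonal:', var_ddof0 / cov_diag)  # (N - 1) / N
```

With N=100 the ratio is 99/100, which matches the output in the question: 0.0861854 × 100/99 ≈ 0.0870560 and 0.06790817 × 100/99 ≈ 0.0685941.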

Raymond Kwok
  • It’s worth mentioning that ddof=0 is population variance while ddof=1 is the sample variance. – Tim Jan 23 '22 at 15:00
  • @Tim Isn't it the other way around? – Suvi Jan 23 '22 at 19:41
  • @Suvi yes, corrected it. – Tim Jan 23 '22 at 19:46
  • @Raymond Kwok, Thank you. I understood the 1-dimensional case. But about the 2-dimensional one: why are they so close but not exactly the same? I tried ddof=1 here too but they still aren't the same. – Suvi Jan 23 '22 at 20:01
  • Your 2D case computes the variance over N=100 elements, so changing ddof from 0 to 1 has a much smaller numerical effect than in your vector case, where N=3. Remember that the sum of squared differences from the mean is divided by (N - ddof), so, for example, $\frac{xxx}{100}$ differs from $\frac{xxx}{100-1}$ far less than $\frac{xxx}{3}$ differs from $\frac{xxx}{3-1}$. – Raymond Kwok Jan 23 '22 at 20:43
  • @Raymond Kwok Ok, but why can't I get the same result with cov() and var(), no matter whether I use ddof=1 or not? In the 1-dimensional case I was at least able to get the same result when I used ddof=1. – Suvi Jan 26 '22 at 14:15