Trying to check covariance matrix calculation from SVD using NumPy
Solution 1
Three fixes are necessary:
(1) Subtract off the variable means before performing the SVD:
x = x - x.mean(axis=0)
(2) In the call U, s, V = np.linalg.svd(x), the V that is returned is already what you call $W^T$. So to obtain $X^TX=W\Sigma^2W^T$ you need to perform the matrix multiplication in the other order:
C = np.dot(np.dot(V.T,np.diag(s**2)),V)
(3) Normalize the result by $n-1$:
C = C / (x.shape[0] - 1)
Here's the updated function:
import numpy as np

def mycov(x):
    x = x - x.mean(axis=0)
    U, s, V = np.linalg.svd(x, full_matrices=False)
    C = np.dot(np.dot(V.T, np.diag(s**2)), V)
    return C / (x.shape[0] - 1)
Test run:
x = np.vstack([np.random.randn(1000), np.random.randn(1000), np.random.randn(1000)]).T
myout = mycov(x)
pyout = np.cov(x.T)
print(myout)
print("")
print(pyout)
print("")
print(np.allclose(myout, pyout))
Output shows agreement with np.cov(x.T):
[[ 1.04987565 -0.01848885  0.01795246]
 [-0.01848885  1.01096332  0.00429439]
 [ 0.01795246  0.00429439  1.01202886]]

[[ 1.04987565 -0.01848885  0.01795246]
 [-0.01848885  1.01096332  0.00429439]
 [ 0.01795246  0.00429439  1.01202886]]

True
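As a quick sanity check of point (2) -- this snippet is not part of the original answer; the seed and variable names are just for illustration -- you can verify the identity $X^TX = V^T\Sigma^2V$ (with V the matrix returned by the SVD, i.e. $W^T$) directly on centered data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 3))
x = x - x.mean(axis=0)             # center the columns first

U, s, V = np.linalg.svd(x, full_matrices=False)

# np.linalg.svd returns V as the transpose of the matrix of right
# singular vectors, so X^T X = V^T diag(s^2) V, not V diag(s^2) V^T.
lhs = x.T @ x
rhs = V.T @ np.diag(s**2) @ V
print(np.allclose(lhs, rhs))       # True
```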
Solution 2
You need to divide your SVD result by the sample size minus the degrees of freedom, i.e. by $n-1$ (the same normalization np.cov applies by default)...
print(C / (x.shape[0] - 1))
There are still some numerical precision differences.
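To quantify those precision differences, here is a small sketch (the seed and variable names are assumptions for illustration) comparing the normalized SVD estimate against np.cov:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((1000, 3))
xc = x - x.mean(axis=0)            # center before the SVD

U, s, V = np.linalg.svd(xc, full_matrices=False)
C = (V.T @ np.diag(s**2) @ V) / (x.shape[0] - 1)

# The two estimates agree only up to floating-point round-off,
# so compare with a tolerance rather than exact equality.
diff = np.abs(C - np.cov(x.T)).max()
print(np.allclose(C, np.cov(x.T)))  # True
```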
jeffery_the_wind
Updated on April 25, 2020

Comments

jeffery_the_wind over 2 years
I am trying to work with the SVD and PCA. Just to check that I am doing what I think I am doing, I did a simple test in Python. The test is that I make a random matrix of realizations, construct the covariance matrix using the SVD, and then also using the built-in NumPy covariance function. I then compare the two output covariance matrices... and I hope to see that they are identical (for the most part).
I am puzzled why I see that the two methods give 2 very different covariance matrices.
Assume that X is a 1000 by 3 matrix representing 1000 realizations of 3 separate trials.

SVD covariance method, Python code:
import numpy as np

x = np.vstack([np.random.randn(1000), np.random.randn(1000), np.random.randn(1000)]).T
U, s, V = np.linalg.svd(x, full_matrices=0)

# calculate covariance using SVD
C = np.dot(np.dot(V, np.diag(s**2)), V.T)
print(C)

# check cov matrix calculation
CC = np.cov(x.T)
print("")
print(CC)
OUTPUT:

[[ 986.36682109    2.73620015   32.36077455]
 [   2.73620015 1020.64662161   26.40580027]
 [  32.36077455   26.40580027 1032.3221444 ]]

[[ 0.98769747  0.00423559 -0.03172589]
 [ 0.00423559  1.01986134  0.02640778]
 [-0.03172589  0.02640778  1.0335292 ]]
EDIT

I updated the problem with many more realizations. I noticed that the variances (along the diagonal) are all off by roughly a factor of 1000, and the covariance values seem to be off by the same factor. The np.cov function must be normalizing the output matrix??
Tunococ over 9 years
My guess is that you might get identical results from the SVD if you translate its columns or rows to have zero mean first.

jeffery_the_wind over 9 years
This is a good point. I will redo the question using normal random variables and many more realizations; this will enforce the zero-mean and normal-distribution stuff.

Tunococ over 9 years
There can still be a difference between the true mean and the sample mean. I believe the covariance function will adjust the sample mean to zero. But, well, if you already know that the true mean is zero and you want to estimate just the covariance, it is better to translate by the true mean, not the sample mean.
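Tunococ's point can be sketched in a few lines (a toy illustration not taken from the thread; seed and names are arbitrary): even when the true mean of the data is exactly zero, centering by the sample mean shifts the estimate slightly:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((1000, 3))   # true mean of each column is 0

# np.cov subtracts the *sample* mean internally.
C_sample = np.cov(x.T)

# Centering by the known true mean (zero) instead gives a
# slightly different estimate.
C_true = (x.T @ x) / (x.shape[0] - 1)

print(np.abs(C_sample - C_true).max())  # small but nonzero
```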


jeffery_the_wind almost 6 years
Just looking at this again and brushing up a bit. It seems the order of multiplication shouldn't matter, since matrix multiplication is associative? Although I confirm in Python that those 2 methods do not produce identical results.

grand_chat almost 6 years
@jeffery_the_wind Right, matrix multiplication is indeed associative, but the fix here is not in regrouping the parentheses; it's in calculating $(CB)A$ instead of $(AB)C$.

jeffery_the_wind almost 6 years
I see, yes, and the multiplication is not commutative... Right, sorry. I didn't quite see what you meant at first... ahhh, yes, I see: because V is already a transpose. My own notation was messing me up. Thanks!
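The distinction in these last comments can be checked numerically (a standalone sketch; the seed and variable names are arbitrary): associativity lets you drop the parentheses, but reversing the order of the factors changes the product:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal((100, 3))
x = x - x.mean(axis=0)

U, s, V = np.linalg.svd(x, full_matrices=False)
D = np.diag(s**2)

# Associativity: (V.T @ D) @ V == V.T @ (D @ V), so parentheses
# don't matter -- but the *order* of the factors does.
print(np.allclose(V @ D @ V.T, V.T @ D @ V))   # False (in general)
print(np.allclose(x.T @ x, V.T @ D @ V))       # True
```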