
I'm doing PCA in Python on the decathlon dataset, where I'm interested in 3 variables: 100m, Long.jump, and Shot.put. I compute the covariance matrix of these 3 variables, then find its eigenvalues and corresponding eigenvectors.

The eigenvalues are 
 [0.69417929 0.03050717 0.12428585]

 The eigenvectors are 
 [[-0.12933352  0.83021401  0.54223385]
 [ 0.09032618  0.55441688 -0.82732286]
 [ 0.98747862  0.05802267  0.14669475]]

Could you please explain how to plot the corresponding ellipsoid?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
import plotly.graph_objects as go 
import plotly.io as pio
pio.renderers.default = 'notebook'
import plotly.offline as pyo
pyo.init_notebook_mode()
from sklearn import decomposition # PCA
from numpy import linalg as LA

%config InlineBackend.figure_format = 'svg' # Change the image format to svg for better quality

decathlon = pd.read_csv("https://raw.githubusercontent.com/leanhdung1994/Deep-Learning/main/decathlon.txt", sep = '\t')

tmp = decathlon.iloc[:, 0:10]
tmp2 = tmp[['100m', 'Long.jump', 'Shot.put']]
tmp3 = np.cov(np.transpose(tmp2))

# w contains the eigenvalues; v contains the corresponding eigenvectors, one eigenvector per column
w, v = LA.eig(tmp3)

print('The eigenvalues are \n', w)

print('\n The eigenvectors are \n', v)
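A minimal NumPy sketch of one standard construction: treat the ellipsoid $\{x \in \mathbb R^3 \mid (x-\bar x)^T \Sigma^{-1}(x-\bar x) \le a\}$ as a distorted unit sphere, with semiaxes of length $\sqrt{a\lambda_i}$ along the eigenvector directions. The cutoff `a` and the mesh resolution below are assumptions (not taken from the question); `a = 7.815` is the 0.95 chi-square quantile with 3 degrees of freedom, a common choice for a 95% region under a normality assumption.

```python
import numpy as np

# Eigenvalues and eigenvectors printed above (eigenvectors are the columns of v)
w = np.array([0.69417929, 0.03050717, 0.12428585])
v = np.array([[-0.12933352,  0.83021401,  0.54223385],
              [ 0.09032618,  0.55441688, -0.82732286],
              [ 0.98747862,  0.05802267,  0.14669475]])

# Squared "radius" of the region: chi-square 0.95 quantile with 3 df
# (an assumed cutoff, giving a 95% region if the data were trivariate normal)
a = 7.815

# Parametrize the unit sphere
u, t = np.meshgrid(np.linspace(0, 2 * np.pi, 60), np.linspace(0, np.pi, 30))
sphere = np.stack([np.cos(u) * np.sin(t),
                   np.sin(u) * np.sin(t),
                   np.cos(t)])                      # shape (3, 30, 60)

# Scale along each eigenvector by the semiaxis length sqrt(a * lambda_i),
# then rotate: pts = v @ diag(sqrt(a * w)) @ sphere
radii = np.sqrt(a * w)
pts = np.einsum('ij,jkl->ikl', v * radii, sphere)
x, y, z = pts  # shift each grid by the sample mean of its variable to center
```

The `x`, `y`, `z` grids can then be passed to `go.Surface(x=x, y=y, z=z)` in Plotly, or to `ax.plot_surface(x, y, z)` on a matplotlib 3D axis.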
  • Maybe you can find an answer here: https://stats.stackexchange.com/questions/372336/confidence-regions-on-bivariate-normal-distributions-using-hat-sigma-mle, https://stats.stackexchange.com/questions/391706/drawing-95-ellipse-over-scatter-plot, https://stats.stackexchange.com/questions/447694/ellipse-region-shape-from-standard-deviations, https://stats.stackexchange.com/questions/81285/appropriate-measure-to-find-smallest-covariance-matrix/385902#385902 – kjetil b halvorsen Nov 02 '20 at 11:30
  • *Mathematica* code (which essentially gives a mathematical formula, too) is available at https://mathematica.stackexchange.com/a/21402/91 (following the paragraph beginning "If you want better control ..."). – whuber Nov 02 '20 at 17:29
  • @whuber From your answer, I feel that you plotted an ellipse, not an ellipsoid. IMHO, plotting an ellipse only requires a $2 \times 2$ covariance matrix, but here I have a $3 \times 3$ one. Could you elaborate on this point? – Akira Nov 02 '20 at 17:33
  • The link I provided goes to a solution to plot an ellipsoid: the illustration there shows 3 axes in the plot. – whuber Nov 02 '20 at 17:58
  • @whuber Could you please explain why the shape of the ellipsoid in this [answer](https://mathematica.stackexchange.com/a/169536/64000) is flat? It looks like an ellipse in a 2D plane. – Akira Nov 02 '20 at 22:30
  • It may be a little *flattened,* but it's plainly not 2D: you can determine its thickness by looking at all three axes in the plots. (Its aspect ratio is around 101:16:1.) If you're not convinced, compute the eigenvalues of the matrix and notice all three are positive. Even if it were reduced to two dimensions, *it is still a figure in three dimensions* and drawing it is no different than drawing any other ellipsoid in 3D. – whuber Nov 02 '20 at 22:53
  • @whuber From this [answer](https://stats.stackexchange.com/a/373988/249463), I see the relevance of the covariance matrix in drawing the ellipsoid. But I'm still not clear on the relevance of its eigenvectors and eigenvalues. Could you please suggest some references about it? – Akira Nov 02 '20 at 22:57
  • The eigenvectors and eigenvalues are a complete geometric description of the ellipsoid (they give all its axis directions and the reciprocal squared lengths of its semiaxes) and they lend themselves nicely to plotting the ellipsoid as a distorted sphere, as shown by the *Mathematica* code. A good linear algebra text may be helpful. So may a clear account of SVD and/or PCA. For a fully geometric, statistical approach apply [my account of Mahalanobis distance](https://stats.stackexchange.com/a/62147/919) to three dimensions. – whuber Nov 02 '20 at 23:04
  • @whuber Could you please verify if my understanding is correct? Let $\bar x$ and $\Sigma$ be the mean and covariance matrix of my data. Then we are trying to plot the ellipsoid $\{x \in \mathbb R^d | (x-\bar x)^T \Sigma(x-\bar x) \le a\}$ for some $a>0$. The value of $a$ determines the confidence region.[...] – Akira Nov 02 '20 at 23:40
  • [...] We consider the eigen-decomposition $\Sigma = P\Lambda P^{-1} =P\Lambda P^T$. Here $\Lambda$ is the diagonal matrix generated by eigenvalues $\lambda_1, \ldots, \lambda_d$, while each column of $P$ is an eigenvector. Let $y = P^T(x - \bar x)$. Then $(x-\bar x)^T \Sigma(x-\bar x) = y^T \Lambda y = \lambda_1y_1^2+...+\lambda_d y_d^2$. Then $y = P^T(x - \bar x)$ is the rotation of $x - \bar x$ by $P^T$. – Akira Nov 02 '20 at 23:40
  • @whuber Could you please have a look at my 2 comments below? – Akira Nov 03 '20 at 16:45
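The identity derived in the two comments above can be checked numerically. A small sketch, reconstructing $\Sigma = P\Lambda P^T$ from the eigen pairs printed in the question and taking $\bar x = 0$ for simplicity (the test point `x` is an arbitrary choice, not from the data):

```python
import numpy as np

# Eigen pairs printed above (eigenvectors are the columns of P)
lam = np.array([0.69417929, 0.03050717, 0.12428585])
P = np.array([[-0.12933352,  0.83021401,  0.54223385],
              [ 0.09032618,  0.55441688, -0.82732286],
              [ 0.98747862,  0.05802267,  0.14669475]])

Sigma = P @ np.diag(lam) @ P.T   # reconstruct Sigma = P Lambda P^T
x = np.array([1.0, -2.0, 0.5])   # arbitrary point, with xbar = 0
y = P.T @ x                      # rotate into the eigenbasis
lhs = x @ Sigma @ x              # (x - xbar)^T Sigma (x - xbar)
rhs = np.sum(lam * y**2)         # lambda_1 y_1^2 + ... + lambda_d y_d^2
# lhs and rhs agree, confirming the change of variables in the comments
```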

0 Answers