Consider data $x \in \mathbb{R}^3$ represented in a 3D space, i.e., using 3 coordinates $(x^1, x^2, x^3)$. Suppose we observe a large dataset $x_1, x_2, \cdots, x_n$, where $n = 100$. Each datum $x_i$ is known to have coordinates that satisfy the functional relationship $a x_i^1 + b x_i^2 + c x_i^3 + d = 0$, for the same $a, b, c, d \in \mathbb{R}$. Thus, although the data lie in a 3D space, the number of underlying degrees of freedom in the variability of the data is lower than 3. Your goal is to represent each datum using just as many coordinates as the number of degrees of freedom, without any loss of information, i.e., the mapping from the 3D space to the lower-dimensional space should be linear and distance preserving (the Euclidean distance between any 2 points in the original 3D space should equal the Euclidean distance between the mapped points in the lower-dimensional space).

• How many degrees of freedom are present in the variability of the dataset?

• Give a detailed algorithm to find the lower-dimensional representation of the 3D data.

I understand that the algorithm is PCA. But how do I actually find the dimension of the lower-dimensional subspace using the given constraint?

  • See http://stats.stackexchange.com/questions/5922 – whuber Nov 11 '16 at 15:42
  • @whuber - Please tell me whether I am on the right track: I have to compute the eigenvalues of the covariance matrix, fix a minimum threshold, and discard the eigenvectors whose eigenvalues are less than the threshold; the number of eigenvectors that remain is then the number of degrees of freedom? – Shraddheya Shendre Nov 11 '16 at 21:15
  • That's correct. – whuber Nov 11 '16 at 21:20
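
For concreteness, the thresholding approach confirmed in the comments could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, the coefficient values for $a, b, c, d$ are made up, and the function name `plane_coordinates` and the relative tolerance `rel_tol` are hypothetical choices, not part of the original problem. Provided $a, b, c$ are not all zero, the constraint describes a plane, so one would expect at most two eigenvalues of the covariance matrix to be clearly nonzero.

```python
import numpy as np

def plane_coordinates(X, rel_tol=1e-10):
    """Sketch: PCA by eigendecomposition of the sample covariance.

    Keeps only directions whose eigenvalue exceeds rel_tol times the
    largest eigenvalue (rel_tol is an arbitrary, assumed threshold).
    """
    mean = X.mean(axis=0)
    Xc = X - mean                            # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # 3x3 sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    keep = eigvals > rel_tol * eigvals.max() # discard (numerically) zero directions
    W = eigvecs[:, keep]                     # orthonormal basis of the retained subspace
    return Xc @ W, W, mean, eigvals

# Synthetic data satisfying a*x^1 + b*x^2 + c*x^3 + d = 0.
# The coefficient values below are assumed for illustration only.
rng = np.random.default_rng(0)
a, b, c, d = 1.0, -2.0, 0.5, 3.0
n = 100
xy = rng.normal(size=(n, 2))
z = -(a * xy[:, 0] + b * xy[:, 1] + d) / c
X = np.column_stack([xy, z])

Y, W, mean, eigvals = plane_coordinates(X)
print("eigenvalues:", eigvals)
print("retained dimensions:", Y.shape[1])

# Compare one pairwise distance before and after the mapping.
i, j = 3, 57
print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Y[i] - Y[j]))
```

Because the columns of `W` are orthonormal and every centered point lies in their span, pairwise Euclidean distances should be preserved up to rounding. Note that the sketch subtracts the mean before projecting, so the overall map is affine; it acts linearly on differences between points, which is what the distance requirement involves.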