
The following example is from chapter 2.12 of Deep Learning, by Goodfellow, Bengio, and Courville:

[Screenshots of the quoted passage from section 2.12 of *Deep Learning*]

I don't understand what is meant by the very last part: "Many solutions are possible, because we can increase the scale of $\mathbf{D}_{:, i}$ if we decrease $c^i$ proportionally for all points."

I would appreciate it if someone could take the time to clarify this.

    If you multiply `c` by 10, you would also divide `D` by 10. That would give another solution. Think of it like 100 = a * b: you will have many solutions unless you constrain a and b. – SmallChess Jul 11 '18 at 06:16
  • @SmallChess I understand now. Thanks for the clarification. – The Pointer Jul 11 '18 at 06:25
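The scaling argument in the comment above can be checked numerically. The sketch below (using NumPy, with an arbitrary toy decoding matrix and code vector) shows that scaling $\mathbf{D}$ up and $c$ down by the same factor leaves the reconstruction $\mathbf{D}c$ unchanged, so the factorization is not unique without a constraint:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 3-D data reconstructed from a 1-D code, x_hat = D @ c,
# as in the book's decoding step. D and c here are arbitrary examples.
D = rng.standard_normal((3, 1))
c = np.array([2.0])

x_hat = D @ c  # original reconstruction

# Scale D up by 10 and c down by 10: the product is unchanged,
# so this is an equally valid (D, c) pair.
x_hat_scaled = (10 * D) @ (c / 10)

print(np.allclose(x_hat, x_hat_scaled))  # True
```

The same holds for any nonzero scale factor, which is why the book imposes a norm constraint on the columns of $\mathbf{D}$ to pin down a unique solution.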

1 Answer


PCA (please go here, read and upvote) takes a cloud of points in high-dimensional space and describes it in lower-dimensional space. In the two-dimensional case as in amoeba's excellent illustration here (have you upvoted it yet?), this means that the first step is describing how the cloud is oriented. That is, we first find the angle at which the cloud can be "best" described as lying. This "angle-finding" is what $D$ does.

Having achieved this, we have to describe where along that angle the points lie (roughly, since we are approximating in order to compress information). And here, we have a choice of scale. We can measure how far along the line a given point lies in centimeters (so we might get a coordinate of 5.08) or in inches (getting a coordinate of 2). To resolve this, PCA decides that the scale along the angle must be "similar" to the original scale of the high-dimensional space we are working in; specifically, that the matrix $D$ have unit-norm columns.
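This unit-norm convention falls out for free when PCA is computed via the singular value decomposition, one standard way to obtain the principal directions (a sketch with NumPy on arbitrary random data; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
X -= X.mean(axis=0)  # center the point cloud first

# PCA via SVD: the rows of Vt are the principal directions.
# Stacking them as columns gives a D whose columns have unit length,
# which is exactly the constraint that fixes the scale.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
D = Vt.T  # columns = unit-length principal directions

print(np.linalg.norm(D, axis=0))  # each column norm is 1.0
```

With the directions normalized this way, all of the scale information lives in the codes $c$, and the decomposition is unique up to sign.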

Stephan Kolassa