
I am trying to get a feel for which dimensionality-reduction (DR) scheme is most suitable for my problem, with the stipulation that the scheme is bijective; i.e., there is a two-way map between the low-dimensional manifold and the original space.

EDIT:

For clarity: is it possible to select your dimensionality reduction so that you can project UP from the low-dimensional space that you have found, with acceptable information loss? E.g. say I have a high-dimensional joint configuration of some character, and I want to find its low-dimensional projection, then traverse that manifold, or sample it, and visualise those samples in the original space.
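
Roughly, this sketch is what I have in mind (a made-up example using scikit-learn's PCA; the pose data and the dimensions are placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up data: 1000 poses, each a 60-dimensional joint configuration.
poses = np.random.randn(1000, 60)

# Project DOWN to a low-dimensional space...
pca = PCA(n_components=5)
latent = pca.fit_transform(poses)          # shape (1000, 5)

# ...then take a point in that low-dimensional space (sampled or traversed)
# and project it back UP into the original 60-dimensional space.
sample = np.random.randn(1, 5)             # an arbitrary point in the latent space
reconstructed_pose = pca.inverse_transform(sample)   # shape (1, 60)
```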

Astrid
  • I do not see how a dimension reduction scheme could be bijective in this sense (in essence, this is part of the definition of the *dimensionality* of a space). However, in principle *any* dimension reduction scheme could be bijective in the sense of *preserving the data*. For example, if the data lie exactly on a hyperplane, then PCA would be able to preserve the data while dropping some eigenvectors. – GeoMatt22 Sep 09 '16 at 16:48
  • (My comment was posted before your edit.) Note that if PCA is used as a dimension *reduction* scheme, then the reduced basis-set will no longer map to the *entire* original space. If you keep all the eigenvectors, then PCA is just a rotation of the coordinate axes in the original space, but does not *reduce* the number of dimensions! – GeoMatt22 Sep 09 '16 at 16:50
  • Well, no, they can be; you will have information loss of course, but you should still be able to map it back. – Astrid Sep 09 '16 at 16:51
  • @GeoMatt22: sure, but if you ditch all but the eigenvectors representing 90% of the variance, you will get a dimensionally reduced manifold. – Astrid Sep 09 '16 at 16:52
  • I am not sure what you are asking. It sounds like it may be more "what alternative coordinate systems for $\mathbb{R}^n$ are bijective?". For that question, the answer is: [those with an invertible Jacobian](https://en.wikipedia.org/wiki/Curvilinear_coordinates#Jacobian_of_the_transformation). – GeoMatt22 Sep 09 '16 at 16:58
  • @GeoMatt22 It may be true that I have not posed this well. Let me try again: is it possible to select your dimensionality reduction so that you can project UP from the low-dimensional space that you have found, with acceptable information loss? Say I have a high-dimensional joint configuration of some character, and I want to find its low-dimensional projection, then traverse that manifold, or sample it, and visualise those samples in the original space. – Astrid Sep 09 '16 at 17:05
  • You might find [this question](http://stats.stackexchange.com/questions/115207/how-to-prove-that-the-manifold-assumption-is-correct) helpful. BTW I found that by searching on the [manifold learning](http://stats.stackexchange.com/questions/tagged/manifold-learning) tag, which you might consider adding to your question. – GeoMatt22 Sep 09 '16 at 17:12

3 Answers


In general your question is a bit odd: dimensionality reduction is related to projection, which, as the name says, maps something onto something else, but many things can be mapped onto the same image. (Think of a point in 3D projected onto a surface, then consider the line through that point perpendicular to the surface: every point on that line is projected onto the same point.)
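
For illustration (a quick toy example of my own in NumPy), dropping the $z$-coordinate sends infinitely many 3D points to the same 2D point, so there is no way to invert the projection:

```python
import numpy as np

# Project 3D points onto the xy-plane by dropping the z-coordinate.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])   # 2x3 projection matrix

a = np.array([2.0, 3.0, 0.0])
b = np.array([2.0, 3.0, 7.5])    # same x and y, different z

print(P @ a)   # [2. 3.]
print(P @ b)   # [2. 3.]  -- two distinct points share one image
```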

OK, let's consider PCA, which you claim is bijective (it really isn't); actually, let's consider the SVD, since PCA is really just an SVD (of the centred data matrix).

Full SVD is:

$ M = U\Lambda V^T$

When you do dimensionality reduction, you keep only the largest diagonal entries of $\Lambda$ (the singular values) and set the rest to zero, hence the truncated representation

$\hat M = U\hat\Lambda V^T$

Now this truncated representation corresponds to an infinite number of matrices $M$. In other words, you can build infinitely many matrices $M'$ such that, when you apply the same SVD-truncation process, you get $\hat M$ (so it really is not bijective).

Indeed: let

$M' = U \Lambda' V^T$

where $\Lambda'$ has the same first few singular values as $\Lambda$ and all the others are different (but smaller). The SVD truncation of every such $M'$ will give you the same $\hat M$.
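
A quick numerical sketch of this (my own illustration in NumPy; the specific singular values are arbitrary): two different matrices with the same leading singular structure truncate to the same $\hat M$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random orthogonal U and V shared by both matrices.
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))

s  = np.array([10.0, 5.0, 1.0, 0.5, 0.1])    # Lambda
s2 = np.array([10.0, 5.0, 0.3, 0.2, 0.05])   # Lambda': same top two values, smaller tail

M  = U @ np.diag(s)  @ V.T
M2 = U @ np.diag(s2) @ V.T                   # a different matrix M'

def truncate(A, k):
    """Rank-k SVD truncation: keep only the k largest singular values."""
    u, sv, vt = np.linalg.svd(A)
    return u[:, :k] @ np.diag(sv[:k]) @ vt[:k]

# Both matrices map to the same rank-2 representation, so truncation is not invertible.
print(np.allclose(truncate(M, 2), truncate(M2, 2)))   # True
```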

tibL

If you think of dimensionality reduction only through linear transformations $T: V \rightarrow U$, say $V = \mathbb{R}^n$ and $U \subset V$, the transformation can be described by an $n \times n$ matrix, but it is not full rank, since $\text{dim}(\text{im}(T)) = \text{dim}(U) < \text{dim}(V)$. The matrix is therefore not invertible and there is no one-to-one mapping. Also, by the rank–nullity theorem, $\text{dim}(\text{ker}(T)) = \text{dim}(V) - \text{dim}(\text{im}(T)) \geq 1$, so there is at least one line in the space that is mapped to $0$ by this dimensionality-reducing transformation, and the map is clearly not bijective.
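
For a concrete toy case (my own example, not part of the original argument), take $n = 3$ and let $T$ keep $x$ and $y$ while killing $z$:

```python
import numpy as np

# A rank-deficient "dimensionality reducing" map on R^3.
T = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])

print(np.linalg.matrix_rank(T))          # 2 = dim(im(T)) < 3 = dim(V)
print(T @ np.array([0.0, 0.0, 1.0]))     # [0. 0. 0.] -- the z-axis spans ker(T)

# Rank-nullity: dim(ker(T)) = 3 - 2 = 1 >= 1, so T cannot be inverted.
```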

Iron4dam

An autoencoder might be what you are after. The idea is to define two functions, an encoder $f_{\theta}: \mathbb{R}^N\to \mathbb{R}^n$ and a decoder $g_{\phi}: \mathbb{R}^n\to\mathbb{R}^N$, where $N$ is the dimensionality of your data and $n \ll N$. It is common to parametrize $f$ and $g$ by neural networks. Then each datum $X$ can be approximately reconstructed by $g_{\phi}(f_{\theta}(X))$, and the parameters $\phi,\theta$ are trained so as to minimize the average reconstruction loss.

Given any point in the latent space $\mathbb{R}^n$, you can simply run it through the decoder $g_{\phi}$ to get back a point in the original space.
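
For a concrete picture, here is a minimal sketch (my own illustration, assuming PyTorch; the data, layer widths and the dimensions $N = 60$, $n = 5$ are made-up placeholders):

```python
import torch
import torch.nn as nn

N, n = 60, 5   # original and latent dimensionality (placeholder values)

# Encoder f_theta: R^N -> R^n and decoder g_phi: R^n -> R^N as small MLPs.
encoder = nn.Sequential(nn.Linear(N, 32), nn.ReLU(), nn.Linear(32, n))
decoder = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, N))

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

X = torch.randn(1000, N)   # placeholder for the high-dimensional data

# Train theta and phi to minimize the average reconstruction loss ||g(f(X)) - X||^2.
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(decoder(encoder(X)), X)
    loss.backward()
    optimizer.step()

# Any point in the latent space R^n can be pushed through the decoder
# to get a point back in the original space R^N.
with torch.no_grad():
    z = torch.randn(1, n)
    x_new = decoder(z)     # shape (1, N)
```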

Mike Hawk