
I am implementing PCA, LDA, and Naive Bayes: PCA for compression, Naive Bayes for classification, and LDA for both compression and classification.

I have the code written and everything works. What I need to know, for the report, is what the general definition of reconstruction error is.

I can find a lot of math, and uses of it in the literature, but what I really need is a bird's-eye-view, plain-words definition that I can adapt for the report.

– Chris
  • Reconstruction error is the concept that applies (from your list) only to PCA, not to LDA or naive Bayes. Are you asking about what reconstruction error in PCA means, or do you want some "general definition" that would also apply to LDA and naive Bayes? – amoeba Feb 05 '16 at 23:05
  • Do you know both? The report involves both PCA and LDA as they pertain to compression of data, so I have to have some kind of answer w.r.t. both PCA and LDA... but not necessarily NB. So, maybe the detailed PCA-specific version... and the general idea, so I can apply it to LDA as well as I can. Then I'd have enough knowledge to search Google more effectively if I run into snags... – Chris Feb 05 '16 at 23:11
  • This question might better get closed because `general definition of reconstruction error` is elusively broad. – ttnphns Feb 10 '16 at 18:04
  • @ttnphns, I don't think it's too broad. I think the question can be reformulated as "Can we apply the PCA notion of reconstruction error to LDA?" and I think it is an interesting and on-topic question (+1). I will try to write an answer myself if I find time. – amoeba Feb 10 '16 at 18:09
  • @amoeba, in the formulation suggested by you the question indeed receives light. Yes, it is possible to write an answer then (and I expect yours will be good). A tricky thing about "what is being reconstructed" in LDA is the issue of what is considered the DVs and what the IVs in LDA. – ttnphns Feb 10 '16 at 18:18

4 Answers


For PCA, what you do is project your data onto a subspace of your input space. The idea is captured by the figure below: you project the data onto the subspace with maximum variance (the green line). When you reconstruct your data from that projection you get the red points, and the reconstruction error is the sum of the (squared) distances from the blue points to the red points: it corresponds exactly to the error you made by projecting your data onto the green line. This generalizes to any number of dimensions, of course!

[Figure: blue data points in 2D, their projections (red points) onto the first principal axis (green line), and the distances between them.]

As pointed out in the comments, it does not seem that simple for LDA and I can't find a proper definition on the internet. Sorry.
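
For concreteness, here is a minimal NumPy sketch of that picture (the toy data X, the number of components k, and all variable names are only illustrative): project the centred data onto the top-k principal directions, map back, and sum the squared distances between the original and reconstructed points.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # toy data: 100 points in 5 dimensions
k = 2                                    # number of principal components kept

Xc = X - X.mean(axis=0)                  # centre the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:k].T                             # top-k principal directions, shape (5, k)

Z = Xc @ W                               # coordinates in the principal subspace
X_rec = Z @ W.T + X.mean(axis=0)         # reconstruction back in the original space

reconstruction_error = np.sum((X - X_rec) ** 2)   # sum of squared blue-to-red distances

Dividing this sum by the number of points gives the average squared reconstruction error, which is the form used in the other answers.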

– Vince.Bdn
  • The LDA case is trickier than that. What would you do in the case of 2-dimensional projections? In PCA, two principal axes are orthogonal and form a 2D plane, so of course the same idea of reconstruction error applies. But in LDA, two discriminant axes are *not* orthogonal. How exactly are you suggesting to define the reconstruction error then? – amoeba Feb 10 '16 at 17:44
  • I've got two remarks on the answer. 1) Are you saying that your pic 1 shows the true PC1? 2) For LDA and the 2nd pic - well, you can draw discriminants as axes in the original space and call data point residuals "reconstruction error". But it is a loose terminological practice. What do discriminants reconstruct? Also, add here what amoeba said about axial nonorthogonality (seen [here](http://stats.stackexchange.com/a/22889/3277)). – ttnphns Feb 10 '16 at 18:01
  • 1) It's a picture taken from a Google search that shows the error, but indeed the true PC1 would be much more vertical; I'll try to find a better one and update. – Vince.Bdn Feb 10 '16 at 18:05
  • 2) I've edited my post. I do tend to see the discriminants as axes in the original space from a geometric point of view, but as pointed out, there is no orthogonality. My mistake... – Vince.Bdn Feb 10 '16 at 18:13
  • Vince, it's your decision. But if I were in your place, I would have left the second pic in the answer, too. You were not mistaken and your view is possible. The issue is, however, more complex with LDA; the comments were just to stress _that_. – ttnphns Feb 10 '16 at 18:22
  • I second what @ttnphns said. I'd also suggest you keep the LDA picture but add the *explanation* of why the same notion of reconstruction error cannot be easily applied in this case. – amoeba Feb 10 '16 at 18:27
  • @Vince.Bdn thanks man... this is actually what I ended up doing. The LDA reconstruction never worked. I was under instruction to produce a reconstruction image and error for LDA, on the grounds that it is an orthonormal basis so a reconstruction should be possible... but I think that doesn't account for the fact that, in practice, you are dealing with a singular matrix with a hefty imaginary component, so that data gets lost in the pseudo-inverse and then the imaginary part gets dumped in the reconstruction... Actually, it wound up looking exactly like the first eigenvector of the PCA. – Chris Feb 10 '16 at 21:27
  • @Vince.Bdn Anyhow, this answer looks really good, and if I'd stumbled on it last week, it probably would have saved me a couple of hours of googling...get a snack, google, try something out, get a coke... =) – Chris Feb 10 '16 at 21:28
  • @Vince.Bdn also, I think I might post a follow-up question about why, if we are forming orthonormal bases, we don't get a perfect reconstruction every time -- is this all related to shortcuts in inverse calculations and underflow? Because all the data *should* (theoretically) be in the vector that gets output, if $$ \mathbf{a} = \mathbf{W}\mathbf{v}$$ and $$ \mathbf{v} = \mathbf{W}^{-1}\mathbf{a},$$ i.e. $$ \mathbf{v} = \mathbf{W}^{-1}\mathbf{W}\mathbf{v}$$... – Chris Feb 10 '16 at 21:32

The general definition of the reconstruction error would be the distance between the original data point and its projection onto a lower-dimensional subspace (its 'estimate').

Source: Mathematics of Machine Learning Specialization by Imperial College London
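
In symbols (the notation here is added for this post, not taken from that course): for data points $x_i$ and their estimates $\hat{x}_i$ obtained by projecting onto the lower-dimensional subspace, this is often written as the average squared distance

$$\text{reconstruction error} = \frac{1}{n}\sum_{i=1}^{n} \lVert x_i - \hat{x}_i \rVert^2 .$$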


What I usually use as measures of reconstruction error (in the context of PCA, but also other methods) are the coefficient of determination $R^2$ and the Root Mean Squared Error (or normalised RMSE). These two are easy to compute and give you a quick idea of how good the reconstruction is.

Calculation

Let's assume $X$ is your original data and $f$ is the reconstructed data (i.e. the data recovered from its compressed representation, with the same shape as $X$).

The $R^2$ of the $i^{th}$ variable can be computed as:

$R^2_i = 1 - \frac{\sum_{j=1}^n (X_{j,i} - f_{j,i})^2}{\sum_{j=1}^n (X_{j,i} - \bar{X}_i)^2}$, where $\bar{X}_i$ is the mean of the $i^{th}$ variable.

Since $R^2 = 1.0$ for a perfect fit, you can judge the reconstruction by how close the $R^2$ is to 1.0.

The RMSE of the $i^{th}$ variable can be computed as (the overline denotes an average over the $n$ observations):

$ \text{RMSE}_i = \sqrt{\overline{(X_i - f_i)^2}} $

which you can also normalise by a quantity $N_i$ that suits you. I often normalise by the root mean square of the variable, so the NRMSE is:

$\text{NRMSE}_i = \frac{\text{RMSE}_i}{N_i} = \sqrt{\frac{\overline{(X_i - f_i)^2}}{\overline{X_i^2}}}$

Computation

In case you are using Python you can compute these as:

from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
import numpy as np

# One score per variable (column), matching the per-variable formulas above:
r2 = r2_score(X, f, multioutput='raw_values')
rmse = np.sqrt(mean_squared_error(X, f, multioutput='raw_values'))

# RMSE normalised by the root mean square of each variable:
nrmse = rmse / np.sqrt(np.mean(X**2, axis=0))

where X is the original data and f is the reconstructed data (both arrays of shape n_samples × n_variables).

Visualization

In case it is helpful for you to do some sensitivity analysis, you can then judge visually how the $R^2$ or RMSE change when you change the parameters of your compression. For instance, this can be handy in the context of PCA when you want to compare reconstructions with an increasing number of retained Principal Components. Below you see that increasing the number of retained modes brings the reconstruction closer to the original data:

[Figure: reconstruction quality ($R^2$) versus the number of retained Principal Components.]
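
A minimal sketch of such a sensitivity analysis with scikit-learn (assuming X is your original data array of shape n_samples × n_variables; the names are illustrative): loop over the number of retained components, reconstruct, and record the $R^2$ of each reconstruction.

from sklearn.decomposition import PCA
from sklearn.metrics import r2_score

r2_per_n = []
for n in range(1, X.shape[1] + 1):
    pca = PCA(n_components=n)
    X_rec = pca.inverse_transform(pca.fit_transform(X))   # compress, then reconstruct
    r2_per_n.append(r2_score(X, X_rec))                   # overall R^2 for this n

# Plotting range(1, X.shape[1] + 1) against r2_per_n gives a curve like the one above.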

– kamilazdybal

A bird's-eye view of the reconstruction error, in the PCA context, is the variability of the data that we are not able to capture.

Notation: the principal subspace is the lower-dimensional subspace onto which the data is projected.

Reconstruction error as contribution from ignored subspace

In PCA, the reconstruction error (or loss) is the sum of the eigenvalues of the ignored subspace. Say you have 10-dimensional data and you select the first 4 principal components: this means your principal subspace has 4 dimensions and corresponds to the 4 largest eigenvalues and their eigenvectors, so the reconstruction error is the sum of the 6 eigenvalues of the ignored subspace (the smallest 6).

Minimizing the reconstruction error means minimizing the contribution of the ignored eigenvalues, which depends on the distribution of the data and on how many components we select.

Reconstruction error as average squared distance

The ignored subspace is the orthogonal complement of the principal subspace, so the reconstruction error can also be seen as the average squared distance between the original data points and their respective projections onto the principal subspace, as described in another answer.
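
As a quick numerical check of this equivalence, here is a sketch (the toy data X, the choice k = 4, and all names are illustrative, not taken from the answer above): the average squared reconstruction error from keeping the top k eigenvectors of the covariance matrix equals the sum of the ignored eigenvalues.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))   # toy 10-dimensional data
k = 4                                                        # retained principal components

Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False, bias=True)       # covariance matrix, normalised by n
evals, evecs = np.linalg.eigh(C)              # eigenvalues in ascending order
W = evecs[:, -k:]                             # eigenvectors of the k largest eigenvalues

X_rec = Xc @ W @ W.T                          # project onto the principal subspace and back
avg_sq_error = np.mean(np.sum((Xc - X_rec) ** 2, axis=1))

print(np.isclose(avg_sq_error, evals[:-k].sum()))   # True: sum of the 6 ignored eigenvalues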

– T.singh