I would like to extract features from numerical data (without loss of generality) using unsupervised learning methods such as:
- transformations: PCA/ICA/NMF
- embeddings: t-distributed stochastic neighbor embedding (t-SNE)
- cluster-based methods: k-means or similar
- kernel-based methods: kernel PCA
I am also thinking about using autoencoders or something similar. The extracted features are then used in a classifier.
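For concreteness, this is the kind of workflow I have in mind, sketched with scikit-learn (the dataset and hyperparameters are arbitrary placeholders, not my actual setup):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unsupervised feature extraction (PCA) feeding a supervised classifier.
clf = make_pipeline(PCA(n_components=20), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

The pipeline ensures the feature extractor is fit only on the training split and then reused, unchanged, on the test split.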
My question: I am studying each of these methods one by one, some in their original context (e.g. clustering) and some in the context of feature extraction. I lack experience with the details, and many questions arise, such as:
- Can I stack these methods? What do I lose by doing so?
- Can I fit them on a subset of the data (to reduce training time) and then apply them to the rest?
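To illustrate what I mean by both questions, here is a sketch with scikit-learn (array sizes, component counts, and cluster counts are made up for the example):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))

# 1) Stacking: PCA first, then distances to k-means centroids as features.
stacked = make_pipeline(PCA(n_components=10), KMeans(n_clusters=8, n_init=10))
Z = stacked.fit_transform(X)     # KMeans.transform gives distances to the 8 centroids
print(Z.shape)                   # (1000, 8)

# 2) Fit on a subset of the data, then apply the learned transform to the rest.
pca = PCA(n_components=10).fit(X[:200])
Z_rest = pca.transform(X[200:])
print(Z_rest.shape)              # (800, 10)
```

Whether either of these is a good idea in practice is exactly what I am unsure about.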
Thus:
Are there tutorials, lecture notes, or blog posts on the web that describe best practices for feature extraction in this sense?
PS: Courses like this Week 4: Feature construction deal with my question; I would love to see more examples from an applied point of view. The question Tutorials for feature engineering is similar, but I hope mine is not a duplicate.