33

In dimensionality reduction techniques such as Principal Component Analysis, LDA, etc., the term manifold is often used. What is a manifold in non-technical terms? Suppose a point $x$ belongs to a sphere whose dimension I want to reduce, and there is noise $y$, with $x$ and $y$ uncorrelated. Then the actual points $x$ would be far separated from each other due to the noise, so noise filtering would be required. Dimension reduction would therefore be performed on $z = x+y$. In this case, do $x$ and $y$ belong to different manifolds?

I am working on point cloud data that is often used in robot vision; the point clouds are noisy due to noise in acquisition and I need to reduce the noise before dimension reduction. Otherwise, I will get incorrect dimension reduction. So, what is the manifold here and is noise a part of the same manifold to which $x$ belongs?

Ria George

4 Answers

47

In non-technical terms, a manifold is a continuous geometrical structure of finite dimension: a line, a curve, a plane, a surface, a sphere, a ball, a cylinder, a torus, a "blob"... something like this: (image not reproduced here)

It is a generic term used by mathematicians to say "a curve" (dimension 1) or "surface" (dimension 2), or a 3D object (dimension 3)... for any possible finite dimension $n$. A one dimensional manifold is simply a curve (line, circle...). A two dimensional manifold is simply a surface (plane, sphere, torus, cylinder...). A three dimensional manifold is a "full object" (ball, full cube, the 3D space around us...).

A manifold is often described by an equation: the set of points $(x,y)$ such that $x^2+y^2=1$ is a one dimensional manifold (a circle).

A manifold has the same dimension everywhere. For example, if you append a line (dimension 1) to a sphere (dimension 2) then the resulting geometrical structure is not a manifold.

Unlike the more general notions of metric space or topological space, which are also intended to capture our natural intuition of a continuous set of points, a manifold is intended to be something locally simple: locally it looks like a finite-dimensional vector space $\mathbb{R}^n$. This rules out abstract spaces (such as infinite-dimensional spaces) that often fail to have a concrete geometric meaning.

Unlike a vector space, manifolds can have various shapes. Some manifolds can be easily visualized (sphere, ball...), some are difficult to visualize, like the Klein bottle or the real projective plane.

In statistics, machine learning, or applied maths generally, the word "manifold" is often used to mean "like a linear subspace, but possibly curved". Any time you write a linear equation like $3x+2y-4z=1$ you get a linear (affine) subspace (here a plane). Usually, when the equation is non-linear, like $x^2+2y^2+3z^2=7$, the solution set is a manifold (here a stretched sphere, i.e. an ellipsoid).
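A hedged way to see why the word "usually" is needed, using the implicit function theorem (which is brought up in the comments on this answer, not in the answer itself): write the equation as $f(x,y,z)=c$ and check whether the gradient of $f$ vanishes anywhere on the solution set. A worked sketch for the stretched sphere, together with a standard counterexample:

```latex
% Stretched sphere: the gradient only vanishes at the origin,
% and the origin is not a solution of f = 7.
f(x,y,z) = x^2 + 2y^2 + 3z^2, \qquad \nabla f = (2x,\ 4y,\ 6z),
\qquad \nabla f = 0 \iff (x,y,z) = (0,0,0), \qquad f(0,0,0) = 0 \neq 7 .

% Counterexample: two crossing lines. The gradient vanishes at the
% origin, which IS a solution, and the manifold property fails there.
g(x,y) = x^2 - y^2 = (x-y)(x+y), \qquad \nabla g = (2x,\ -2y),
\qquad g(0,0) = 0 .
```

Since $\nabla f \neq 0$ at every point of $\{f = 7\}$, that set is a 2-dimensional manifold. The set $\{g = 0\}$ is the union of the lines $y=x$ and $y=-x$, which is not a manifold at the crossing point.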

For example, the "manifold hypothesis" of ML says "high dimensional data are points in a low dimensional manifold with high dimensional noise added". You can imagine points of a 1D circle with some 2D noise added. While the points are not exactly on the circle, they approximately satisfy the equation $x^2+y^2=1$ in a statistical sense. The circle is the underlying manifold: https://i.stack.imgur.com/iEm2m.png
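The noisy-circle picture can be reproduced numerically. The following is a minimal sketch, not part of the original answer: the function name `noisy_circle`, the noise level `noise_std=0.05`, the sample size and the seed are all illustrative assumptions.

```python
import numpy as np


def noisy_circle(n_points=2000, noise_std=0.05, seed=0):
    """Sample points on the unit circle and add isotropic 2D Gaussian noise.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n_points)
    clean = np.column_stack([np.cos(theta), np.sin(theta)])  # exactly on the circle
    noise = rng.normal(0.0, noise_std, size=clean.shape)     # 2D noise
    return clean, clean + noise


clean, noisy = noisy_circle()

# Clean points satisfy x^2 + y^2 = 1 up to floating-point error.
clean_residual = np.abs((clean ** 2).sum(axis=1) - 1.0).max()

# Noisy points satisfy it only on average; individual points lie off the circle.
noisy_radius_sq = (noisy ** 2).sum(axis=1)

print("max |x^2 + y^2 - 1| on clean points:", clean_residual)
print("mean of x^2 + y^2 on noisy points:  ", noisy_radius_sq.mean())
print("std  of x^2 + y^2 on noisy points:  ", noisy_radius_sq.std())
```

For this noise model the expected value of $x^2+y^2$ is $1 + 2\sigma^2$ rather than exactly 1, so the equation holds only approximately, and more closely as the noise gets smaller.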

Benoit Sanchez
  • Thank you for your answer, supported with a simple example and the picture. Can you please explain what a topology is in non-technical terms as well? Are the terms topology and manifold used interchangeably? Does the dimension have to be an integer? What if it is a real number? Then I think the structure is known as a fractal, if the entire structure is composed of self-repeating subparts. Lastly, in the picture there are three empty parts, hollow without the green material; why is it still a continuous geometric structure? – Ria George Jul 08 '17 at 18:30
  • Then, can you please include the case of noise that I asked in my question? In a point cloud data, say there are outliers (noise). Would the noise be considered part of the same manifold as the non-noise data? – Ria George Jul 08 '17 at 18:38
  • 4
    @RiaGeorge In the picture it is the *surface* that is a manifold. It's continuous because you can move around it freely without interruption and never have to jump *off* the surface to get between any two places. The holes you allude to are important in describing *how* you can get around on the surface between any two points in the simplest way, and counting them is an important technique in studying manifolds. – Matthew Drury Jul 08 '17 at 18:58
  • 4
    Explaining what topology is would be way too broad a question for this site, and a bit off topic. I would search the mathematics stack exchange for information on that. Manifolds and topology are not synonyms: manifolds are mathematical objects studied with the techniques of topology, topology is a sub-subject of mathematics. – Matthew Drury Jul 08 '17 at 19:01
  • 1
    This seems like a very good explanation to someone learning about the concept for the first time, with well-chosen, concrete examples. (I don't know for certain though since I have encountered the concept before.) As a minor quibble, I would recommend rephrasing the last sentence to be less absolute ("Anytime the equation is non-linear like..."): as it is written right now, it is not actually true. Apart from that minor quibble, I find this very well-written. – Chill2Macht Jul 08 '17 at 19:28
  • 1
    The answer misses all the fundamental points that make a manifold such, I don't get how it has so many upvotes. Topology, charts and smoothness are not even mentioned and the answer basically gives the impression that a manifold is a surface, which it is **not**. – gented Jul 08 '17 at 23:36
  • 2
    Technical point, the solution set of a system of equations need not be a manifold. It's a variety, so it's mostly a manifold, but it can have points of self intersection where the manifold property fails. – Matt Samuel Jul 09 '17 at 00:03
  • 1
    Your manifold definition includes the requirement to be of *finite dimension*. But you include examples that do not meet that requirement—such as lines, planes, curves and surfaces. Could you please clarify what you meant? – Mowzer Jul 09 '17 at 15:21
  • 1
    @Mowzer: Finite dimensionality means that the number of dimensions is finite. For instance, a plane is a 2-dimensional object, and 2 is finite. –  Jul 09 '17 at 23:16
  • 1
    @RiaGeorge : About "topology", the question is interesting, but maybe the site "Mathematics" (https://math.stackexchange.com/) would be a better place. And like Matthew Drury says, it's very broad... – Benoit Sanchez Jul 10 '17 at 08:30
  • @MattSamuel Technically that is true only when the equations are algebraic. For general non-linear equations (based on differentiable functions) the essence of your conclusion still does hold, since by the inverse/implicit function theorem in a neighborhood of any non-singular point the solution set will be a manifold, and by Sard's theorem non-singular points are generic. (But to the best of my knowledge the term variety is only appropriate for solution sets of algebraic equations.) – Chill2Macht Jul 10 '17 at 08:37
  • 1
    @Mowzer You may be confusing (set) cardinality https://en.wikipedia.org/wiki/Cardinality with (topological) dimension https://en.wikipedia.org/wiki/Lebesgue_covering_dimension For manifolds the dimension is simpler to understand, it is just the value of $n$ for which there are local homeomorphisms with $\mathbb{R}^n$, as a corollary of Brouwer's invariance of domain theorem, the dimension of manifolds is a topological invariant. – Chill2Macht Jul 10 '17 at 08:46
  • @Chill It's still a variety, just not an algebraic one. If it's not algebraic it's even easier for it to fail to be a manifold. – Matt Samuel Jul 10 '17 at 11:27
  • @MattSamuel would you mind providing a reference for this usage of the word variety? I have never heard it before and am curious. – Chill2Macht Jul 10 '17 at 13:25
  • 1
    @Chill It's a very broad term that means the solution set of a system of equations. Generally what is meant is an algebraic variety implicitly, but there are other examples. See https://en.wikipedia.org/wiki/Complex-analytic_variety and https://en.wikipedia.org/wiki/Variety_%28universal_algebra%29 – Matt Samuel Jul 10 '17 at 13:30
  • 1
    @Chill For the zero set of arbitrary smooth functions it's not really an interesting concept because it can be any closed set. – Matt Samuel Jul 10 '17 at 13:33
15

A (topological) manifold is a space $M$ which is:

(1) "locally" "equivalent" to $\mathbb{R}^n$ for some $n$.

"Locally", the "equivalence" can be expressed via $n$ coordinate functions, $c_i: M \to \mathbb{R}$, which together form a "structure-preserving" function, $c: M \to \mathbb{R}^n$, called a chart.

(2) can be realized in a "structure-preserving" way as a subset of $\mathbb{R}^N$ for some $N \ge n$.

Note that in order to make "structure" precise here, one needs to understand basic notions of topology, which allow one to make notions of "local" behavior precise, and thus "locally" above. When I say "equivalent", I mean equivalent topological structure (homeomorphic), and when I say "structure-preserving" I mean the same thing (creates an equivalent topological structure).
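As a concrete illustration of conditions (1) and (2), here is a small sketch for the unit sphere, with $n = 2$ and $N = 3$. It uses stereographic projection from the north pole as the chart. That choice, and the names `chart_north` and `chart_north_inverse`, are assumptions made for this example and are not taken from the answer.

```python
import numpy as np


def chart_north(p):
    """Stereographic chart: unit sphere minus the north pole -> R^2.

    The two components are the coordinate functions c_1 and c_2.
    """
    x, y, z = p
    return np.array([x / (1.0 - z), y / (1.0 - z)])


def chart_north_inverse(u):
    """Inverse of the chart: R^2 -> unit sphere minus the north pole."""
    a, b = u
    s = a * a + b * b
    return np.array([2.0 * a, 2.0 * b, s - 1.0]) / (s + 1.0)


# A point of the sphere, given by N = 3 ambient coordinates (condition 2).
p = np.array([0.6, 0.0, 0.8])

# Its n = 2 local coordinates under the chart (condition 1).
u = chart_north(p)

# Mapping back recovers the original point, so no information is lost.
p_back = chart_north_inverse(u)

print("chart coordinates:", u)
print("round trip:       ", p_back)
```

This single chart is undefined at the north pole $(0,0,1)$, which is why the equivalence with $\mathbb{R}^n$ is only "local": a second chart (for instance, projection from the south pole) is needed to cover the whole sphere.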

Note also that in order to do calculus on manifolds, one needs an additional condition which doesn't follow from the above two conditions, which basically says something like "the charts are well-behaved enough to allow us to do calculus". These are the manifolds most often used in practice. Unlike general topological manifolds, in addition to calculus they also allow triangulations, which is very important in applications like yours involving point cloud data.

Note that not all people use the same definition for a (topological) manifold. Several authors will define it as satisfying only condition (1) above, not necessarily also (2). However, the definition which satisfies both (1) and (2) is much better behaved, therefore more useful for practitioners. One might expect intuitively that (1) implies (2), but it actually doesn't.

EDIT: If you are interested in learning about what precisely a "topology" is, the most important example of a topology to understand is the Euclidean topology of $\mathbb{R}^n$. This will be covered in-depth in any (good) introductory book about "real analysis".

Chill2Macht
  • Thank you for your answer. Can you please explain what a topology is in non-technical terms as well? Are the terms topology and manifold used interchangeably? Does the dimension have to be an integer? What if it is a real number? Then I think the structure is known as a fractal, if the entire structure is composed of self-repeating subparts. – Ria George Jul 08 '17 at 18:27
  • 1
    @RiaGeorge $n$ stands for a natural number (integer $\ge 1$), as does $N$. There might be more advanced theory for fractional/real-valued dimensions, but it doesn't come up as often. "Topology" and "manifold" mean two very distinct things, so they are not interchangeable terms. A "manifold" has a "topology". The field of Topology studies spaces which have "topologies", which are collections of sets satisfying three rules/conditions. One goal of studying "topologies" is to describe in a consistent and reproducible way notions of "local" behavior. – Chill2Macht Jul 08 '17 at 19:05
  • @RiaGeorge The axioms for a "topology" can be found on the Wikipedia page: https://en.wikipedia.org/wiki/General_topology#A_topology_on_a_set -- note also that the link I gave you for the (equivalent) definition of "topology" in terms of neighborhood pointed to something related but not the same, I have edited my answer to reflect this: https://en.wikipedia.org/wiki/Neighbourhood_(mathematics)#Topology_from_neighbourhoods Note however that the definition in terms of neighborhoods is more difficult to understand (I imagine I could understand it well, but I don't bother to, because I'm lazy – Chill2Macht Jul 08 '17 at 19:08
  • so anyway it's my personal biased opinion that you don't need to know the neighborhood definition of topology -- just know that the simpler definition gives you all of the same power of the neighborhood definition in terms of rigorously describing local behavior, since they are equivalent). Anyway, if you are interested in fractals, maybe you will find these Wikipedia pages interesting -- I can't help you out with that more though, because I am not deeply familiar with the theory and don't know or understand most of the definitions -- I have only heard of some of the – Chill2Macht Jul 08 '17 at 19:10
  • relevant terms. https://en.wikipedia.org/wiki/Hausdorff_dimension https://en.wikipedia.org/wiki/List_of_fractals_by_Hausdorff_dimension That being said, the relationship between fractals and manifolds seems to be complicated at best, although I don't know or understand much about fractals. https://math.stackexchange.com/questions/1340973/can-a-fractal-be-a-manifold-if-so-will-its-boundary-if-exists-be-strictly-on https://math.stackexchange.com/questions/1406058/can-a-fractal-be-a-manifold One problem is that Hausdorff dimension seems to be defined for spaces more general than manifolds. – Chill2Macht Jul 08 '17 at 19:13
  • 2
    This is the only answer so far that pays attention to the modern mathematical idea of assembling a global object from local data. Unfortunately, it doesn't quite make it to the level of simplicity and clarity required of a "non-technical" account. – whuber Jul 11 '17 at 13:47
10

In this context, the term manifold is accurate, but is unnecessarily highfalutin. Technically, a manifold is any space (set of points with a topology) that is sufficiently smooth and continuous (in a way that can, with some effort, be made mathematically well-defined).

Imagine the space of all possible values of your original factors. After a dimensional reduction technique, not all points in that space are attainable. Instead, only points on some embedded sub-space inside that space will be attainable. That embedded sub-space happens to fulfill the mathematical definition of a manifold. For a linear dimensional reduction technique like PCA, that sub-space is just a linear sub-space (e.g. a hyper-plane), which is a relatively trivial manifold. But for non-linear dimensional reduction techniques, that sub-space could be more complicated (e.g. a curved hyper-surface). For data analysis purposes, understanding that these are sub-spaces is much more important than any inference you would draw from knowing that they fulfill the definition of manifold.
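The linear case can be sketched with PCA computed from an SVD. This is a minimal illustration under stated assumptions, not a reference implementation: the helper `pca_project`, the synthetic plane spanned by `basis`, and the noise scale are all hypothetical choices made for the example.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components and map back.

    Returns the low-dimensional scores and the reconstructed points, which
    all lie on an affine sub-space of dimension n_components.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]
    scores = Xc @ W.T
    X_hat = scores @ W + mean
    return scores, X_hat


# Synthetic data (an assumption for this example): points on a 2D plane
# embedded in 3D, plus small isotropic 3D noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
basis = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, -1.0]])
X = latent @ basis + rng.normal(scale=0.05, size=(500, 3))

scores, X_hat = pca_project(X, n_components=2)

# Distance from each original point to its image in the reduced sub-space.
displacement = np.linalg.norm(X - X_hat, axis=1)
print("mean displacement from the plane:", displacement.mean())
```

The reconstructed points `X_hat` lie exactly on a plane, while the original points `X` are only close to it. The size of `displacement` is comparable to the noise that was added.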

David Wright
  • 4
    "Highfalutin"... learned a new word today! – user541686 Jul 08 '17 at 08:53
  • 5
    *Mathematically, a manifold is any locally continuous topological space.* I like the idea of trying to explain things in plain language, but this characterization really doesn't work. First off, continuity is always a local property, so I'm not sure what you mean by locally continuous. Also, your definition fails to rule out a lot of things that aren't manifolds, such as the rational number line, or the union of two intersecting lines in the Euclidean plane. –  Jul 08 '17 at 16:20
  • 4
    I agree with Ben, technically it's "locally euclidean". I'm not sure there is a good way to boil that down to simple english. – Matthew Drury Jul 08 '17 at 19:00
  • 1
    I also have to agree strongly with the two comments above. In fact, the answer I wrote below was originally meant to be a clarifying comment to this answer which became too long. There is no precise notion of a "continuous" topological space (see here: https://math.stackexchange.com/questions/1822769/how-to-axiomize-the-notion-of-continuous-space ). Defining manifolds in terms of non-existent concepts is, in my opinion, in the long-run more likely to be confusing than clarifying. At the very least, I would suggest replacing the word "mathematically" in the first sentence with something else. – Chill2Macht Jul 08 '17 at 19:34
  • I'll use this comment as an opportunity to ask a little question...I (think) I got the idea of manifolds, but why is it "locally" needed? Isn't a space "locally" continuous...continuous as a whole? – Paul92 Jul 08 '17 at 23:27
  • @BenCrowell and others: I agree with your objections. Since the whole point of my answer is "the details of the mathematical definition aren't important for the usage in data science", I'm not too worried about its admitted deficiencies. But since you are, and since someone who does want the mathematical definition might stumble onto this post, I've edited to improve it a bit. Note that I'm well aware of the usual attempt to summarize the definition as "locally like ${\mathbb R}^n$"; I find it neither mathematically nor colloquially enlightening. – David Wright Jul 10 '17 at 21:21
  • @DavidWright: Thank you for your answer and the edited updates. In your answer, you mention the example of PCA. My question is this: in many cases the original data contains outliers or noise, so we process the data before dimension reduction. However, there may be other cases where the noise is not filtered and we do dimension reduction using PCA anyway. Would the noise be considered part of the same manifold to which the data points belong? Can you provide some insights into this? – Ria George Jul 17 '17 at 04:12
  • @RiaGeorge: Once you do dimensional reduction, the original points are typically not in the reduced space. But if the reduction is good, they are _close to_ the reduced space. You could regard their displacement from the reduced space as noise; you would essentially be saying that the reduced space is the underlying available space and any displacements from it are small random errors. But whether you call that displacement noise or not, the original points will generally not be exactly on the reduced manifold. – David Wright Jul 17 '17 at 18:01
0

As Bronstein and others have put it in "Geometric deep learning: going beyond Euclidean data":

Roughly, a manifold is a space that is locally Euclidean. One of the simplest examples is a spherical surface modeling our planet: around a point, it seems to be planar, which has led generations of people to believe in the flatness of the Earth. Formally speaking, a (differentiable) d-dimensional manifold X is a topological space where each point x has a neighborhood that is topologically equivalent (homeomorphic) to a d-dimensional Euclidean space, called the tangent space.
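The "seems to be planar around a point" intuition can be checked numerically. The sketch below is an illustration added here and not part of the quoted article. It measures how far a small cap of the unit sphere around the north pole departs from the plane $z = 1$ that touches the sphere there. The function name `max_gap_to_tangent_plane` is hypothetical.

```python
import numpy as np


def max_gap_to_tangent_plane(eps, n=1000):
    """Largest distance to the plane z = 1 over the spherical cap of angular radius eps.

    Points on the unit sphere at polar angle t from the north pole have
    height z = cos(t), so their distance to the plane z = 1 is 1 - cos(t).
    """
    polar = np.linspace(0.0, eps, n)
    return np.max(1.0 - np.cos(polar))


for eps in (0.1, 0.01, 0.001):
    gap = max_gap_to_tangent_plane(eps)
    # The gap shrinks like eps**2 / 2, i.e. much faster than the patch size eps.
    print(f"patch size {eps:g}: gap {gap:.3e}, gap / patch size {gap / eps:.3e}")
```

The ratio of the gap to the patch size tends to zero as the patch shrinks, which is one way to make precise the statement that the sphere looks flat at small scales.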

Gonçalo Peres
  • The quotation is self-contradictory. At the outset it describes a Riemannian manifold ("locally Euclidean") but at the end it describes a topological manifold (homeomorphisms do not, by definition, have to respect the differential structure and therefore the concept of tangent space does not apply). – whuber Feb 09 '21 at 13:13