The manifold hypothesis (just an intuition…)

Deep learning makes a big assumption: that the input data lives on an underlying non-Euclidean structure called a manifold.

In this manifold there are subspaces where objects of the same type live. For example, a certain subspace may contain only photos of mountains.

In the autoencoder architecture, the decoder performs a mapping from a low-dimensional latent space to a high-dimensional embedding space of observed data.

The latent space is Euclidean, while the embedding space is curved (not Euclidean).
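As a concrete sketch of this mapping, here is a decoder in PyTorch; the latent dimension (2), the data dimension (784, e.g., flattened 28×28 images) and the hidden width are illustrative assumptions, not values from these notes:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: a 2-D Euclidean latent space and a
# 784-D data space (e.g., flattened 28x28 images).
latent_dim, data_dim = 2, 784

# The decoder maps low-dimensional codes to high-dimensional data;
# its image is (approximately) the curved data manifold.
decoder = nn.Sequential(
    nn.Linear(latent_dim, 128),
    nn.ReLU(),
    nn.Linear(128, data_dim),
)

z = torch.randn(16, latent_dim)   # a batch of codes in the latent space
x = decoder(z)                    # the corresponding points in the embedding space
print(z.shape, "->", x.shape)     # torch.Size([16, 2]) -> torch.Size([16, 784])
```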

Manifolds

A manifold can be seen as a union of charts; a collection of charts that covers the whole manifold is called an atlas.

A chart is a mapping $\varphi : \Omega \to \mathcal{M}$ with $\Omega \subseteq \mathbb{R}^k$ and $\mathcal{M} \subseteq \mathbb{R}^n$.

An example of a chart is the mapping that arises when we want to represent the Earth on a flat surface.

  • The domain of $\varphi$ (the space in which the flat surface lives) is the parametric space (Euclidean);
  • The codomain of $\varphi$ (the sphere that represents the Earth) is the embedding (non-Euclidean).

$\varphi$ has to be:

  • Smooth: if two points are close in the parametric space, then they must also be close in the embedding. The distances may differ, but proximity must be preserved. Being smooth means being continuous and differentiable.
  • Invertible: every point of the embedding covered by the chart corresponds to exactly one point of the parametric space, so we can map back.

A function that is smooth and invertible (with a smooth inverse) is called a diffeomorphism.

For each manifold, we have infinitely many ways to construct a chart.

Back to the cartography example, we can represent the Earth with a map that better preserves distances, or perhaps areas or angles. Each choice gives a different chart, but all of them encode exactly the same geometric information.
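To make the chart idea concrete, here is a small NumPy sketch; the longitude/latitude parametrization is just one possible choice of chart, used here for illustration:

```python
import numpy as np

def chart(lon, lat):
    """Map flat (longitude, latitude) coordinates -- the Euclidean
    parametric space -- onto the unit sphere (the embedding)."""
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return np.stack([x, y, z], axis=-1)

def inverse_chart(p):
    """Invert the chart: recover (lon, lat) from a point on the sphere."""
    x, y, z = p[..., 0], p[..., 1], p[..., 2]
    return np.arctan2(y, x), np.arcsin(z)

# Smooth and invertible on its domain: a diffeomorphism, hence a valid chart.
lon, lat = 0.8, -0.3                 # a point in the parametric space
p = chart(lon, lat)                  # the corresponding point on the sphere
print(np.allclose(inverse_chart(p), (lon, lat)))  # True
```

Note that a single chart like this cannot cover the whole sphere (longitude is undefined at the poles), which is exactly why a manifold is in general described by an atlas of several charts.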

Relationship between Manifolds, Autoencoders and PCA

The decoder $D$ is a chart that maps the latent space spanned by the codes $z$ to the data space of the inputs $x$.

It is both differentiable and continuous (smooth) and it is invertible via the encoder $E$, so it is in every respect a chart.

Finding the map $D$ is achieved by training an autoencoder, i.e., by minimizing the reconstruction error between each input $x$ and its reconstruction $D(E(x))$. PCA can be seen as the linear special case: when both the encoder and the decoder are linear maps, minimizing the reconstruction error recovers the same subspace found by PCA.
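A minimal training sketch in PyTorch follows; the dimensions, network sizes, optimizer settings, and the random stand-in data are illustrative assumptions, not values from these notes:

```python
import torch
import torch.nn as nn

data_dim, latent_dim = 784, 2          # illustrative assumptions

# Encoder E and decoder D, trained jointly.
E = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
D = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))

opt = torch.optim.Adam(list(E.parameters()) + list(D.parameters()), lr=1e-3)
x = torch.rand(256, data_dim)          # random stand-in for real training data

for step in range(1000):
    x_hat = D(E(x))                    # map data through the chart and back
    loss = ((x - x_hat) ** 2).mean()   # reconstruction error forces D(E(x)) ≈ x
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, D plays the role of the chart: it parametrizes the data
# manifold by the Euclidean latent codes z = E(x), and E inverts it on the data.
```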