Throughout this post we shall always work in the smooth category, thus all manifolds, maps, coordinate charts, and functions are assumed to be smooth unless explicitly stated otherwise.

A (real) manifold $M$ can be defined in at least two ways. On one hand, one can define the manifold *extrinsically*, as a subset of some standard space such as a Euclidean space ${\bf R}^d$. On the other hand, one can define the manifold *intrinsically*, as a topological space equipped with an atlas of coordinate charts. The fundamental *embedding theorems* show that, under reasonable assumptions, the intrinsic and extrinsic approaches give the same classes of manifolds (up to isomorphism in various categories). For instance, we have the following (special case of) the Whitney embedding theorem:

**Theorem 1 (Whitney embedding theorem)** Let $M$ be a compact manifold. Then there exists an embedding $u: M \to {\bf R}^d$ from $M$ to a Euclidean space ${\bf R}^d$.

In fact, if $M$ is $n$-dimensional, one can take $d$ to equal $2n$, which is often best possible (easy examples include the circle ${\bf R}/{\bf Z}$, which embeds into ${\bf R}^2$ but not ${\bf R}^1$, or the Klein bottle, which embeds into ${\bf R}^4$ but not ${\bf R}^3$). One can also relax the compactness hypothesis on $M$ to second countability, but we will not pursue this extension here. We give a “cheap” proof of this theorem below the fold which allows one to take $d$ equal to $2n+1$.

A significant strengthening of the Whitney embedding theorem is (a special case of) the Nash embedding theorem:

**Theorem 2 (Nash embedding theorem)** Let $(M,g)$ be a compact *Riemannian* manifold. Then there exists an isometric embedding $u: M \to {\bf R}^d$ from $M$ to a Euclidean space ${\bf R}^d$.

In order to obtain the isometric embedding, the dimension $d$ has to be a bit larger than what is needed for the Whitney embedding theorem; in this article of Gunther the bound

$$ d = \max\left( \frac{n(n+5)}{2}, \frac{n(n+3)}{2} + 5 \right) \tag{1} $$

is attained, which I believe is still the record for large $n$. (In the converse direction, one cannot do better than $d = \frac{n(n+1)}{2}$, basically because this is the number of degrees of freedom in the Riemannian metric $g$.) Nash’s original proof of the theorem used what is now known as the Nash-Moser inverse function theorem, but a subsequent simplification of Gunther allowed one to proceed using just the ordinary inverse function theorem (in Banach spaces).

I recently had the need to invoke the Nash embedding theorem to establish a blowup result for a nonlinear wave equation, which motivated me to go through the proof of the theorem more carefully. Below the fold I give a proof of the theorem that does not attempt to give an optimal value of $d$, but which hopefully isolates the main ideas of the argument (as simplified by Gunther). One advantage of not optimising in $d$ is that it allows one to freely exploit the very useful tool of *pairing* together two maps $u_1: M \to {\bf R}^{d_1}$, $u_2: M \to {\bf R}^{d_2}$ to form a combined map $(u_1,u_2): M \to {\bf R}^{d_1+d_2}$ that can be closer to an embedding or an isometric embedding than the original maps $u_1, u_2$. This lets one perform a “divide and conquer” strategy in which one first starts with the simpler problem of constructing some “partial” embeddings of $M$ and then pairs them together to form a “better” embedding.

In preparing these notes, I found the articles of Deane Yang and of Siyuan Lu to be helpful.

**— 1. The Whitney embedding theorem —**

To prove the Whitney embedding theorem, we first prove a weaker version in which the embedding is replaced by an immersion:

**Theorem 3 (Weak Whitney embedding theorem)** Let $M$ be a compact manifold. Then there exists an immersion $u: M \to {\bf R}^d$ from $M$ to a Euclidean space ${\bf R}^d$.

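*Proof:* Our objective is to construct a map $u: M \to {\bf R}^d$ such that the derivatives $\partial_1 u(p), \dots, \partial_n u(p)$ (in local coordinates near $p$) are linearly independent in ${\bf R}^d$ for each $p \in M$. For any given point $p_0 \in M$, we have a coordinate chart $\phi_{p_0}: U_{p_0} \to {\bf R}^n$ from some neighbourhood $U_{p_0}$ of $p_0$ to ${\bf R}^n$. If we set $u_{p_0}: M \to {\bf R}^n$ to be $\phi_{p_0}$ multiplied by a suitable cutoff function supported near $p_0$, we see that $u_{p_0}$ is an immersion in a neighbourhood of $p_0$. Pairing together finitely many of the $u_{p_0}$ and using compactness, we obtain the claim.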

Now we upgrade the immersion $u$ from the above theorem to an embedding by further use of pairing. Let $u: M \to {\bf R}^d$ be an immersion and $p, q$ be points in $M$. If $p = q$, then $u$ is injective in a neighbourhood of $p$. If instead $p \neq q$, then it is possible that $u(p) = u(q)$, but by pairing $u$ with some scalar function $\eta: M \to {\bf R}$ that separates $p$ and $q$, we can replace $u$ by another immersion (in one higher dimension ${\bf R}^{d+1}$) such that a neighbourhood of $p$ and a neighbourhood of $q$ get mapped to disjoint sets. Repeating these procedures finitely many times, using the compactness of $M$, we end up with an immersion which is injective, giving the Whitney embedding theorem.

At present, the embedding $u: M \to {\bf R}^d$ of an $n$-dimensional compact manifold $M$ could be extremely high dimensional. However, if $d > 2n+1$, then it is possible to project $u$ from ${\bf R}^d$ to ${\bf R}^{d-1}$ by the random projection trick (discussed in this previous post). Indeed, if one picks a random element $\omega \in S^{d-1}$ of the unit sphere, and then lets $T: {\bf R}^d \to \omega^\perp$ be the (random) orthogonal projection to the hyperplane $\omega^\perp$ orthogonal to $\omega$, then it is geometrically obvious that $T \circ u: M \to \omega^\perp$ will remain an embedding unless either $\omega$ is of the form $\pm \frac{u(p)-u(q)}{|u(p)-u(q)|}$ for some distinct $p, q \in M$, or $\omega$ lies in the tangent plane to $u(M)$ at $u(p)$ for some $p \in M$. But the set of all such excluded $\omega$ is of dimension at most $2n$ (using, for instance, the Hausdorff notion of dimension), and so for $d > 2n+1$ almost every $\omega$ in $S^{d-1}$ will avoid this set. Thus one can use these projections to cut the dimension $d$ down by one for $d > 2n+1$; iterating this observation we can end up with the final value of $d = 2n+1$ for the Whitney embedding theorem.
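The dimension count behind the random projection step can be spelled out as follows. This is only a heuristic sketch: the names $\Sigma_{\rm sec}$ and $\Sigma_{\rm tan}$ for the two families of excluded directions are introduced here for illustration, and the only fact used is that smooth maps do not increase (Hausdorff) dimension.

```latex
% u : M -> R^d is an embedding of a compact n-dimensional manifold M.
\begin{align*}
\Sigma_{\rm sec} &:= \left\{ \pm\frac{u(p)-u(q)}{|u(p)-u(q)|} : p, q \in M,\ p \neq q \right\},
  & \dim \Sigma_{\rm sec} &\le \dim (M \times M) = 2n, \\
\Sigma_{\rm tan} &:= \left\{ \omega \in S^{d-1} : \omega \in du_p(T_p M) \text{ for some } p \in M \right\},
  & \dim \Sigma_{\rm tan} &\le n + (n-1) = 2n-1, \\
\dim S^{d-1} &= d-1 > 2n \iff d > 2n+1 .
\end{align*}
```

So as soon as $d > 2n+1$, both excluded sets have measure zero in the sphere $S^{d-1}$, and almost every direction $\omega$ is admissible.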

**Remark 4** The Whitney embedding theorem for $d = 2n$ is more difficult to prove. Using the random projection trick, one can arrive at an immersion $u: M \to {\bf R}^{2n}$ which is injective except at a finite number of “double points” where $u(M)$ meets itself transversally (think of projecting a knot in ${\bf R}^3$ randomly down to ${\bf R}^2$). One then needs to “push” the double points out of existence using a device known as the “Whitney trick”.

**— 2. Reduction to a local isometric embedding theorem —**

We now begin the proof of the Nash embedding theorem. In this section we make a series of reductions that reduce the “global” problem of isometrically embedding a compact manifold to a “local” problem of turning a near-isometric embedding of a torus into a true isometric embedding.

We first make a convenient (though not absolutely necessary) reduction: in order to prove Theorem 2, it suffices to do so in the case when $M$ is a torus $({\bf R}/{\bf Z})^n$ (equipped with some metric $g$ which is not necessarily flat). Indeed, if $M$ is not a torus, we can use the Whitney embedding theorem to embed $M$ (non-isometrically) into some Euclidean space ${\bf R}^m$, which by rescaling and then quotienting out by ${\bf Z}^m$ lets one assume without loss of generality that $M$ is some submanifold of a torus $({\bf R}/{\bf Z})^m$ equipped with some metric $g$. One can then use a smooth version of the Tietze extension theorem to extend the metric $g$ smoothly from $M$ to all of $({\bf R}/{\bf Z})^m$; this extended metric $\tilde g$ will remain positive definite in some neighbourhood of $M$, so by using a suitable (smooth) partition of unity and taking a convex combination of $\tilde g$ with the flat metric on $({\bf R}/{\bf Z})^m$, one can find another extension $g'$ of $g$ to $({\bf R}/{\bf Z})^m$ that remains positive definite (and symmetric) on all of $({\bf R}/{\bf Z})^m$, giving rise to a Riemannian torus $(({\bf R}/{\bf Z})^m, g')$. Any isometric embedding of this torus into ${\bf R}^d$ will induce an isometric embedding of the original manifold $M$, completing the reduction.

The main advantage of this reduction to the torus case is that it gives us a global system of (periodic) coordinates on the torus, so that we no longer need to work with local coordinate charts. Also, one can easily use Fourier analysis on the torus to verify the ellipticity properties of the Laplacian that we will need later in the proof. These are however fairly minor conveniences, and it would not be difficult to continue the argument below without having first reduced to the torus case.

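Henceforth our manifold $M$ is assumed to be the torus $({\bf R}/{\bf Z})^n$ equipped with a Riemannian metric $g = g_{\alpha\beta}$, where the indices $\alpha,\beta$ run from $1$ to $n$. Our task is to find an injective map $u: ({\bf R}/{\bf Z})^n \to {\bf R}^d$ which is isometric in the sense that it obeys the system of partial differential equations

$$ \partial_\alpha u \cdot \partial_\beta u = g_{\alpha\beta} $$

for $\alpha,\beta = 1,\dots,n$, where $\cdot$ denotes the usual dot product on ${\bf R}^d$. Let us write this equation as

$$ Q(u) = g $$

where $Q(u)$ is the symmetric tensor

$$ Q(u)_{\alpha\beta} := \partial_\alpha u \cdot \partial_\beta u. $$

The operator $Q$ is a nonlinear differential operator, but it behaves very well with respect to pairing:

$$ Q((u_1,u_2)) = Q(u_1) + Q(u_2). \tag{2} $$

We can use (2) to obtain a number of very useful reductions (at the cost of worsening the eventual value of $d$, which as stated in the introduction we will not be attempting to optimise). First we claim that we can drop the injectivity requirement on $u$, that is to say it suffices to show that every Riemannian metric $g$ on $({\bf R}/{\bf Z})^n$ is of the form $g = Q(u)$ for some map $u: ({\bf R}/{\bf Z})^n \to {\bf R}^d$ into some Euclidean space ${\bf R}^d$. Indeed, suppose that this were the case. Let $u_1: ({\bf R}/{\bf Z})^n \to {\bf R}^{d_1}$ be any (not necessarily isometric) embedding (the existence of which is guaranteed by the Whitney embedding theorem; alternatively, one can use the usual exponential map $x \mapsto (\cos 2\pi x, \sin 2\pi x)$ to embed ${\bf R}/{\bf Z}$ into ${\bf R}^2$, and hence $({\bf R}/{\bf Z})^n$ into ${\bf R}^{2n}$). For $\varepsilon > 0$ small enough, the map $\varepsilon u_1$ is short in the sense that $Q(\varepsilon u_1) < g$ pointwise in the sense of symmetric tensors (or equivalently, the map $\varepsilon u_1$ is a contraction from $(({\bf R}/{\bf Z})^n, g)$ to ${\bf R}^{d_1}$). For such an $\varepsilon$, we can write $g = Q(\varepsilon u_1) + g'$ for some Riemannian metric $g'$. If we then write $g' = Q(u_2)$ for some (not necessarily injective) map $u_2: ({\bf R}/{\bf Z})^n \to {\bf R}^{d_2}$, then from (2) we see that $g = Q((\varepsilon u_1, u_2))$; since $(\varepsilon u_1, u_2)$ inherits its injectivity from the component map $\varepsilon u_1$, this gives the desired isometric embedding.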
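As a small sanity check of the pairing identity and of the notion of a short map, here is a minimal numerical sketch on the $2$-torus. Everything concrete in it is an assumption made for illustration: the embedding $u_1(x,y) = (\cos 2\pi x, \sin 2\pi x, \cos 2\pi y, \sin 2\pi y)$, the scalar map $u_2 = \sin(2\pi x)\cos(2\pi y)$, the constant metric $g$, and all function names. Here $Q(u)$ is computed as the Gram matrix of the columns of the Jacobian of $u$, and pairing two maps corresponds to stacking the rows of their Jacobians.

```python
import math

TWO_PI = 2 * math.pi

def jac_circles(x, y, eps=1.0):
    """Jacobian of eps * (cos 2πx, sin 2πx, cos 2πy, sin 2πy), an embedding of the 2-torus."""
    return [
        [-eps * TWO_PI * math.sin(TWO_PI * x), 0.0],
        [eps * TWO_PI * math.cos(TWO_PI * x), 0.0],
        [0.0, -eps * TWO_PI * math.sin(TWO_PI * y)],
        [0.0, eps * TWO_PI * math.cos(TWO_PI * y)],
    ]

def jac_scalar(x, y):
    """Jacobian of the (non-injective) scalar map sin(2πx) cos(2πy)."""
    return [[TWO_PI * math.cos(TWO_PI * x) * math.cos(TWO_PI * y),
             -TWO_PI * math.sin(TWO_PI * x) * math.sin(TWO_PI * y)]]

def Q(jac):
    """Q(u)_{ab} = d_a u . d_b u, i.e. the Gram matrix of the Jacobian columns."""
    n = len(jac[0])
    return [[sum(row[a] * row[b] for row in jac) for b in range(n)] for a in range(n)]

def is_positive_definite(A):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return A[0][0] > 0 and A[0][0] * A[1][1] - A[0][1] * A[1][0] > 0

def residual_metric(g, eps, x, y):
    """g' = g - Q(eps * u_1) at the point (x, y)."""
    q = Q(jac_circles(x, y, eps))
    return [[g[a][b] - q[a][b] for b in range(2)] for a in range(2)]

# Pairing (u_1, u_2) stacks Jacobian rows, so Q of the pair is Q(J1 + J2).
J1, J2 = jac_circles(0.3, 0.7, 0.1), jac_scalar(0.3, 0.7)
paired = Q(J1 + J2)
```

For this particular $u_1$ one has $Q(\varepsilon u_1) = 4\pi^2 \varepsilon^2 I$ at every point, so $\varepsilon u_1$ is short with respect to a metric $g$ exactly when $4\pi^2\varepsilon^2$ is below the smallest eigenvalue of $g$ everywhere; for the sample metric with entries $2, 1, 1, 2$ this means $\varepsilon < 1/(2\pi)$.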

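Call a metric $g$ on $({\bf R}/{\bf Z})^n$ *good* if it is of the form $Q(u)$ for some map $u: ({\bf R}/{\bf Z})^n \to {\bf R}^d$ into a Euclidean space ${\bf R}^d$. Our task is now to show that every metric is good; the pairing identity (2) tells us that the sum of any two good metrics is good.

In order to make the local theory work later, it will be convenient to introduce the following notion: a map $u: ({\bf R}/{\bf Z})^n \to {\bf R}^d$ is said to be *free* if, for every point $x \in ({\bf R}/{\bf Z})^n$, the $n$ vectors $\partial_\alpha u(x)$, $1 \le \alpha \le n$, and the $\frac{n(n+1)}{2}$ vectors $\partial_{\alpha\beta} u(x) := \partial_\alpha \partial_\beta u(x)$, $1 \le \alpha \le \beta \le n$, are all linearly independent; equivalently, given a further map $v: ({\bf R}/{\bf Z})^n \to {\bf R}^d$, there are no dependencies whatsoever between the $n + \frac{n(n+1)}{2}$ scalar functions $\partial_\alpha u \cdot v$, $1 \le \alpha \le n$, and $\partial_{\alpha\beta} u \cdot v$, $1 \le \alpha \le \beta \le n$. Clearly, a free map into ${\bf R}^d$ is only possible for $d \ge n + \frac{n(n+1)}{2} = \frac{n(n+3)}{2}$, and this explains the bulk of the formula (1) for the best known value of $d$.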

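For any natural number $m$, the “Veronese embedding” $\iota: {\bf R}^m \to {\bf R}^{m + \frac{m(m+1)}{2}}$ defined by

$$ \iota(x_1,\dots,x_m) := \left( (x_\alpha)_{1 \le \alpha \le m}, (x_\alpha x_\beta)_{1 \le \alpha \le \beta \le m} \right) $$

can easily be verified to be free. From this, one can construct a free map $u: ({\bf R}/{\bf Z})^n \to {\bf R}^{m + \frac{m(m+1)}{2}}$ by starting with an arbitrary immersion $\phi: ({\bf R}/{\bf Z})^n \to {\bf R}^m$ and composing it with the Veronese embedding (the fact that the composition $\iota \circ \phi$ is free will follow after several applications of the chain rule).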
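The freeness of the Veronese map can be checked mechanically. The sketch below assumes the map has the form $\iota(x) = ((x_\alpha)_{\alpha}, (x_\alpha x_\beta)_{\alpha \le \beta})$ on ${\bf R}^m$; the helper names are hypothetical. It stacks the $m$ first derivatives and the $\frac{m(m+1)}{2}$ second derivatives of $\iota$ into a square matrix and computes its determinant exactly over the rationals. Freeness at a point is equivalent to this determinant being non-zero.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def veronese_derivatives(x):
    """Rows: d_a iota(x) for each a, then d_a d_b iota(x) for a <= b,
    where iota(x) = ((x_a)_a, (x_a x_b)_{a<=b})."""
    m = len(x)
    pairs = list(combinations_with_replacement(range(m), 2))
    rows = []
    for a in range(m):
        lin = [Fraction(int(c == a)) for c in range(m)]
        # d_a (x_p x_q) = [p == a] x_q + [q == a] x_p
        quad = [int(p == a) * x[q] + int(q == a) * x[p] for (p, q) in pairs]
        rows.append(lin + [Fraction(v) for v in quad])
    for (a, b) in pairs:
        lin = [Fraction(0)] * m
        # d_a d_b (x_p x_q) = [p == a][q == b] + [p == b][q == a]
        quad = [Fraction(int(p == a and q == b) + int(p == b and q == a)) for (p, q) in pairs]
        rows.append(lin + quad)
    return rows

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    n = len(M)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [M[r][k] - factor * M[c][k] for k in range(n)]
    return result

# Sample point in R^2 (arbitrary choice); the matrix is 5 x 5.
d2 = det(veronese_derivatives([Fraction(3, 7), Fraction(-2, 5)]))
```

With this ordering the matrix is upper triangular, with diagonal entries $1$ for the linear coordinates and the mixed products and $2$ for the squares, so the determinant is $2^m$ regardless of the point $x$.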

Given a Riemannian metric $g$, one can find a free map $u$ which is *short* in the sense that $Q(u) < g$, by taking an arbitrary free map and scaling it down by some small scaling factor $\varepsilon > 0$. This gives us a decomposition

$$ g = Q(u) + g' $$

for some Riemannian metric $g'$.

The metric $Q(u)$ is clearly good, so by the pairing identity (2) it would suffice to show that $g'$ is good. What is easy to show is that $g'$ is *approximately good*:

**Proposition 5** Let $g$ be a Riemannian metric on $({\bf R}/{\bf Z})^n$. Then there exists a smooth symmetric tensor $h$ on $({\bf R}/{\bf Z})^n$ with the property that $g + \varepsilon^2 h$ is good for every $\varepsilon > 0$.

*Proof:* Roughly speaking, the idea here is to use “tightly wound spirals” to capture various “rank one” components of the metric $g$, the point being that if a map $u$ “oscillates” at some high frequency $\xi \in {\bf R}^n$ with some “amplitude” $A$, then $Q(u)$ is approximately equal to the rank one tensor $A^2 \xi_\alpha \xi_\beta$. The argument here is related to the technique of *convex integration*, which among other things leads to one way to establish the *h*-principle of Gromov.

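By the spectral theorem, every positive definite tensor $g_{\alpha\beta}$ can be written as a positive linear combination of symmetric rank one tensors $v_\alpha v_\beta$ for some vectors $v \in {\bf R}^n$. By adding some additional rank one tensors if necessary, one can make this decomposition stable, in the sense that any nearby tensor is also a positive linear combination of the $v_\alpha v_\beta$. One can think of $v_\alpha$ as the gradient $\partial_\alpha \psi$ of some linear function $\psi: {\bf R}^n \to {\bf R}$. Using compactness and a smooth partition of unity, one can then arrive at a decomposition

$$ g_{\alpha\beta} = \sum_{i=1}^m (\eta^{(i)})^2 \, \partial_\alpha \psi^{(i)} \, \partial_\beta \psi^{(i)} $$

for some finite $m$ and some smooth scalar functions $\psi^{(i)}, \eta^{(i)}$ (one can take $\psi^{(i)}$ to be linear functions on small coordinate charts, and $\eta^{(i)}$ to basically be cutoffs to these charts).

For any $\varepsilon > 0$ and $i = 1,\dots,m$, consider the “spiral” map $u^{(i)}_\varepsilon: ({\bf R}/{\bf Z})^n \to {\bf R}^2$ defined by

$$ u^{(i)}_\varepsilon := \varepsilon \, \eta^{(i)} \left( \cos\frac{\psi^{(i)}}{\varepsilon}, \sin\frac{\psi^{(i)}}{\varepsilon} \right). $$

Direct computation shows that

$$ Q(u^{(i)}_\varepsilon)_{\alpha\beta} = (\eta^{(i)})^2 \, \partial_\alpha \psi^{(i)} \, \partial_\beta \psi^{(i)} + \varepsilon^2 \, \partial_\alpha \eta^{(i)} \, \partial_\beta \eta^{(i)} $$

and the claim follows by summing in $i$ (using the pairing identity (2) and the above decomposition of $g$) and taking $h_{\alpha\beta} := \sum_{i=1}^m \partial_\alpha \eta^{(i)} \, \partial_\beta \eta^{(i)}$.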
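The “direct computation” for a single spiral can be checked numerically. The sketch below assumes the spiral has the form $u_\varepsilon = \varepsilon \eta (\cos(\psi/\varepsilon), \sin(\psi/\varepsilon))$ and compares $Q(u_\varepsilon)_{\alpha\beta} = \partial_\alpha u_\varepsilon \cdot \partial_\beta u_\varepsilon$, computed by central finite differences, against the predicted value $\eta^2 \partial_\alpha\psi \partial_\beta\psi + \varepsilon^2 \partial_\alpha\eta \partial_\beta\eta$. The sample amplitude `eta`, phase `psi`, evaluation point and step size are arbitrary choices made for illustration only.

```python
import math

def spiral(eta, psi, eps):
    """u_eps(x) = eps * eta(x) * (cos(psi(x)/eps), sin(psi(x)/eps)), a map into R^2."""
    def u(x):
        e, p = eta(x), psi(x)
        return (eps * e * math.cos(p / eps), eps * e * math.sin(p / eps))
    return u

def partial(f, x, a, h=1e-5):
    """Central finite difference approximation of d_a f at x (f scalar- or tuple-valued)."""
    xp, xm = list(x), list(x)
    xp[a] += h
    xm[a] -= h
    fp, fm = f(xp), f(xm)
    if isinstance(fp, tuple):
        return tuple((p - m) / (2 * h) for p, m in zip(fp, fm))
    return (fp - fm) / (2 * h)

def Q(u, x):
    """Q(u)_{ab} = d_a u . d_b u at the point x."""
    n = len(x)
    d = [partial(u, x, a) for a in range(n)]
    return [[sum(p * q for p, q in zip(d[a], d[b])) for b in range(n)] for a in range(n)]

def predicted(eta, psi, eps, x):
    """eta^2 d_a psi d_b psi + eps^2 d_a eta d_b eta at the point x."""
    n = len(x)
    de = [partial(eta, x, a) for a in range(n)]
    dp = [partial(psi, x, a) for a in range(n)]
    e = eta(x)
    return [[e * e * dp[a] * dp[b] + eps ** 2 * de[a] * de[b] for b in range(n)]
            for a in range(n)]

# Hypothetical sample data: a smooth amplitude and a linear phase on a coordinate patch.
eta = lambda x: 1.0 + 0.5 * math.sin(x[0]) * math.cos(x[1])
psi = lambda x: 3.0 * x[0] + 2.0 * x[1]
```

The cross terms cancel because the vectors $(\cos, \sin)$ and $(-\sin, \cos)$ are orthogonal, which is why only the rank one term and an $O(\varepsilon^2)$ error survive.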

The claim then reduces to the following local (perturbative) statement, that shows that the property of being good is stable around a free map:

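**Theorem 6 (Local embedding)** Let $u_0: ({\bf R}/{\bf Z})^n \to {\bf R}^d$ be a free map. Then $Q(u_0) + h$ is good for all symmetric tensors $h$ sufficiently close to zero in the $C^\infty$ topology.

Indeed, assuming Theorem 6, and with $h$ as in Proposition 5 (applied to the metric $g'$), we have $Q(u) - \varepsilon^2 h$ good for $\varepsilon > 0$ small enough. By the pairing identity (2) and Proposition 5, we then have $g = (Q(u) - \varepsilon^2 h) + (g' + \varepsilon^2 h)$ good, as required.

The remaining task is to prove Theorem 6. This is a problem in perturbative PDE, to which we now turn.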

**— 3. Proof of local embedding —**

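We are given a free map $u_0: ({\bf R}/{\bf Z})^n \to {\bf R}^d$ and a small tensor $h$. It will suffice to find a perturbation $u_0 + v$ of $u_0$ that solves the PDE

$$ Q(u_0 + v) = Q(u_0) + h. $$

We can expand the left-hand side and cancel off $Q(u_0)$ to write this as

$$ L(v) = h - Q(v) \tag{3} $$

where the symmetric tensor-valued first-order linear operator $L$ is defined (in terms of the fixed free map $u_0$) as

$$ L(v)_{\alpha\beta} := \partial_\alpha u_0 \cdot \partial_\beta v + \partial_\beta u_0 \cdot \partial_\alpha v. $$

To exploit the free nature of $u_0$, we would like to write the operator $L$ in terms of the inner products $\partial_\alpha u_0 \cdot v$ and $\partial_{\alpha\beta} u_0 \cdot v$. After some rearranging using the product rule, we arrive at the representation

$$ L(v)_{\alpha\beta} = \partial_\alpha( \partial_\beta u_0 \cdot v ) + \partial_\beta( \partial_\alpha u_0 \cdot v ) - 2 \, \partial_{\alpha\beta} u_0 \cdot v. $$

Among other things, this allows for a way to right-invert the underdetermined linear operator $L$. As $u_0$ is free, we can use Cramer’s rule to find smooth maps $w_{\alpha\beta}: ({\bf R}/{\bf Z})^n \to {\bf R}^d$ for $1 \le \alpha, \beta \le n$ (with $w_{\alpha\beta} = w_{\beta\alpha}$) that are dual to $\partial_{\alpha\beta} u_0$ in the sense that

$$ \partial_\alpha u_0 \cdot w_{\alpha'\beta'} = 0, \qquad \partial_{\alpha\beta} u_0 \cdot w_{\alpha'\beta'} = \delta_{\alpha\alpha'} \delta_{\beta\beta'} $$

for $\alpha \le \beta$ and $\alpha' \le \beta'$, where $\delta$ denotes the Kronecker delta. If one then defines the linear zeroth-order operator $M$ from symmetric tensors $f = f_{\alpha\beta}$ to maps $Mf: ({\bf R}/{\bf Z})^n \to {\bf R}^d$ by the formula

$$ Mf := -\frac{1}{2} \sum_{1 \le \alpha \le \beta \le n} f_{\alpha\beta} \, w_{\alpha\beta} $$

then direct computation shows that $LMf = f$ for any sufficiently regular $f$. As a consequence of this, one could try to use the ansatz $v = Mf$ and transform the equation (3) to the fixed point equation

$$ f = h - Q(Mf). \tag{4} $$

One can hope to solve this equation by standard perturbative techniques, such as the inverse function theorem or the contraction mapping theorem, hopefully exploiting the smallness of $h$ to obtain the required contraction. Unfortunately we run into a fundamental *loss of derivatives problem*, in that the quadratic differential operator $Q$ loses a degree of regularity, and this loss is not recovered by the operator $M$ (which has no smoothing properties).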
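The algebra behind the linearisation and its right inverse can be written out as follows. This is a sketch in the notation $Q(u)_{\alpha\beta} = \partial_\alpha u \cdot \partial_\beta u$, with $L$ the linearisation of $Q$ at the free map $u_0$. The normalisation of the dual maps $w_{\alpha\beta}$ and the factor $-\frac{1}{2}$ in $M$ are one consistent choice, assumed here for illustration.

```latex
\begin{align*}
% 1. Expanding Q at u_0 + v: the cross terms form the linear operator L.
Q(u_0+v)_{\alpha\beta}
  &= \partial_\alpha u_0 \cdot \partial_\beta u_0
   + \left( \partial_\alpha u_0 \cdot \partial_\beta v + \partial_\beta u_0 \cdot \partial_\alpha v \right)
   + \partial_\alpha v \cdot \partial_\beta v \\
  &= Q(u_0)_{\alpha\beta} + L(v)_{\alpha\beta} + Q(v)_{\alpha\beta}. \\
% 2. Product rule: move the derivative off v.
\partial_\alpha u_0 \cdot \partial_\beta v
  &= \partial_\beta( \partial_\alpha u_0 \cdot v ) - \partial_{\alpha\beta} u_0 \cdot v, \\
L(v)_{\alpha\beta}
  &= \partial_\alpha( \partial_\beta u_0 \cdot v ) + \partial_\beta( \partial_\alpha u_0 \cdot v )
   - 2\, \partial_{\alpha\beta} u_0 \cdot v. \\
% 3. Right inverse: assume d_a u_0 . w_{a'b'} = 0 and
%    d_{ab} u_0 . w_{a'b'} = delta_{aa'} delta_{bb'} for a <= b, a' <= b',
%    and set M f := -(1/2) sum_{a' <= b'} f_{a'b'} w_{a'b'}.
\partial_\alpha u_0 \cdot Mf &= 0, \qquad
\partial_{\alpha\beta} u_0 \cdot Mf = -\tfrac{1}{2} f_{\alpha\beta}, \\
L(Mf)_{\alpha\beta} &= 0 + 0 - 2 \left( -\tfrac{1}{2} f_{\alpha\beta} \right) = f_{\alpha\beta}.
\end{align*}
```

Note that $M$ involves no derivatives of $f$, which is why it cannot compensate for the derivative lost by $Q$.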

We know of two ways around this difficulty. The original argument of Nash used what is now known as the Nash-Moser iteration scheme to overcome the loss of derivatives by replacing the simple iterative scheme used in the contraction mapping theorem with a much more rapidly convergent scheme that generalises Newton’s method; see this previous blog post for a similar idea. The other way out, due to Gunther, is to observe that $Q(v)$ can be factored as

$$ Q(v) = L Q_0(v) \tag{5} $$

where $Q_0$ is a *zeroth order* quadratic operator, so that the equation $L(v) = h - Q(v)$ from (3) can be written instead as

$$ L( v + Q_0(v) ) = h, $$

and using the right-inverse $M$ of $L$, it now suffices to solve the equation

$$ v + Q_0(v) = Mh \tag{6} $$

(compare with the fixed point equation (4)), which can be done perturbatively if $Q_0$ is indeed zeroth order (e.g. if it is bounded on Hölder spaces such as $C^{2,\sigma}$).

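It remains to achieve the desired factoring (5). We can bilinearise $Q(v)$ as $\frac{1}{2} B(v,v)$, where

$$ B(v,w)_{\alpha\beta} := \partial_\alpha v \cdot \partial_\beta w + \partial_\beta v \cdot \partial_\alpha w. $$

The basic point is that when $v$ is much higher frequency than $w$, then

$$ B(v,w)_{\alpha\beta} \approx \partial_\alpha( v \cdot \partial_\beta w ) + \partial_\beta( v \cdot \partial_\alpha w ) \tag{7} $$

which can be approximated by $L$ applied to some quantity relating to the vector field $v \cdot \partial_\beta w$; similarly if $w$ is much higher frequency than $v$. One can formalise these notions of “much higher frequency” using the machinery of paraproducts, but one can proceed in a slightly more elementary fashion by using the Laplacian operator $\Delta := \sum_{\gamma=1}^n \partial_{\gamma\gamma}$ and its (modified) inverse operator $(1-\Delta)^{-1}$ (which is easily defined on the torus using the Fourier transform, and has good smoothing properties) as a substitute for the paraproduct calculus. We begin by writing

$$ Q(v) = (1-\Delta)^{-1} \left( Q(v) - \Delta Q(v) \right). $$

The dangerous term here is $\Delta Q(v)$. Using the product rule and symmetry, we can write

$$ \Delta Q(v)_{\alpha\beta} = B(\Delta v, v)_{\alpha\beta} + 2 \sum_{\gamma=1}^n \partial_{\alpha\gamma} v \cdot \partial_{\beta\gamma} v. $$

The second term will be “lower order” in that it only involves second derivatives of $v$, rather than third derivatives. As for the higher order term $B(\Delta v, v)$, the main contribution will come from the terms where $\Delta v$ is higher frequency than $v$ (since the Laplacian accentuates high frequencies and dampens low frequencies, as can be seen by inspecting the Fourier symbol of the Laplacian). As such, we can profitably use the approximation (7) here. Indeed, from the product rule we have

$$ B(\Delta v, v)_{\alpha\beta} = \partial_\alpha( \Delta v \cdot \partial_\beta v ) + \partial_\beta( \Delta v \cdot \partial_\alpha v ) - 2 \, \Delta v \cdot \partial_{\alpha\beta} v. $$

Putting all this together, we obtain the decomposition

$$ Q(v)_{\alpha\beta} = \partial_\alpha Q_\beta(v) + \partial_\beta Q_\alpha(v) + Q'_{\alpha\beta}(v) $$

where

$$ Q_\alpha(v) := -(1-\Delta)^{-1} \left( \Delta v \cdot \partial_\alpha v \right) $$

and

$$ Q'_{\alpha\beta}(v) := (1-\Delta)^{-1} \left( Q(v)_{\alpha\beta} + 2 \, \Delta v \cdot \partial_{\alpha\beta} v - 2 \sum_{\gamma=1}^n \partial_{\alpha\gamma} v \cdot \partial_{\beta\gamma} v \right). $$

If we then use Cramer’s rule to create smooth functions $w_\alpha: ({\bf R}/{\bf Z})^n \to {\bf R}^d$ dual to the $\partial_\alpha u_0$ in the sense that

$$ \partial_\alpha u_0 \cdot w_\beta = \delta_{\alpha\beta}, \qquad \partial_{\alpha\beta} u_0 \cdot w_\gamma = 0, $$

then we obtain the desired factorisation (5) with

$$ Q_0(v) := \sum_{\alpha=1}^n Q_\alpha(v) \, w_\alpha + M( Q'(v) ). $$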
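The factorisation $Q(v) = L Q_0(v)$ can be verified in a few lines. The sketch below uses the representation $L(w)_{\alpha\beta} = \partial_\alpha(\partial_\beta u_0 \cdot w) + \partial_\beta(\partial_\alpha u_0 \cdot w) - 2\,\partial_{\alpha\beta} u_0 \cdot w$ and the fact that $(1-\Delta)^{-1}$ commutes with each $\partial_\alpha$ on the torus. The definitions $Q_\alpha(v) := -(1-\Delta)^{-1}(\Delta v \cdot \partial_\alpha v)$ and $Q_0(v) := \sum_\alpha Q_\alpha(v) w_\alpha + M(Q'(v))$, together with the duality relations in the comments, are assumptions chosen to be consistent with one another rather than a quotation of any particular source.

```latex
\begin{align*}
% Duality assumptions:
%   d_a u_0 . w_b = delta_{ab},   d_{ab} u_0 . w_c = 0,
%   d_a u_0 . M f = 0,            d_{ab} u_0 . M f = -(1/2) f_{ab}.
\partial_\beta u_0 \cdot Q_0(v) &= Q_\beta(v), \qquad
\partial_{\alpha\beta} u_0 \cdot Q_0(v) = -\tfrac{1}{2} Q'_{\alpha\beta}(v), \\
L(Q_0(v))_{\alpha\beta}
  &= \partial_\alpha Q_\beta(v) + \partial_\beta Q_\alpha(v) + Q'_{\alpha\beta}(v). \\
% Starting instead from Q = (1-\Delta)^{-1}(Q - \Delta Q) and the two product rule identities:
Q(v)_{\alpha\beta}
  &= (1-\Delta)^{-1} \Big( Q(v)_{\alpha\beta}
     - \partial_\alpha( \Delta v \cdot \partial_\beta v ) - \partial_\beta( \Delta v \cdot \partial_\alpha v )
     + 2\, \Delta v \cdot \partial_{\alpha\beta} v
     - 2 \sum_\gamma \partial_{\alpha\gamma} v \cdot \partial_{\beta\gamma} v \Big) \\
  &= \partial_\alpha Q_\beta(v) + \partial_\beta Q_\alpha(v) + Q'_{\alpha\beta}(v)
   = L(Q_0(v))_{\alpha\beta}.
\end{align*}
```

The point of the construction is visible in the last display: the third-derivative terms have been absorbed into total derivatives of $Q_\alpha(v)$, which $L$ can produce from a quantity that is only quadratic in at most two derivatives of $v$, smoothed by $(1-\Delta)^{-1}$.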

Note that $Q_0(v)$ is the smoothing operator $(1-\Delta)^{-1}$ applied to quadratic expressions of up to two derivatives of $v$. As such, one can use elliptic (Schauder) estimates to show that $Q_0$ is Lipschitz continuous in the Hölder spaces $C^{2,\sigma}(({\bf R}/{\bf Z})^n)$ for $0 < \sigma < 1$ (with the Lipschitz constant being small when $v$ has small norm); this together with the contraction mapping theorem in the Banach space $C^{2,\sigma}$ is already enough to solve the equation $v + Q_0(v) = Mh$ from (6) in this space if $h$ is small enough. This is not quite enough because we also need $v$ to be smooth; but it is possible (using Schauder estimates and product Hölder estimates) to establish bounds of the form

$$ \| Q_0(v) \|_{C^{k,\sigma}} \lesssim \| v \|_{C^{k,\sigma}} \| v \|_{C^{2,\sigma}} $$

for any $k \ge 2$ (with implied constants depending on $k$ but independent of $v$), which can be used (for $h$ small enough) to show that the solution $v$ constructed by the contraction mapping principle lies in $C^{k,\sigma}$ for any $k \ge 2$ (by showing that the iterates used in the construction remain bounded in these norms), and is thus smooth.
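To illustrate the shape of the contraction mapping argument (and nothing more), here is a deliberately crude scalar toy model. It replaces the unknown map $v$ by a real number, the right-hand side $Mh$ by a small number `b`, and the quadratic operator $Q_0$ by the hypothetical stand-in `q0(v) = v * v`; none of these choices comes from the actual proof, which takes place in a Hölder space. The iteration $v_{k+1} = b - Q_0(v_k)$ starting from $v_0 = 0$ is the one used in the standard proof of the contraction mapping theorem.

```python
def solve_fixed_point(b, q0, steps=60):
    """Iterate v_{k+1} = b - q0(v_k) from v_0 = 0.

    Toy analogue of solving v + Q_0(v) = M h by the contraction mapping theorem:
    the map v -> b - q0(v) is a contraction near 0 when q0 is quadratic and b is small.
    """
    v = 0.0
    for _ in range(steps):
        v = b - q0(v)
    return v

# Hypothetical quadratic stand-in for the zeroth order operator Q_0.
q0 = lambda v: v * v

v_small = solve_fixed_point(0.1, q0)   # small data: the iteration converges
v_large = solve_fixed_point(1.0, q0)   # large data: the iteration oscillates between 0 and 1
```

For `b = 0.1` the Lipschitz constant of the iteration map near the solution is about $0.18$, so the iterates converge geometrically; for `b = 1.0` the smallness is lost and the same scheme fails, mirroring the requirement that $h$ be sufficiently small.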