# Easy Scalable Text Rendering on the GPU

This post describes the text rendering algorithm I used on thetamath.com , a GPU-powered 2D equation renderer I wrote for fun last December. The description assumes a minimum familiarity with GPUs and vector graphics.

The technique is completely resolution-independent and very fast. The rendered outlines are exact and are calculated completely on the GPU using very simple steps. It uses much less memory than other common techniques such as signed distance fields because it doesn’t need a texture cache of pre-rendered glyphs. It also supports state-of-the-art subpixel anti-aliasing for extra-crisp text on LCD screens.

### Filling in a Glyph

I’ll be using letter “e” in the italic variant of the Free Serif font as a case study:

Our first goal is to fill in all of the pixels inside the glyph using the GPU. The outline of a glyph is formed by a series of straight and curved segments. TrueType fonts use quadratic Bézier curve segments . Each quadratic segment has one control point that controls how the curve bends. The control points are drawn below in gray.

For simplicity, let’s pretend there aren’t any curves for the moment. Our letter “e” is now just a big polygon with a polygonal hole in it.

GPUs can’t draw polygons directly; all they can do is draw triangles. However, we can use a trick to fill all pixels inside polygon using only triangles. To start, pick a point somewhere in space and then generate a triangle through that point for every line segment in the polygon. That looks like this when the triangles are drawn together on top of each other:

We now need to change how pixels are filled to avoid also filling pixels outside the shape. Instead of darkening all pixels inside each triangle, we need to have each triangle flip the pixels under it.

This simple operation is actually computing the “ winding number ” or the number of times the outline winds around a point. A given pixel is only filled in if the outline winds around that pixel an odd number of times. Since addition is associative, the winding number for the entire polygon is equivalent to the sum of the winding numbers for each triangle.

Perhaps the most straightforward way to flip pixels on the GPU is to use the “invert” feature of the stencil buffer, which is available in both OpenGL and DirectX. However, there’s a hack that avoids needing a stencil buffer at all. Just draw with additive blending and use a color value of 1/255 (since colors on the GPU are 8-bit and range from 0 to 1). The resulting pixel only lies inside the outline if the accumulated color scaled back up by 255 is odd.

Computing the winding number using a color buffer instead of a stencil buffer has the drawback that it only supports shapes that overlap fewer than 256 times, but that shouldn’t matter for simple shapes like glyphs. Using a color buffer has the additional benefit of being able to pack more samples per pixel as we’ll see later when we get to anti-aliasing.

Now that we’ve gotten this far, adding the curves back is easy. We just need to flip the pixels in the area between the actual curve and the edge of the polygon. These little areas are “corrections” that fix the polygonal approximation we made earlier. Drawing these extra areas on top of the original polygonal ones looks like this:

The area between the actual curve and the edge of the polygon is easy to calculate because it can be done independently for each quadratic segment. A quadratic segment has two endpoints and one control point. This means each quadratic segment can be drawn with a single triangle:

Given a point P = P0·s + P1·t + P2·(1-s-t) in the triangle (P0, P1, P2), the pixel in the triangle is only flipped if (s/2 + t)² < t. Otherwise the pixel is left alone. This formula is from a Microsoft Research paper by Charles Loop and Jim Blinn and can be derived from the definition of a quadratic segment. The “s” and “t” values are known as Barycentric coordinates and are a common way of defining a coordinate system on a triangle. Drawing everything with pixel flipping now fills in the correct outline:

This technique is really powerful because it is resolution independent and has a very low memory overhead. It’s also simple to implement and completely parallel.

### Anti-aliasing

We now have a way of filling the pixels inside a glyph but unfortunately it currently looks really bad for small font sizes. The problem is that every pixel is either completely filled in or completely empty:

Most anti-aliasing techniques do boundary smoothing by treating each pixel as a little square and visualizing the fraction of the pixel area that is contained inside the outline. Since there’s no equation for the exact value of the fraction, it’s usually approximated by testing a few different points inside the pixel square. The fraction is the number of points inside the outline over the total number of points tested:

Our pixellated example image from before looks like this with anti-aliasing:

Multiple samples per pixel can be done on the GPU using an extension of the color buffer accumulation approach mentioned above. We can easily get four samples per pixel by rendering the same set of triangles four times, each time with a slightly different offset and using a color with a different channel each time. For example, using a color of (1/255, 0, 0, 0) for the first sample will only accumulate into the red channel.

If the triangles in our glyph don’t overlap more than 15 times then we can do even better. Rendering with a color of (16/255, 0, 0, 0) lets us pack an additional sample into the same red channel we used earlier. Doing that four times brings the total to eight samples per pixel with a single color buffer. This works because the first set of four samples only needs the lower 4 bits (since the winding number will never exceed 15), leaving the upper 4 bits free for the second set of four samples.

The number of samples per pixel is a direct tradeoff between rendering time and smoothing quality, but there’s one more trick we can use: subpixel anti-aliasing . This trick relies on the fact that most LCDs are constructed such that each pixel square has three separate R, G, and B columns. The anti-aliased image above actually looks something like this up close on an actual display:

If we treat each R, G, and B column as a separate area with its own fraction, the apparent horizontal resolution triples. This means lines and curves that are close to vertical become much smoother:

Visualizing this as a grayscale image instead looks like this:

And in color it looks like this:

In practice this needs one further adjustment. The human visual system picks up on the color fringes if we do it this way. We need to blur the triple horizontal resolution grayscale image using an averaging filter that is a single pixel wide (so each subpixel becomes the average of itself and its two neighbor subpixels, one on either side):

This blur balances the horizontal and vertical edges again by making them the same thickness. The final color image looks like this:

Even though it’s possible to use eight samples per pixel, I ended up using just six samples per pixel. It divides nicely between the three subpixels and still gives very good curve smoothing. I distributed the six sample locations carefully within the pixel square so there is one on each of the six horizontal and vertical subdivisions:

This distribution gives an effective 6x resolution in both x and y for boundary edges that are almost horizontal or almost vertical. It’s more beneficial than arranging all six sample locations in a hexagon, for example.

The code is open source on GitHub in case it’s useful as a reference. The code itself probably isn’t directly usable in other apps since this project was an experiment in building a highly-optimized app (it’s written in the Skew programming language for one thing). But the algorithm itself is very simple and should be able to work well on any GPU.