神刀安全网

The most seductive keyword in Go

April 19, 2016

Of the few keywords in Go, the most seductive is go itself. It promises simple access to both concurrency and parallelism, even though the two are not the same thing.

If you can get through your first Go program without even trying out the go keyword, you have more self-control than I do. You may also be clinically dead. If we can give ourselves something great for cheap or free, why not go for it? Even media billionaires and democratic socialists can agree on this.

But seductive tools (and seductive people) come with a cost. The costs of distributed computing are many. We know for certain there will be errors. This is why errors are values in Go rather than exceptions. We want the full power of Go and the full context of our executing program available to us when we encounter an error. We don’t always want to fail in the same way, just like not all people can be seduced in the same way, no matter how much language designers and pickup artists would like it to be true. Instead, we should handle errors as they come, and learn to trust our instincts.

A non-trivial example

The best way to start is with a simple example, but one that is non-trivial and stresses concurrency, parallelism, and error handling all at once. Let’s say we want to use the MediaWiki API to download images recently uploaded to Wikimedia Commons. Once we download the images, we want to decode them and pick out the first (non-gray) color we encounter. We’ll map the color to an xterm256 palette, and print that color to our terminal. Here we have the three key factors:

  • Concurrency – waiting for the images to download
  • Parallelism – decoding the images and reading each pixel
  • Errors – we don’t support all image types and there may be other kinds of failures

The naive approach

If we use this wikimg package to download and decode the images, our first implementation might look like this:
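As a rough sketch of that naive pattern (not the original listing), the program below starts one goroutine per image and never waits for any of them. The firstColor function and the URL list are hypothetical stand-ins for the wikimg package, whose API is not shown here.

```go
package main

import (
	"fmt"
	"time"
)

// firstColor is a hypothetical stand-in for downloading and decoding one
// image. It only simulates latency and returns a fixed hex color.
func firstColor(url string) (string, error) {
	time.Sleep(50 * time.Millisecond) // pretend to wait on the network
	return "#336699", nil
}

func main() {
	urls := []string{"a.png", "b.png", "c.png"} // assumed sample input

	for _, u := range urls {
		// Each download runs in the background; nothing waits for it.
		go func(u string) {
			c, err := firstColor(u)
			if err != nil {
				fmt.Println("error:", err)
				return
			}
			fmt.Println(u, c)
		}(u)
	}
	// main returns right away, so the process normally exits before any
	// goroutine has had a chance to print.
}
```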

Nothing happens. I am pretty sure this is exactly what my first Go program did. So what’s wrong? The main goroutine has nothing left to do. Everything else is running in the background. So the process exits, and we never get results.

However, by increasing the number of images we request in our -max command line argument, we can delay the main goroutine from exiting so quickly. Let’s be hasty and request 100,000 images:

Hey, that sort of worked. We printed some colors to the terminal. And the “image: unknown format” errors are expected: we only support image/gif, image/jpeg, and image/png. But later on, we begin to get TCP errors. That’s because we’re creating a new goroutine and a new concurrent network connection for each of our 100,000 image requests. This is not sustainable on any system.

Communication over channels

To improve our program, we can use a fixed number of goroutines and communicate via channels. A rewrite looks like this:
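One way such a rewrite could look is sketched below, under stated assumptions: firstColor is a hypothetical stand-in for the wikimg download-and-decode step, and the worker count of 4 is arbitrary. A fixed set of goroutines reads URLs from a buffered channel, and each worker signals on a done channel when the work runs out.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"strings"
)

// firstColor is a hypothetical stand-in for fetching and decoding an image.
// Here only ".png" names are treated as a supported format.
func firstColor(url string) (string, error) {
	if !strings.HasSuffix(url, ".png") {
		return "", errors.New("image: unknown format")
	}
	return "#336699", nil
}

// run processes urls with a fixed number of workers and returns how many
// workers reported that they finished.
func run(urls []string, workers int) int {
	work := make(chan string, 10000) // generous buffer so the producer rarely blocks
	done := make(chan struct{})      // one empty signal per finished worker

	for i := 0; i < workers; i++ {
		go func() {
			for u := range work { // loop ends once work is closed and drained
				c, err := firstColor(u)
				if err != nil {
					log.Println(u, err)
					continue
				}
				fmt.Println(u, c)
			}
			done <- struct{}{}
		}()
	}

	for _, u := range urls {
		work <- u
	}
	close(work) // no more URLs; lets the workers exit their loops

	finished := 0
	for i := 0; i < workers; i++ {
		<-done // main blocks here, so the process cannot exit early
		finished++
	}
	return finished
}

func main() {
	n := run([]string{"a.png", "b.gif", "c.png"}, 4)
	fmt.Println("workers finished:", n)
}
```

Because main waits on the done channel once per worker, the number of concurrent downloads is capped at the worker count no matter how many URLs are queued.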

Why is the buffer size 10,000? No particular reason. We just want it to be big enough to not cause blocking. When starting out, err on the side of making your channel buffers too big, as the memory usage is likely insignificant in most programs.

Why do we use struct{} in the done channel? Though a bool might be more logical (it is “true” we are “done” with this goroutine), the empty struct consumes zero bytes. This is a micro-optimization to be sure, but you will see this style elsewhere.
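The size claim is easy to check with unsafe.Sizeof. The helper name signalSizes below is made up for this illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// signalSizes reports the in-memory size of an empty struct value and of a
// bool value, the two candidate element types for a "done" channel.
func signalSizes() (emptyStruct, boolean uintptr) {
	var s struct{}
	var b bool
	return unsafe.Sizeof(s), unsafe.Sizeof(b)
}

func main() {
	e, b := signalSizes()
	fmt.Println("struct{}:", e, "bytes; bool:", b, "bytes")
}
```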

Bidirectional communication

Up to now, we’ve been logging errors when they occur within the worker. What if we wanted to count the total number of errors vs. successful calls? Instead of a “dumb” empty struct channel, we can communicate an error value back to the main goroutine. The error value is effectively the “response” (nil or non-nil) of processing this “request” (the URL). We’ll also formalize our concept of a “worker” by making it an independent function rather than a closure.
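A minimal sketch of this request/response shape follows. It again assumes a hypothetical firstColor in place of the wikimg call, and the names worker and count are illustrative. Each worker sends one error value (nil or non-nil) back for every URL it receives, and the main goroutine tallies them.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// firstColor is a hypothetical stand-in for the real download-and-decode step.
func firstColor(url string) (string, error) {
	if !strings.HasSuffix(url, ".png") {
		return "", errors.New("image: unknown format")
	}
	return "#336699", nil
}

// worker is an independent function rather than a closure. It reads
// "requests" (URLs) and sends back one "response" (an error value) for each.
func worker(urls <-chan string, results chan<- error) {
	for u := range urls {
		_, err := firstColor(u)
		results <- err
	}
}

// count fans the URLs out to a fixed number of workers and tallies how many
// calls succeeded and how many failed.
func count(urls []string, workers int) (ok, failed int) {
	in := make(chan string, len(urls))
	out := make(chan error, len(urls))

	for i := 0; i < workers; i++ {
		go worker(in, out)
	}
	for _, u := range urls {
		in <- u
	}
	close(in)

	// Exactly one response arrives per request.
	for range urls {
		if err := <-out; err != nil {
			failed++
		} else {
			ok++
		}
	}
	return ok, failed
}

func main() {
	ok, failed := count([]string{"a.png", "b.gif", "c.png", "d.svg"}, 2)
	fmt.Printf("ok=%d failed=%d\n", ok, failed)
}
```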


Timeouts and context

Now, let’s make a few dramatic steps to ensure we can survive in the real world.

  • Instead of a terminal program, let’s create a webserver (fortunately, our library also returns hex color codes)
  • Let’s formalize the concept of “request” and “response” into types
  • Most importantly, let’s implement a timeout for long requests

The trickiest part of this version is just two lines. Lines 74 and 78. What is context? Although the experts explain it in more detail, the useful part of context for us is the Done() method.

By creating a WithTimeout context, we get a channel that will be closed after our timeout expires. This by itself does nothing. But our wikimg.Puller type exposes a Cancel field that can be set to a channel:

    // Cancel is an optional channel. Setting this value on Puller
    // and closing the channel signals to the Puller that any
    // in process operations (i.e, retrieving an image or computing
    // its first color) should be canceled. Any future
    // calls to Next() or FirstColor() will return a Canceled
    // error.
    Cancel <-chan struct{}

We use this in three places inside the wikimg library. First, we set the Cancel field of the http.Request. Like this:
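The wikimg source is not reproduced here, so the following is only a sketch of the general technique. The fetch and demo functions are hypothetical, and demo runs against a deliberately slow local test server. Note that http.Request.Cancel is deprecated in current Go releases in favor of request contexts.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetch issues a GET request that is abandoned once cancel is closed.
func fetch(url string, cancel <-chan struct{}) (int, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return 0, err
	}
	req.Cancel = cancel // closing this channel aborts the in-flight request
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// demo starts a slow local server, cancels the request after 50ms, and
// returns whatever error fetch reports.
func demo() error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(500 * time.Millisecond): // slow response
		case <-r.Context().Done(): // client went away
		}
	}))
	defer srv.Close()

	cancel := make(chan struct{})
	time.AfterFunc(50*time.Millisecond, func() { close(cancel) })

	_, err := fetch(srv.URL, cancel)
	return err
}

func main() {
	fmt.Println("result:", demo())
}
```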

Second, while iterating through every x,y coordinate of the image, we periodically check if the channel has been closed:

Finally, we also do a similar check in Next().

Why are we going through all this trouble? Because if we let the client timeout without cleaning up, we’ll just be expending resources for a result that nobody is waiting for.

Ok. Let’s run it. We have two windows, one is the server side error log. The second is the web client.


Background goroutines, caching, and serialization

That was pretty cool, but our program is still slow, because it’s always pulling new images for every client from scratch. Instead, we should:

  • Have a cache
  • Run a background process that prefills the cache
  • Allow our clients to hit only the cache, never the actual Wikimedia API

If we’re building a cache that will be accessed concurrently, we know we can’t use Go’s built-in maps and slices as-is, because they are not safe for concurrent use. From the Book of Proverbs, let us pray: Channels orchestrate; mutexes serialize.
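As one possible shape for such a cache (the original implementation is not shown here), the sketch below guards a map with a sync.RWMutex and fills it from concurrent goroutines. The colorCache type, its methods, and prefill are hypothetical names, and the fixed color value stands in for a real Wikimedia lookup.

```go
package main

import (
	"fmt"
	"sync"
)

// colorCache maps an image URL to its hex color. The mutex serializes access
// to the underlying map, which is not safe for concurrent writes on its own.
type colorCache struct {
	mu     sync.RWMutex
	colors map[string]string
}

func newColorCache() *colorCache {
	return &colorCache{colors: make(map[string]string)}
}

// Set stores a color under an exclusive lock.
func (c *colorCache) Set(url, hex string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.colors[url] = hex
}

// Get reads a color under a shared lock, so readers do not block each other.
func (c *colorCache) Get(url string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	hex, ok := c.colors[url]
	return hex, ok
}

// Len reports how many entries are cached.
func (c *colorCache) Len() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.colors)
}

// prefill writes one entry per URL from separate goroutines and waits for
// all of them to finish. The color is a placeholder for a real lookup.
func prefill(c *colorCache, urls []string) {
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			c.Set(u, "#336699")
		}(u)
	}
	wg.Wait()
}

func main() {
	cache := newColorCache()
	prefill(cache, []string{"a.png", "b.png", "c.png"})
	hex, ok := cache.Get("b.png")
	fmt.Println(cache.Len(), hex, ok)
}
```

In a server, a long-running background goroutine could call Set as new images arrive while request handlers only call Get, so clients never touch the upstream API directly.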
