
Job System 2.0: Lock-Free Work Stealing – Part 5: Dependencies

In today’s post, we will finally take a look at the last remaining piece of the new job system: adding dependencies between jobs.

Other posts in the series

Part 1: Describes the basics of the new job system, and summarizes work-stealing.

Part 2: Goes into detail about the thread-local allocation mechanism.

Part 3: Discusses the lock-free implementation of the work-stealing queue.

Part 4: Describes high-level algorithms.

Keeping it simple

For the new job system, I really wanted to keep the complexity of the implementation down compared to the previous approach in version 1.0. Sure, the lock-free work-stealing queue is quite complex in its implementation, but that piece of code is self-contained, keeping the rest of the code relatively easy to understand.

Introducing dependencies to the job system in the past always meant that a second queue had to be introduced (for storing jobs that cannot be executed yet), existing functions had to be re-evaluated, and some of the code had to be re-written in order to take dependencies into account.

This time around I wanted to do the simplest thing possible: letting the user state dependencies explicitly by means of continuations.

Instead of saying “whenever this job finishes, try to run any of the other jobs that were maybe waiting on it”, we now explicitly say “as soon as this job finishes, run all its continuations immediately.”

Running jobs immediately in this case means pushing them into the job queue, so that the load-balancing can take over and make sure that all cores/threads are equally getting work done.

In code, this means that we now do the following:

AddContinuation(ancestor, dependency1);
AddContinuation(ancestor, dependency2);

The simplest solution to implementing this would be to store all the continuations directly as data in our Job struct. Which is exactly what we’re going to do!

Supporting continuations

Bumping the Job struct from 64 to 128 bytes in size allows us to store the following:

struct Job
{
  JobFunction function;
  Job* parent;
  int32_t unfinishedJobs;
  char data[52];
  int32_t continuationCount;   // new
  Job* continuations[15];      // new
};

In production code, we can steal a few bits here and there because we don’t need to store pointers but can store offsets instead. And even with offsets, we probably don’t need a full 16-bit offset.

This essentially leaves us with support for 52 bytes of data and 15 continuations per job. Or more space for data, but fewer continuations. We could even merge the data and continuations arrays, and store data “dynamically” with a few runtime asserts thrown into the mix.
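To illustrate the offset idea mentioned above, here is a minimal sketch (not the original implementation) that stores 16-bit indices into the pre-allocated job pool instead of raw pointers. The g_jobPool array, the MAX_JOB_COUNT constant, and the GetJobIndex() helper are assumptions made for the example:

// Hypothetical sketch: continuations stored as 16-bit indices instead of pointers.
// Assumes all jobs live in one pre-allocated pool, as described in Part 2 of the series.
static const size_t MAX_JOB_COUNT = 4096;
static Job g_jobPool[MAX_JOB_COUNT];

struct JobWithOffsets
{
  JobFunction function;
  Job* parent;
  int32_t unfinishedJobs;
  char data[52];
  int32_t continuationCount;
  uint16_t continuations[15];   // indices into g_jobPool rather than pointers
};

// converts a job pointer into its index within the pool (assumed helper)
static uint16_t GetJobIndex(const Job* job)
{
  return static_cast<uint16_t>(job - g_jobPool);
}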

Adding a continuation to an existing job is straightforward:

void AddContinuation(Job* ancestor, Job* continuation)
{
  const int32_t count = atomic::Increment(&ancestor->continuationCount);
  ancestor->continuations[count - 1] = continuation;
}

Of course, we would need to make sure that the ancestor job hasn’t been Run() yet, and that there is still space in the continuations array. Note that incrementing the number of continuations has to be done atomically, to ensure that we don’t introduce data races in case other threads try to add continuations to the same job at the same time.
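A defensive variant of AddContinuation() might look like the sketch below; the ASSERT macro and the kMaxContinuations constant are illustrative assumptions, not part of the original code:

// Hypothetical defensive variant; ASSERT and kMaxContinuations are assumptions for illustration.
static const int32_t kMaxContinuations = 15;

void AddContinuation(Job* ancestor, Job* continuation)
{
  // sanity check: the ancestor must not have finished yet, otherwise Finish()
  // may already have walked (or be walking) the continuations array
  ASSERT(ancestor->unfinishedJobs > 0);

  const int32_t count = atomic::Increment(&ancestor->continuationCount);

  // make sure we don't write past the end of the fixed-size array
  ASSERT(count <= kMaxContinuations);

  ancestor->continuations[count - 1] = continuation;
}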

Adapting the job system to support continuations is simple as well; we just have to slightly alter the Finish() function:

void Finish(Job* job)
{
  [...]
  if (job->parent)
  {
    Finish(job->parent);
  }

  // run follow-up jobs
  for (int32_t i = 0; i < job->continuationCount; ++i)
  {
    PushToQueue(job->continuations[i]);
  }
  [...]
}
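To show how this ties together, here is a hedged usage sketch of a simple two-stage dependency. The UpdateAnimation/UpdateSkinning jobs are made-up examples, and CreateJob(), Run() and Wait() refer to the API described in Part 1 of the series:

// Hypothetical usage sketch of a two-stage dependency chain.
Job* animationJob = CreateJob(&UpdateAnimation);
Job* skinningJob = CreateJob(&UpdateSkinning);

// skinning must not start before animation has finished
AddContinuation(animationJob, skinningJob);

// only the ancestor is pushed to the queue; the continuation is pushed
// automatically by Finish() once the ancestor is done
Run(animationJob);
Wait(skinningJob);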

And that’s pretty much it, staying true to the KISS mantra. The new implementation has a few advantages compared to the old approach:

  • Simple implementation.
  • Plays nicely with parent-child relationships, everything works out of the box.
  • No more (possibly) unbounded stack space needed, all memory allocated up-front.

Of course, each Job now needs twice the amount of memory, but in times where even mobile phones offer 512 MB of memory, I don’t see that as a big problem. Furthermore, for architectures that use 128-byte cache lines, we would have wanted to bump the size anyway in order to avoid false sharing.
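As a side note, one possible way to make the intended size and alignment explicit is to let the compiler verify them. This is a sketch, not part of the original code, and the exact size depends on pointer width and padding on the target platform:

// Illustrative sketch: verify the intended size and cache-line alignment at compile time.
// The 128-byte figure matches the cache-line size discussed above.
static_assert(sizeof(Job) == 128, "Job is expected to be exactly 128 bytes");

// alternatively, declare the struct itself with the desired alignment:
// struct alignas(128) Job { ... };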

Conclusion

Finally, this part concludes the Job System 2.0 series. I hope you liked it!
