A cache miss is not a cache miss

When writing performant code, we are careful to avoid cache misses where possible. When discussing cache misses, however, we are usually content with counting them. In this post I will explain why that is not sufficient: data dependencies also make a huge difference.

Consider for example a `vector<unique_ptr<int>>`. If the vector is large, iterating over the integers will typically incur a cache miss per element, because they are stored at arbitrary locations in memory. If we instead have a `list<int>`, iteration will also incur a cache miss per element, because the nodes of the list are likewise stored at arbitrary locations.

This could lead us to believe that cache misses have the same effect on performance for both of the above cases. That assumption is what I wanted to put to the test in this post.

Setup

To avoid the peculiarities of the memory allocator (and to avoid writing my own), I have simplified the linked-list structure used for the experiment. The data type I iterate over is a plain `vector<pair<int, int>>`. Each element consists of an index and a payload; the index points to the next element in a linked list threaded through all the elements of the vector. The data is generated by this code:

```cpp
vector<pair<int, int>> create_random_data(int N) {
  mt19937 mt;

  // A random permutation of 0..N-1 defines the order of the linked list.
  vector<int> sequence;
  {
    int i = 0;
    generate_n(back_inserter(sequence), N, [&] { return i++; });
    shuffle(sequence.begin(), sequence.end(), mt);
  }

  // Fill the vector with random payloads. (The cast avoids a narrowing
  // error in the braced initializer, since mt() returns an unsigned value.)
  vector<pair<int, int>> ret;
  generate_n(back_inserter(ret), N,
             [&] { return pair<int, int>{0, static_cast<int>(mt())}; });

  // Link each element to its successor in the shuffled order.
  for (int i = 0; i < N; ++i) {
    auto next_i = (i + 1) % N;
    ret[sequence[i]].first = sequence[next_i];
  }

  return ret;
}
```
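A useful property of this generator is that the `first` indices form a single cycle through all N elements, so the list walk below visits every element exactly once before returning to index 0. A small sanity check (the helper name `cycle_length` is mine, not from the post) can confirm this:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <random>
#include <utility>
#include <vector>
using namespace std;

// The post's generator, reproduced so this check is self-contained
// (with an explicit cast to avoid a narrowing error in braced init).
vector<pair<int, int>> create_random_data(int N) {
  mt19937 mt;

  vector<int> sequence;
  {
    int i = 0;
    generate_n(back_inserter(sequence), N, [&] { return i++; });
    shuffle(sequence.begin(), sequence.end(), mt);
  }

  vector<pair<int, int>> ret;
  generate_n(back_inserter(ret), N,
             [&] { return pair<int, int>{0, static_cast<int>(mt())}; });

  for (int i = 0; i < N; ++i) {
    auto next_i = (i + 1) % N;
    ret[sequence[i]].first = sequence[next_i];
  }
  return ret;
}

// Follow the 'first' links from index 0 and count steps until we return.
// A correct single cycle visits every element exactly once.
int cycle_length(const vector<pair<int, int>>& data) {
  int idx = 0, steps = 0;
  do {
    idx = data[idx].first;
    ++steps;
  } while (idx != 0 && steps <= static_cast<int>(data.size()));
  return steps;
}
```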

Iterating over the linked list is done by the following code (the write to a `volatile` is there to guard against the compiler optimizing away the loop):

```cpp
volatile int volatile_sum;

void iterate_linked_list(const vector<pair<int, int>>& data) {
  int sum = 0;
  int idx = 0;
  do {
    sum += data[idx].second;
    idx = data[idx].first;
  } while (idx != 0);

  volatile_sum = sum;
}
```

Iterating over a vector of pointers is approximated by simply reading the payload of the element each entry points to, while traversing the vector itself in normal order:

```cpp
void iterate_pointers(const vector<pair<int, int>>& data) {
  int sum = 0;
  for (const auto& d : data) {
    sum += data[d.first].second;
  }

  volatile_sum = sum;
}
```

And finally, we also compare with a simple iteration over the vector, where we expect almost zero cache misses:

```cpp
void iterate_simple(const vector<pair<int, int>>& data) {
  int sum = 0;
  for (const auto& d : data) {
    sum += d.second;
  }

  volatile_sum = sum;
}
```

For each data size N, the operations are run 100 times and the median time is reported. The code is compiled with MSVC 2015 Update 2 at optimization level /O2, running on an Intel i7-4600U @ 2.1 GHz.
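A minimal sketch of this measurement methodology (the helper name `median_ms` is mine; the post does not show its harness): run a benchmark function repeatedly and report the median wall-clock time in milliseconds.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>
using namespace std;

// Run f() 'runs' times and return the median elapsed time in milliseconds.
// steady_clock is used because it is monotonic, unlike system_clock.
template <typename F>
double median_ms(F f, int runs = 100) {
  vector<double> times;
  times.reserve(runs);
  for (int r = 0; r < runs; ++r) {
    auto t0 = chrono::steady_clock::now();
    f();
    auto t1 = chrono::steady_clock::now();
    times.push_back(chrono::duration<double, milli>(t1 - t0).count());
  }
  // nth_element is enough to find the median without a full sort.
  nth_element(times.begin(), times.begin() + runs / 2, times.end());
  return times[runs / 2];
}
```

Taking the median rather than the mean makes the result robust against occasional runs inflated by interrupts or frequency scaling.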

Results

We can see clear jumps in the run times as the data set outgrows the various cache levels. Furthermore, we see that the effect of cache misses is much worse for the linked list. I have also verified with gcc and perf that the cache-miss count is roughly equal for the pointer iteration and the linked-list iteration.

Why do we see this large difference? The answer is data dependency. When iterating over a range of pointers, the CPU can start reading several values at the same time and add them to the sum as they arrive from the memory system. With the linked-list structure, we cannot move on to the next node until the current node has been read from memory, because we do not know where to go next. This means we cannot have many reads in flight at the same time. Put simply, the pointer iteration is bound by memory throughput, while the linked-list iteration is bound by memory latency.
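One way to see that the pointer-style loop's addresses are known ahead of time is that we can even prefetch them explicitly, a sketch of which follows (using the GCC/Clang `__builtin_prefetch` builtin; the distance of 8 is an untuned assumption, and the function name is mine). No such trick is possible for the linked-list walk, where each address depends on the previous load.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>
using namespace std;

// Same access pattern as iterate_pointers, but with an explicit prefetch of
// the target that will be needed a few iterations from now. The prefetch
// does not change the result; it only hints the memory system to start the
// load early, which is exactly what the linked-list walk cannot do.
int iterate_pointers_prefetch(const vector<pair<int, int>>& data) {
  int sum = 0;
  const size_t dist = 8;  // prefetch distance: a tuning assumption
  for (size_t i = 0; i < data.size(); ++i) {
    if (i + dist < data.size())
      __builtin_prefetch(&data[data[i + dist].first]);
    sum += data[data[i].first].second;
  }
  return sum;
}
```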

Conclusion

I hope I have made the case that not all cache misses are equal, and that considering the data dependencies between reads can be just as important as counting cache misses.
