In theprevious post we delved into c++ fold expressions; despite them being available only for a set of built-in operators we devised a way to use them with arbitrary callables, that allows us to write things like a compile time reduction of sequence’s maximum:
In this post we are going to expand on arbitrary fold expressions by incorporating proper move semantics and implementing generator-like lazy evaluation. Along the way we’ll attempt our take on expression templates, a technique I call flat expression templates and examine some peculiarities of the
2. Becoming move aware
Recall that in our previous post we used a “vessel” type to serve as node type in our expression tree ? Well, obviously we don’t have to make a copy of our (expression) data every time we build such a node, so why not have two versions for it, one for lvalue data, which will hold a reference to the actual data, and one for rvalue data that will “move absorb” the data into a member:
O_x is winking the right eye which according to the Taranki Herald guide to eye flirtation means “I love you”. We’ll be respecting the order of evaluation and need to distinguish between left and right folds so node types should be discrete for the operators to be able to correctly build the expression tree. I don’t normally use emoticons to convey messages in code, but this felt … right.
give() member functions return references to the underlying data and
clone() is used to overcome some nuances in type deduction later on, ensuring that we’ll either get a reference or a copy of the data if it was an lvalue or an rvalue respectively (useful when using
O_x temporaries to avoid references to expiring objects). It goes without saying that we have to sprinkle
std::forward(...) all over the place, eg:
A forward macro like
fw is a good way to be bit more terse and write more readable code. I improved mine by adding
__VA_ARGS__ as suggested by Vittorio Romeo . The complete code will be shown in compilable examples (and made available in more “official releases” later on) but for now let me stress three points:
- Forwarding references need to be used (evaluated, passed to another function, called) with
std::forwardto preserve their value category.
std::moveis like playing catch, it doesn’t work if there’s nobody on the other side. No matter how “move aware” is a library, it’ll end up copying stuff when handling non-moveable types.
- Moving from an object places it in a valid but unspecified state so be really thorough and explicit; even when creating dangling references or invalid state things may appear to work and that’s the worst kind of bug, the kind that bites when you’ve moved on.
3. Lazy evaluation
To respect their “functional heritage”, fold expressions in c++, should make a plea for lazy evaluation. Laziness is a common feature among functional programming languages, whereby expressions are evaluated only when necessary rather than upon declaration. This affects our programming style and the way we think of programs, infinity and efficiency. Below we’ll outline laziness in two programming languages and describe what’s doable in our case
The Haskell case
Haskell implements lazy evaluation down to its expression evaluation machinery. This means that the abstract syntax tree we get for a valid program is “evaluated” according to lazy evaluation rules, which most commonly manifest themselves in the following three ways:
- arguments to functions are evaluated only when this is necessary for evaluation to continue.
- an argument is not necessarily evaluated fully: only the parts that are needed are
- an argument is evaluated at most only once. This is done in the implementation by
replacing expressions by graphs and calculating over them.
The Python case
One could argue that since Python is an interpreted language and solely code paths that are executed need be checked in any way (as opposed to C++ not compiling for error types in unevaluated contexts), you have lazy evaluation right there. But this is not the case; Python’s lazy evaluation is not built into expression evaluation, take the following example :
Here the interpreter calls
f1 three times, even though the second argument to
f1 is not evaluated. Haskell wouldn’t do that (it would only evaluate
f1 twice); we won’t get into details, but this case falls under the first rule mentioned above.
So how can Python handle infinite lists, have lazy evaluation and perform calculation on demand ? The answer lies in the use of 1 (and a half) pattern. We avoid calling them 2 patterns because of the amount of conceptual overlapping. It all starts with the iterator pattern and implementing the iterator protocol :
Implementing the “magic methods”
__next__ in Python3) as shown above makes the class usable in context where we loop through the items it “produces”. Boilerplate code can be reduced using the generator pattern like so:
This pretty cool piece of code can provide “computation on demand” and has endless applications, from mundane ones to mind boggling. Check the references for a thorough introduction.
The curious case of C++
C++ implements a special case of lazy evaluation in short circuiting, where e.g. the second argument to the
|| operator won’t be evaluated if the first is
true : the expression will yield
true right away (note that both sides must compile). The other place where we find lazy evaluation in c++ is template (meta)programming : only instantiated templates are evaluated (OK plus the syntax of unevaluated templates is checked) so in a way “if you don’t use it, it doesn’t exist”. To implement lazy evaluation for our fold expressions we’ll be inspired by the iterator pattern, though this would be a great opportunity to apply the generator pattern via coroutines (but let’s wait for things to become a bit more standard before using yield-like constructs in C++)
4. Flat expression templates
Expression templates as used in the previous post are not ideal for lazy evaluating the expressions we build since the content is embedded in the form of the data structure. In order to lazily evaluate we need to be able to step through intermediate states easily and express clearly the resulting types and the advancement of our computation. I haven’t quite seen this as we’ll be presenting it here, hence my take at giving it a name : flat expression templates . That been said I have to mention that it’s a common technique in GPU programming to “flat out” data structures in order to make them more “stream-able”, so this is not a “virgin birth”. So for a, right fold, expression say:
(a + (b + (c + d)))
what we’ll do is convert this:
We need a type to hold the expression in a reverse polish notation fashion and we choose to store the “nodes” in a tuple to be as “compile time friendly” as possible (we’ll see later on that
constexpr evaluation is still on the table).
O_Om holds the expression (in its right hand) and has both its eyes open (we’ll be using it for left folds as well). To highlight some of the code we note that:
stateis the accumulator and has the type of applying the callable on the expression. If no accumulator is explicitly provided the rightmost argument becomes it. As in Haskell the accumulator consumes the expression from the right to left (so write the callable accordingly).
- A nested iterator type will keep track of the nodes that have been computed
yieldreturns the result of the expression right away. It can be evaluated at compile time and it’s a good example of a
constexprmember function not being
const(it alters the
vtu::call_with_tuple_element_firstis a mechanism to visit a tuple with a “runtime” index (quoted because inside
iis evaluated at compile time)
5. Finally, lazy folds
The last piece of the puzzle is the expression iterator, which will be a nested member of
All this iterator needs to do is hold references to the flattened expression and the state and keep track of the current position in the expression. Calling
++ on it evaluates the position on the expression and dereferencing it returns the current state. We only implement
< and take advantage of
std::rel_ops to generate the rest. Having this in place we can write code like this:
JoinAc is a functor that concatenates the stringification of its arguments and we can loop through the expression state with range based for loops since our
end member functions (a traditional
while would also be acceptable). We also showcase that compile time evaluation is still doable by creating a
mm using a “right away” evaluation.
Check a compilable example here