My name is Peter Alvaro. I am currently an assistant professor at UC Santa Cruz. I have been there a little bit less than a year. Before that, for about 7 ½ years, I was a graduate student at UC Berkeley where I studied with Joseph Hellerstein in the database research group. And before that, for about roughly the same amount of time, for 8 years, I was a software engineer at an internet search company called Ask Jeeves. Before that, I did my bachelors degree in English Literature and Philosophy. It has been a long road for me.
It turned out to be quite natural because after I graduated from college and I traveled a little bit, I ended up in San Francisco in 1999 and the tech jobs were just falling off the trees and so I was able to learn on the job, to recover some of the intuition about computation that I had gotten from my philosophy background and from being a hacker on Pascal and Basic when I was a boy. It all kind of followed in a very natural way. It turned out to be quite a good career.
This is a subject that I discuss a lot because currently, at Santa Cruz, I am teaching distributed systems, both at the graduate and the undergraduate level. The way that I like to present what is so frustratingly hard about distributed system – it seems a little bit paradoxical, right? We have been studying distributed systems for over 30 years now and yet only in the 5 or 10 years have people started pulling out their hair and complaining about the difficulty. I think that there are two things at work here: distributed systems are fundamentally hard, as we have always known because of the presence and interaction of two different forms of uncertainty. Programmers hate uncertainty.
Programmers want to be able to give a computer an instruction and know that it will do it, to be able to predict the space of programs and of executions implied by their program. In distributed systems, there are two independent but tightly correlated sources of uncertainty: asynchrony, which is uncertainty about the ordering and the timing at which messages will be delivered to different nodes. If two different nodes that are supposed to, for example, be replicas of the same storage state, receive messages in different orders, there is uncertainty about whether they will converge. Just to make the matter more complicated, distributed systems also suffer partial failure, which is a failure mode that does not occur, or if it does occur is masked by the lowest levels of the system in single-site systems. Partial failure means that some of your compute components may fail to run, while others keep running and your program nevertheless gives an outcome, which may be incomplete or incorrect.
These two failure modes are difficult to manage, but possible to manage on their own through a variety of techniques that we probably do not have the time to talk about right now. But because they are both present together in a distributed system, it is notoriously hard and it is my conjecture that it is exactly the existence of these interacting failure modes that leads to a lot of these notorious impossibility results in distributed systems. Most of the theory that we have in distributed systems tells us about what we cannot achieve. So I am very much in favor of doing work that gives us more constructive results, that tells us about what we can achieve.
CAP theorem is a very well known one. I mean an arguably a fairly obvious one because the CAP theorem just sort of says you cannot have your cake and eat it, too. If you have two nodes that are very far apart, they can either be up to date or wait. So, that is a fairly obvious one. Maybe a more fundamental impossibility result is the so called FLP impossibility result which establishes that any deterministic protocol that can solve consensus, for example protocols like Paxos and Raft and things like this, has executions in an asynchronous environment that do not terminate. So instead of saying something like CAP like if you want strong consistency, you cannot be available, it makes it much more concrete. If you want to be strongly consistent, you may not terminate. Another very famous impossibility result that I think is a very fun one from a pedagogical point of view is the two generals result which establishes that in an environment with uncertain communication, we can never reach agreement. Are you familiar with the two generals? Shall I say it?
Werner: Let us explain it for our audience. I obviously know the two generals result, but let’s discuss it for our audience…
Absolutely. So there are these two generals and they are encamped on two different hills. In a valley between the hills there are enemies encamped. Now, the two generals collectively outnumber their enemy. So if they were to attack simultaneously, they would surely win the battle, but if they were to attack one at a time, they would certainly lose. Now the only means of communication that these generals have is to send a mounted messenger through the valley. Obviously that mounted messenger could be ambushed. So the question at hand is: can these two generals devise a protocol, that is to say a message passing protocol, that enables them to decide to simultaneously attack. And the answer is “no” and the proof sketch why this is true, which has very much to do with common knowledge, if the first general sends a message that says “Attack at dawn!” – does the first general attack at dawn? He can’t because he cannot distinguish between executions in which the messenger made it, in which case they would attack together and executions in which the messenger was ambushed.
So this general, being a smart general, he would certainly send his messenger with a message that says “When you get your answer, send back an acknowledgment messenger”. Now when the other general sends back the acknowledgment messenger, does he attack? No. Because he cannot tell the difference between execution – you see where this is going. We can construct an infinitely deep protocol, but we will never attack. So in the face of uncertainty, certain otherwise easy things become actually impossible. For practitioners, these impossibility results do not very often arise in practice because it is not the case that all messages will be lost in practice. It is not the case that networks are truly asynchronous. So these impossibility results in practice often manifest in terms of costs that programmers do not want to incur. The practical take away from this is that agreement protocols will very often cause penalties in latency, in throughput, in availability that programmers would like to avoid. In fact, a great deal of my early research involved statistic analysis for languages that can give the programmer guidance about exactly when they can avoid costly synchronization mechanisms and still obtain correct results.
This gets a little bit into somewhat subjective philosophy, but it seems to me that traditional languages, regardless of their paradigms, because nowadays people are programming in pure functional paradigms, hybrid functional paradigms, imperative paradigms, data centric paradigms. Regardless of the sort of paradigm on the ground, they are all based on a fundamental model of computation, this von Neumann model, which is sort of order-centric all the way down. In the van Neumann model we say we have a CPU – one of them – with a clock – one of them, we have an ordered memory. Computation looks like iterating in order over memory. So there is sort of order everywhere you look. If you tried to do an analysis that said is really what these synchronization analysis I described are trying to do is trying to say “When are computations, that we might like to be distributed, insensitive to order?”.
When can we tolerate asynchrony, right? If you tried to do that in a language like Java, it will be very frustrating because order is sort of everywhere you look. Every assignment statement to a first approximation requires a certain barrier. So a lot of the work that I did in my early time at Berkeley and that I have put aside for now as I am sure you know – it is very hard to sell a language. But when I had the freedom as a grad student to try to invent and sell languages, the idea, the paradigm that I advocated for is what I called disorderly programming. So to come up with languages that more closely resembled query languages like SQL and Datalog, languages for which it is well known that programs can be executed in batch, over arbitrary orders, over partitioned data and nevertheless achieve the same results because most of the operators in query languages, operators like select, project and join, are commutative and associative.
So if your operations are commutative and associative, obviously you are order-insensitive. So I advocated this idea, what I called “disorderly languages”, languages that processed over sets and tried to ignore order, in order to achieve two things: the first thing was to encourage programmers to specify their computations in and order-independent way, even if the natural mode for them would have been to do it ordered. We are taught to program and say “do x, assign it to a variable, and then do this, and then do that” when you could just as easily have removed the “then”, if you had stepped back for a minute. You would have said “do this, and this, and this”.
It is very often the case that what you want is atomicity, you want a bunch of things to happen together, or you want mutual exclusion, meaning you want something to happen and something else, but not both. Or you want sequencing – you want this to happen and then this. But you sort of achieve all of them through sequencing because that is the fundamental primitive given to you by the languages that you have. So that is the idea of these languages – you do not take sequencing away, but you hide it in a corner of the system so that the programmer has to explicitly ask for ordering. In such a way, the idea was that since ordering is the hard and expensive thing in distributed systems, you should make the programmer think when they ask for it so you do not end up with spurious order, which could be expensive to enforce for the reasons that I just enumerated: you need consensus, you need locks, you need these mechanisms that have costs.
Werner: So basically commutativity is a programmer’s best friend. Yes, commutativity, associativity, idempotence – all these algebraic properties. So if you can write down your systems in a language in which it is easy to spot those properties, you can win big.
No, I think functional languages help a lot, but do functional languages really give you a sound, static analysis for which operations commute. I think not necessarily. I think being able to reason about in a referential integrative way about the absence of side effects is profoundly helpful, but I do not think it gets all the way there. Personally, I think – and again this is maybe a subjective matter – I strongly believe that they are sort of isomorphic, but given a choice between a functional paradigm and a fundamentally relational or declarative, data-centric paradigm, I always incline towards the latter. This may just be because I come from the database tribe originally. But, you know, queries over sets, it takes sets in and takes sets out, ignoring the ordering and the batching in which things happen to appear, which is an attribute that was convenient in databases because of the efficiency of batch computation, but it is interesting applied to distributed systems where there is this high degree of uncertainty about arrival order.
Yes and no. I mean, Prolog is a very elegant language, but unlike Datalog, it does not have clean model-theoretic semantics in large part because in order to make Prolog certain to terminate and have efficient behavior, you end up using these extra logical operators like Cut. Once you introduce the Cut, the programmer is giving guidance to the runtime about execution order, which is exactly what we are trying to avoid.
Werner: I see. So basically it is a nice model in theory, but in reality you have to reintroduce sequencing. That is right. Essentially, in an ideal world – and this is a database guy talking obviously – a programmer would be an expert in a domain, like a real domain, like medicine, right. They would be able to write down the business rules about their domain, describing the desired outcome of the computation, without having to commit to details because the assumption is – and this is certainly true in databases – that the details are likely to be highly dynamic, changing all the time, changing as you upgrade your hardware, as you scale out your cloud and so on and so forth. But the less you rely on the programmer for hints about placement – well, placement is a contentious one – but about execution order, the more flexibility you have to rely on an optimizer to make the right decisions to be consistent with the desired semantics of the program and to be able to fit the execution environment. Now, I do not mean to trivialize how difficult distributed optimization is, but it is still a worthy goal for a researcher.
I did and I regret even mentioning it now because there is a sort of an eternal debate about whether programmers should reason about locality explicitly or whether that should be managed by the underlying system. In particular, should systems give programmers the notion of regions so that they could try to place computation close to data. A lot of Google’s research has advocated for this. Purists might argue that the collocation or rendezvous of data and communication might be better decided by an optimizer, than by a human. I am very much on a fence about this. I do not know what the right answer is.
I do not know. I think that is a really hard one. As somebody with a database background, I am inclined towards declarative programming where you under-specify details of execution in favor of carefully specifying constraints on the output. On the other hand, when we look at the history of distributed systems programming, attempts to hide details like locality have resulted in disasters. So you think about things like RPC. We do RPC and it is great. It is like a function! Only there’s all these failure modes that happen when the function is being executed on another machine that are not exposed by the interface of the function. NFS – complete disaster, right? Historically, attempts to hide distribution completely and to provide a single system image have not worked. So I have a strong gut feeling that we should not do that. But this is balanced by a strong gut feeling that we should program as declaratively as possible. So, it is a trade off.
I think this is a really great question. I would say on the one hand that when we think about the problem very carefully, it is easy to convince ourselves and I certainly think that is true, that correctness is a very application-specific notion. Every application has a different notion of what it means to have correct executions. So there is something somewhat flawed in looking for a gold standard that is application independent, like serializability or linearizability or something like this. I am a huge fan of my colleague Peter Bailis’ work where he always says given a set of invariants, what operations are safe – commutative, associative, idempotent, etcetera – with respect to those operations.
I think that is a very important research frontier. But I will point out that if I had to choose a gold standard for distributed systems that I could apply independently over a particular application problem, it certainly would be determinism and so the Bloom language and the Dedalus language which I developed at Berkeley, the main feature of those languages was a static analysis. So you could write a program, the program is a Turing-complete language, you could do whatever you wanted, but the static analysis was sort of a filter that would say whether or not your program had a property that, despite non-determinism in the scheduler and essentially a non-determinism environment – non-determinism in the scheduler, non-determinism in the message ordering – whether or not it would produce deterministic results. So you pass your program through that filter and if you get the thumbs up, it tells the programmer “You can pretend you are living in that simple world again” That is not to say that all problems can be solved in that way.
The other really nice thing about Bloom was that if your program was not in that class of good programs – this was based on something called monotonicity analysis and I do not think we will have time to talk about in this talk, but certainly you will have to read my papers. So if your program was not in that good class it would identify individual statements or rules in the program that could require ordering of their inputs to ensure the deterministic outcomes. And then some of my later work in my thesis, a project called Blazes, presented a framework which, when it recognized program locations that could require order, then it was able to chose from a variety of order imposing constructs including barriers and other synchronization methods and actually place those constructs in your program and re-execute it, guaranteeing orders. And it would try to do this in a way that minimized coordination and wait times.
Werner: So you basically fix your program. That is right. At a really high level, although I did not actually use locks. A way to think about it was: you write your program without locks, this thing does static analysis and it puts in the right locks, you know?
This is going to get a little bit esoteric, but in some sense, yes, because any CRDT, internally, if it is a so-called state-based CRDT and all operations based CRDTs have an equivalent state-based CRDT – it has a property that its state evolves as a lattice. That is to say that any two divergent states of replicas have a least upper-bound that is a reasonable merge point of those two states and that is exactly a monotonicity property. And so the initial Bloom language was all set-based. It was relational. And we observed that we could sort of white-list most of the relational operators. So Select is monotonic. So what does monotonic mean in terms of relations? Well, the more you know, the more you know, which means that as you accumulate knowledge over time, you have a set that grows and you never make a retraction from that set in the face of new information.
So if I have a filter that given shoes, gives me the red shoes, as I learn more about shoes, I learn more about red shoes. Projection or transformation has the same property – if I want all the shoes and I want to paint them red, as I learned about new shoes, I paint more shoes red. Somewhat surprisingly, Join has the property of monotonicity too. If I am trying to pair up left and right shoes, as I learn about more left shoes and more right shoes, in the fullness of time, I get more pairs of shoes irrespective of the order in which the individual shoes arrive. Now this sort of emblematic example of a non-monotonic operator will be set-difference. Let’s say I am going through my closet and I want all the shoes that do not have a partner. As I get a shoe, when can I emit the shoe?
Exactly when I know there will never be a partner. So that implies some sort of barrier that needs to happen or I have to exhaust my input, but of course, in the closet example, I could just wait until I have exhausted my closet. In a streaming database system, your input is unbounded. So non-monotonicity, at some very high level, implies the need for synchronization. But as we developed this Bloom language, my colleague Neil Conway extended Bloom to be able to capture not just sets which have this lattice property due to set union, but arbitrary lattices like the numbers whose least upper bound is max and things like this. When he did that, he made a very direct connection to the CRDT work and he also argued, compellingly I believe, that the language approach is potentially much richer than an approach like CRDT’s.
Because CRDTs give you strong guarantees about the state convergence of individual objects, but our applications are composed out of many objects. And if you want a program-wide guarantee of something like eventual consistency, it is by no means obvious how to achieve it with CRDTs unless of course you build a CRDT that is your entire system. But that is burdensome. What you’d like to be able to do is to compose objects. So what Neil did was he defined individual lattices of different types that very much resembles CRDTs and then he defined the safe classes of morphisms from lattice to lattice that preserve monotonicity in the sense that if you go up in this lattice, your morphism into this other lattice also goes up. I know that took a lot of time, but that is sort of the connection I think between the monotonicity theory and the CRDT work.
Werner: Well, I think it’s worth the time. I think it is definitely the most interesting topic in research recently. Yes, I agree.
Oh, gosh. That is a good question. I mean to be honest – I mentioned the difficulty of selling a language. This may divert a little bit from your question, but I hope it will stay relevant. At the end of my PhD, for the last chapter of my PhD, two things occurred to me. The first was that it had been three years and nobody had adopted my toy language. I thought the world was going to adopt it and I would have all these case studies, right? So selling a language is hard and now that I am a young professor trying to get tenure, maybe it is not a good idea to stake my future in a boutique language.
The other thing that occurred to me is that we present this work and actually, I remember at a presentation about this work, Matei Zaharia who created Spark – he is now a professor at MIT – raising his hand and saying “You know, all this Bloom stuff has told me when in a distributed system I do not care about order” as though that is the only hard thing about distributed systems. But it seems to me that the hardest thing about distributed systems is partial failure. What does Bloom and languages like this that are all about commutativity have to say about failure? How do I know if I got a complete result?” That was really a mind-bender for me because I always knew partial failure was there, but we had always had this kind of flip answer that “Well, if your stuff is idempotent, you can just retry”.
That’s an easy thing to say, the CRDT people will probably say the same thing: “Well, if you are idempotent, retry is free. Retry all the time and eventually you will get it” But I did not feel like that answer was quite satisfactory and so for my last two years at Berkeley, I very much shifted gears and focused on the problem of providing stronger guarantees about fault tolerance. That is what lead to the work on Lineage-driven Fault Injection that I presented today with Kolton who has had some actual tech transfer over at Netflix [Editor’s note: the talk is on InfoQ: http://www.infoq.com/presentations/failure-test-research-netflix ]. One thing that was nice about that work is that first of all it rounded out the story so then my thesis can talk about how there’s these two things that are hard in distributed systems and I had given you a principle way to think about both and they are both based on relational languages.
The model that I used throughout my PhD at some sort of high level was to look back 30 years at the history of logic programming and to take this dusty, old logic programming, with all its theorems and apply it to a completely new domain, which was distributed systems and all these amazing things fell out. So monotonicity theory for asynchrony and then analysis of lineage or data provenance to reason about the implicit redundancy in systems and look for sort of weaknesses in that façade of redundancy that can go tell you about where it might be interesting to inject failures. That sort of rounded up the story.
So you asked me about the future. I do want to go back to the language question and to doing works similar to the CRDT work, composing CRDTs, but my current plan is to wait until after I get tenure. And for the next four years I want to keep sprinting on this Lineage-driven Fault Injection work and explore the new frontiers. I am currently writing a grant, asking for half a million dollars to fund four students for four years. So that is kind of the near term future for me – fault tolerance.
Yes, if you google for me, you can find me. You can also email me at firstname.lastname@example.org. If you are interested in more background about LDFI you can go to my website and read the original paper which was published in SIGMOD in 2015 – it is called “Lineage-driven Fault Injection”. I would also encourage you to read the Netflix tech blog [Editor’s note: http://techblog.netflix.com/2016/01/automated-failure-testing.html ] where we describe in a little bit more detail than we were able to do today the problems that arose, how we solved them and the results that we obtained in our collaboration. Kolton is an amazing guy to work with and we have a real kinship and so I am looking forward for opportunities to work with him in the future. He has of course left Netflix and is starting a company that is doing failure as a service. You should watch that company closely as well.
Werner: Excellent. Thank you, Peter. Thank you.
转载本站任何文章请注明：转载至神刀安全网，谢谢神刀安全网 » Interview: Peter Alvaro on Distributed Programming, CRDTs, LDFI