Paradigms of Rust for the Go developer

Reader Note: This article aims to provide some technical insight into the paradigm shifts I’ve been exposed to while researching and learning about concurrency in the Rust programming language. After spending 3.5 years deeply invested in the Go programming language, this is my attempt to share these insights, and I encourage you, the reader, to explore what Rust has to offer as well. At the end of the day, if you dismiss Rust as not for you, that’s OK, but hopefully you can walk away with new concepts to think about.

It’s a good time to be a coder with languages like Rust and Go at our disposal

I am at an advantage over many software engineers working in the industry today. The advantage I have is that I recognize that programming languages are just tools: tools that sometimes vary widely and sometimes overlap in features. The important part is to realize that some tools are better suited for a job than others. For example, you generally don’t want to write a AAA game title in pure Python. The reason is obvious: Python does not give you the tight control over memory and speed needed to justify building the next Call of Duty. This is not to say it can’t be done, but well, you know…

What I observe in the industry today are software engineers who are fan-{boys, girls} of their languages. They say stuff like “My language is superior because it has generics” or “There is too much cruft in C++ for me to justify using it.” These fan-{boys, girls} engage in discussions and spend hours trying to convince each other that their language is “the best at everything” while the rest of us are building shit and getting things done. When you subscribe to the best practices and standards of a single language built around a single community, I find that you may be severely limiting your growth potential when it comes to learning new languages and, more importantly, new paradigms.

Hidden within the various languages that exist today is a set of paradigms that can completely change the way you are used to thinking. Sometimes these paradigms are so focused and so specific to a language that they are only applicable in that particular language. Other times, and this is the great part, you can take those paradigms and apply them to the languages you currently use. When that happens, congratulations: you’ve expanded your mind and your skill set, and you now have a fresh way of tackling stale old problems.

Now don’t get me wrong: you’ve banked your career on a couple of languages, you love these languages and you get shit done. There is nothing wrong with having admiration or love for a language that gives you the power to express yourself elegantly. We know that software engineering is both an art and a science. And a big part of the pride we have in software development comes from knowing that we’ve solved a problem more elegantly, with better performance, and more robustly than the next gal or guy.

Additionally, the communities that grow around some languages are just as important as the languages themselves. Having a community to share your thoughts, challenge your ideas, and having the codebase to build on top of each other’s efforts can be paramount to tackling the problems you’re trying to solve.

Which brings us to the meat of this article. I’ve spent the last 3.5 years fully invested in Go development. The Go Programming Language is a fantastic general-purpose language for building performant, highly concurrent systems. I’ve heard it described as the language of “The Cloud”, and while many languages have been used to build cloud solutions, Go brings a special kind of flair to the table.

By the way, the language is called Go not Golang. Golang is used as a more selective keyword to help us search the internets for Go related content.

Let’s talk about why one would want to utilize Go for tackling some problems:

  • You have the need to integrate with a body of code written in C
  • You need to leverage concurrency or parallelism to get the most out of your software
  • You want to work in a type-safe language
  • You want to work in an imperative language that compiles fast
  • You want to work in a distilled language that is specifically designed to reduce the cognitive overhead required to read and write code
  • You want to leverage event-driven programming without the usual baggage: callback hell, deferreds, futures, promises, empty promises…or whatever the cool kids are calling them these days.

The items listed above are no accident. Go was designed specifically with those bullet-points in mind.

That list is a good list, but it only scratches the surface of Go and why you’d want to use it. There are more facets to this language that I’ll talk about. Go also has lightweight threads called goroutines, and it borrows a style of concurrent modeling known as CSP, short for Communicating Sequential Processes. If you haven’t heard of these concepts, go read about them. I’ll wait.

The CSP model brings about its own paradigm shift in how you build and model solutions with concurrency. If you are writing Go code today without understanding the CSP model I highly encourage you to learn about it.

This is what I’m getting at: if you are one of those people who has heard of Go but never used it, knocks Go because it doesn’t have generics, or simply went through the Go Tour for 20 minutes and decided the language sucks because function parameters are backwards… you are only hurting yourself. There are some wicked concepts behind Go that you should definitely take a look at, even if the only takeaways are its paradigms. Also, it’s worth noting that Go borrows paradigms from past languages. Most languages do.

Now let’s get into some Rust and why you’d perhaps want to use Rust:

  • You need a language that offers both safety and control
  • You need a language that has practically no runtime overhead — you can’t afford GC pauses
  • You need high performance on par with C or C++
  • You want a language that won’t seg-fault in production
  • You want a language whose safe code is free of data races
  • You want to work in a language that is free from the baggage and influence of the last 30 years (I’m looking at you, C++)

By the way, the language is called Rust… not Rustlang. Also, the bullet points listed above are no accident either. Rust was specifically designed with these targets in mind.

This list is also a good list. It’s not by any means a complete list, and Rust is an entirely different beast than Go. Rust additionally comes with an ownership and lifetime model that lets you express code in a way the compiler can use to its advantage. This translates into removing an entire class of problems that typically plague lower-level, unsafe languages. Dangling pointers got you down? Use after free causing headaches? Double frees plaguing your data structure? Rust has your back. No… the Rust compiler has your back, and this is absolutely cutting-edge stuff.

Side note: I was in the break room at my place of work a few days ago. I was talking to some colleagues about how Rust is growing in popularity and is on the Most Loved, Dreaded, Wanted list on Stack Overflow for 2016. Someone said to me, “Ugh, [I] looked at Rust… it’s too complicated.” This person has some good experience with Go, so what did I say back? I said, “Sure, it’s more complicated than Go, I do agree, but what you pay up front in writing Rust code, you get back 10-fold with some runtime guarantees that are extraordinarily impressive.” This reminded me of how I felt when I first read through Rust’s language spec a year and a half ago. Luckily a dear friend encouraged me to take another look.

So I promised this article would teach you about some paradigms of Rust for the Go developer. Let’s dive in.

Paradigm Shift : Rust doesn’t like data-races

The code excerpt below is taken straight from Go’s documentation for the race detector, a tool designed to catch data races, but only if you have proper tests that exercise the concurrency of your code. The race detector is NOT bullet-proof. It will not give you false positives: if it found a data race, it is a data race. But there is absolutely no guarantee that it will find all data races. Furthermore, the race detector must evaluate your program while it is running. Additionally, some teams writing production Go code today don’t even bother to wire it up into their continuous-integration system… ¯\_(ツ)_/¯

The code below demonstrates how Go’s maps are not thread-safe by default. I see code written like this all the time. People play splish-splash with goroutines, channels, and shared state, then try to push such code into production, and then wonder why it explodes horrifically.

package main

import "fmt"

func main() {
    c := make(chan bool)
    m := make(map[string]string)
    go func() {
        m["1"] = "a" // First conflicting access.
        c <- true
    }()
    m["2"] = "b" // Second conflicting access.
    <-c
    for k, v := range m {
        fmt.Println(k, v)
    }
}


A quick run down of the code above goes like this:

  • A channel is created and used solely for the purpose of ensuring the main goroutine doesn’t exit until the anonymous goroutine finishes running.
  • A local map m is created and captured (closed over) in the anonymous goroutine, where it is mutated.
  • That same local map is also mutated in the main goroutine.
  • No synchronization is happening to ensure the map is safely mutated across these two goroutines.
  • At the end, the results are printed out to the screen via a for-loop.
  • Those results could be completely correct or not. In fact, all bets are off: when you have a data race, the behavior is simply undefined.
  • Go will happily compile and run the code above.

The astute reader will realize the data race above can be fixed with the smallest tweak to the code. Below, we synchronize the mutations by introducing a happens-before relationship: the main goroutine receives from channel c before it mutates the map.

package main

import "fmt"

func main() {
    c := make(chan bool)
    m := make(map[string]string)
    go func() {
        m["1"] = "a" // this will always happen first
        c <- true
    }()
    <-c // Reading from the channel here synchronizes mutation
    m["2"] = "b" // this will always happen after
    for k, v := range m {
        fmt.Println(k, v)
    }
}


Rust, on the other hand, won’t even compile such code. It basically says: “Faaaaaaaack You, I refuse to compile this shit,” while it holds up two middle fingers to your face in plain sight. And believe you me, when you are woken up at 3:00am to troubleshoot an insidious data race that is super hard to track down, you are going to wish your compiler had also told you to fuck off.

Let’s look at a close approximation of what the Go code above might look like in Rust:

use std::sync::mpsc::channel;
use std::collections::HashMap;
use std::thread;

fn main() {
    let (c_tx, c_rx) = channel();
    let mut m = HashMap::new();

    thread::spawn(move || {
        m.insert("1", "a");
        c_tx.send(true).unwrap();
    });

    m.insert("2", "b"); // rejected by the compiler: m was moved into the closure
    c_rx.recv().unwrap();

    for (k, v) in &m {
        println!("{}, {}", k, v);
    }
}


There are some very real differences here worth noting. Yes, at first glance it is a little more complicated. We have to import the thread, channel and HashMap names accordingly. There is additional syntax that is not present in the Go version. The channel construct actually requires us to deal with two halves: the sending side, c_tx, and the receiving side, c_rx. And we are not using a lightweight threading model (like Go’s code does), because thread::spawn maps 1:1 to a native OS thread. While the threading model is completely different, the principles we’re dealing with are exactly the same: multiple threads share access to and mutate the same map, and bad things will happen.

But by far the most profound aspect of the Rust code above is that it will not compile. This is not because the Rust compiler wants to fight with you. It’s not because Rust sucks. It’s because the folks at Mozilla have worked very hard to ensure that the code above never ever compiles and therefore never slips into production only to cause eventual chaos.

This is a complete paradigm shift from Go’s model that is worth understanding and considering. You mean to tell me that my program won’t even compile if a data-race exists? Yes! Sign me up please, that is a real mental-shift in thinking for building concurrent systems.

Sidenote: The Rust code above can be fixed with a similarly small change, just as we did with the Go version (though simply reordering the statements is not enough, because the map has already been moved into the spawned thread). But for the sake of this article, let’s take a dive into the depths of Rust to see how such a language protects us from this ever happening in the first place.

So what is the compiler telling us about the code above when it fails to compile? It actually vomits the error messaging below:

<anon>:14:5: 14:6 error: use of moved value: `m` [E0382]
<anon>:14 m.insert("2", "b");
^
<anon>:14:5: 14:6 help: see the detailed explanation for E0382
<anon>:9:19: 12:6 note: `m` moved into closure environment here because it has type `std::collections::hash::map::HashMap<&'static str, &'static str>`, which is non-copyable
<anon>: 9 thread::spawn(move || {
<anon>:10 m.insert("1", "a");
<anon>:11 c_tx.send(true).unwrap();
<anon>:12 });
<anon>:9:19: 12:6 help: perhaps you meant to use `clone()`?
<anon>:17:20: 17:21 error: use of moved value: `m` [E0382]
<anon>:17 for (k, v) in &m {
^
<anon>:17:20: 17:21 help: see the detailed explanation for E0382
<anon>:9:19: 12:6 note: `m` moved into closure environment here because it has type `std::collections::hash::map::HashMap<&'static str, &'static str>`, which is non-copyable
<anon>: 9 thread::spawn(move || {
<anon>:10 m.insert("1", "a");
<anon>:11 c_tx.send(true).unwrap();
<anon>:12 });
<anon>:9:19: 12:6 help: perhaps you meant to use `clone()`?
error: aborting due to 2 previous errors

OMG…wtf, that is a lot to swallow. Yes, it’s more complicated, and I’m going to gloss over some details, but over time you will learn to grok such messages. In fact, the compiler messages are practically poetic: they tell a very vivid story about why your code does not compile due to violations of Rust’s ownership and lifetime model. Rust tells us that ownership of map m has been transferred to the spawned thread. Therefore map m can no longer be used on line 14 (the second m.insert call, in the main thread); otherwise that would have introduced a data race.

Let’s discuss the details of ownership and why we’re seeing mentions of a move in the error above. Rust enforces the concept of having one owner of data. Think of an owner as being a variable binding to some data. As the owner, you can temporarily lend out access to the data you own via a reference. The list below outlines some key axioms of the ownership model.

Burn these into your head:

  • There is only ever one owner of data
  • Ownership can be transferred — a move
  • Owned values can be borrowed temporarily
  • Borrowing prevents moves from occurring

Gut-check-refresher: what is a data-race again?

  • 2+ threads accessing the same data at the same time
  • at least 1 is unsynchronized
  • at least 1 is writing

Rust just did us a solid and caught the perfect storm that could have happened in production. You owe the Rust compiler a beer. Additionally, the data race was caught at compile time, not at runtime. The compiler and language semantics, working together, remove all the guesswork about where data races may be hiding, and you didn’t have to write a single concurrent test to expose it.

Loaded question : so is Rust’s supremely verbose and almost story-like error messaging really more complicated or is trying to pin-point data-races in production due to an app exhibiting undefined behavior more complicated?
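
Before moving on to shared memory, it is worth noting that one possible fix for the failing Rust example never shares the map at all: the spawned thread can hand ownership back through the channel once it is done. The sketch below is one way to do that, not necessarily the fix the earlier sidenote had in mind, and `build_map` is a hypothetical helper introduced only for illustration.

```rust
use std::collections::HashMap;
use std::sync::mpsc::channel;
use std::thread;

// Hypothetical helper: the map is mutated on a worker thread first, then
// handed back to the caller, so only one thread owns it at any time.
fn build_map() -> HashMap<&'static str, &'static str> {
    let (tx, rx) = channel();
    let mut m = HashMap::new();

    thread::spawn(move || {
        m.insert("1", "a"); // the spawned thread owns m here
        tx.send(m).unwrap(); // ownership moves back through the channel
    });

    let mut m = rx.recv().unwrap(); // the calling thread owns the map again
    m.insert("2", "b"); // always happens after the first insert
    m
}

fn main() {
    let m = build_map();
    for (k, v) in &m {
        println!("{}, {}", k, v);
    }
}
```

The receive plays the same role as reading from channel c in the fixed Go version, except that here the compiler also checks that nobody touches the map while the other thread owns it.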

Paradigm shift : Shared memory in Rust is opt-in

Let’s now consider how Rust allows us to go about fixing the problems above by first addressing concerns over shared memory. Again, this is Rust’s ownership and lifetime model at work, enforcing some very specific rules so that your code never does bad things in production. Let’s examine what opting into shared memory looks like in Rust with a slightly different example.

use std::sync::mpsc::channel;
use std::thread;
use std::collections::HashMap;
use std::sync::Arc;

fn main() {
    let (tx, rx) = channel();
    let mut m = HashMap::new();

    m.insert("a", "1");
    m.insert("b", "2");

    let arc = Arc::new(m); // Tells Rust we're sharing state
    for _ in 0..8 {
        let tx = tx.clone();
        let arc = arc.clone();
        thread::spawn(move || {
            let msg = format!("Accessed {:?}", arc);
            tx.send(msg).unwrap();
        });
    }
    drop(tx); // Effectively closes the channel
    for data in rx {
        println!("{:?}", data);
    }
}


Here is a description of the code above:

  • The code above doesn’t suffer from any data races. That is because a happens-before relationship exists: all mutations to map m happen in the main thread before any of the threading logic is run.
  • We then wrap our map m in an Arc, an atomically reference-counted wrapper. This is the key step that lets us opt in to sharing memory and therefore satisfy the Rust compiler.
  • Additionally, although we are sharing access to the map m in each thread, no mutation is occurring in the threaded part of the code.
  • You’ll also notice two clone operations. The channel clone is necessary because each thread acts as a producer to the single channel instance. The Arc clone gives each thread its own read-only handle to the map m, bumping the reference count accordingly.
  • Also worth mentioning: because we’re using an atomically reference-counted wrapper, Rust will ensure that as each thread finishes and its Arc handle goes out of scope, the reference count is decremented, and proper cleanup occurs when the last handle to the map is dropped.
  • Then, in the threaded portion of our code, we create a string from the contents of the map m and send it through a channel. More importantly, the compiler knows that we are doing this safely across all threads, because an Arc only hands out shared, read-only access (so no mutation can occur in the threaded code) and because the Arc wrapper ensures that a thread will never see a dangling pointer.
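
The reference counting described in the bullets above can be observed directly with `Arc::strong_count`. The following is a small sketch rather than part of the original example; `share_read_only` is a hypothetical helper, and the clones are created up front only so that the count can be read at a well-defined moment.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

// Hypothetical helper: shares a read-only map with `n` threads and returns
// the strong count while the handles exist and after the threads are joined.
fn share_read_only(n: usize) -> (usize, usize) {
    let mut m = HashMap::new();
    m.insert("a", "1"); // all mutation happens before sharing starts
    let arc = Arc::new(m); // count is 1

    // One clone per thread; each clone bumps the reference count.
    let clones: Vec<_> = (0..n).map(|_| Arc::clone(&arc)).collect();
    let while_shared = Arc::strong_count(&arc);

    // Each thread takes ownership of its handle and only reads the map.
    let handles: Vec<_> = clones
        .into_iter()
        .map(|handle| thread::spawn(move || handle.len()))
        .collect();
    for h in handles {
        h.join().unwrap(); // the thread's handle is dropped when it finishes
    }

    let after_join = Arc::strong_count(&arc);
    (while_shared, after_join)
}

fn main() {
    let (while_shared, after_join) = share_read_only(8);
    println!("while shared: {}, after join: {}", while_shared, after_join);
}
```

With eight threads, the count should climb to nine (eight clones plus the original) and fall back to one once every thread has finished.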

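Lastly, the main thread indicates that it is done with its sending half of the channel via the call to drop(tx), and then loops over the receiving side of the channel printing the results. Here drop is the ordinary function std::mem::drop from the prelude: it takes ownership of its argument so the value goes out of scope immediately, which runs the same cleanup (the type’s Drop implementation) that would otherwise run at the end of the value’s scope. Once every sender, including the clones held by the threads, has been dropped, the channel is closed and the loop over rx ends.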
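
To see why the explicit drop matters, here is a stripped-down sketch of the same producer pattern. `collect_from_workers` is a hypothetical helper, not something from the example above; it relies only on the standard library behavior that iterating a receiver ends once all senders are gone.

```rust
use std::sync::mpsc::channel;
use std::thread;

// Hypothetical helper: spawns `n` producers and collects what they send.
fn collect_from_workers(n: usize) -> Vec<usize> {
    let (tx, rx) = channel();

    for id in 0..n {
        let tx = tx.clone(); // each worker gets its own sender
        thread::spawn(move || {
            tx.send(id).unwrap();
            // the worker's sender is dropped when the closure returns
        });
    }

    // Drop the original sender. Without this, the iteration below would
    // never end, because the channel stays open while any sender is alive.
    drop(tx);

    let mut received: Vec<usize> = rx.iter().collect();
    received.sort(); // arrival order is not deterministic
    received
}

fn main() {
    println!("{:?}", collect_from_workers(4));
}
```

Commenting out the `drop(tx)` line is a quick way to convince yourself: the program should then hang waiting for a sender that never goes away.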

Now that we’ve covered how you can share memory in Rust using an Arc wrapper, what about the case where we need to share memory and mutation occurs? Let’s go back to our original Rust code above that was modeled after the Go version and tweak it so that it compiles.

use std::sync::mpsc::channel;
use std::collections::HashMap;
use std::thread;
use std::sync::{Arc, Mutex};

fn main() {
    let (c_tx, c_rx) = channel();
    let m = HashMap::new();

    // Wrap m with a Mutex, wrap the mutex with Arc
    let arc = Arc::new(Mutex::new(m));
    let t_arc = arc.clone();
    thread::spawn(move || {
        let mut z = t_arc.lock().unwrap();
        z.insert("1", "a");
        c_tx.send(true).unwrap();
    });
    // Extra scope is needed to avoid the deadlock
    {
        let mut x = arc.lock().unwrap();
        x.insert("2", "b");
    }

    c_rx.recv().unwrap();
    for (k, v) in m.iter() {
        println!("{}, {}", k, v);
    }
}


Let’s break down the code above:

  • Notice that in this case we are sharing the map m between the main thread and the spawned thread, so we reach for our Arc wrapper to opt in to Rust’s requirements around shared memory.
  • Additionally, not only are we sharing memory, but mutation now occurs within the spawned thread as well as the main thread. This logic is synchronized because the shared resource is properly locked.
  • Rust’s lock guard releases the mutex automatically when it falls out of scope, so we have to take an extra step and limit the scope of the main thread’s mutation to prevent a deadlock.
  • The code above is almost complete, but it still does not compile because of the next paradigm.
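
The scope-based unlocking mentioned above can be observed with `try_lock`, which returns an error instead of blocking when the mutex is already held. This is a small sketch under that assumption, using a plain integer rather than the map; `lock_states` is a hypothetical helper.

```rust
use std::sync::Mutex;

// Hypothetical helper: reports whether the mutex could be re-acquired
// while a guard is held, and again after that guard has gone away.
fn lock_states() -> (bool, bool) {
    let counter = Mutex::new(0);

    let while_held;
    {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        // The guard is still alive, so a second attempt is refused.
        while_held = counter.try_lock().is_ok();
    } // the guard goes out of scope here and the mutex unlocks

    let after_scope = counter.try_lock().is_ok();
    (while_held, after_scope)
}

fn main() {
    let (while_held, after_scope) = lock_states();
    println!("re-lock while guard is held: {}", while_held);
    println!("re-lock after the scope ends: {}", after_scope);
}
```

Calling `drop(guard)` explicitly has the same effect as closing the scope, which some people find easier to read than an extra pair of braces.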

Paradigm shift: Lock data not code

Let’s diverge for a minute. Both Go and Rust have mutexes that protect the integrity of shared state. A key difference is that in Rust you lock data directly, not code. At first this difference seems almost mundane, but it reveals a powerful concept.

Suppose I write some crafty code in Go that is very delicate, and I am super careful, super disciplined to ensure all proper locking happens around a map. But then someone decides to come along next week and add some contributions to this code. They happen to stumble upon the same map, which contains some useful data they need. They then reference this map and loop over it, getting at the data as needed. Everything is wonderful except for one very, very important thing: they failed to take a read-lock on the map. The implications of this are huge. This means they aren’t being good citizens-of-synchronization and are now at risk of reading dirty data that has not been synchronized properly.

In Rust this situation simply cannot occur. Thanks again to Rust’s ownership model, the map m has now been moved into the Mutex, so the Mutex is fully responsible for that data and will only allow access to it through the guard you get back from lock().

The code from the end of the previous section, which still iterates over m directly, will not compile due to the following error:

<anon>:33:19: 33:20 error: use of moved value: `m` [E0382]
<anon>:33 for (k, v) in m.iter() {
^
<anon>:33:19: 33:20 help: see the detailed explanation for E0382
<anon>:11:35: 11:36 note: `m` moved here because it has type `std::collections::hash::map::HashMap<&'static str, &'static str>`, which is non-copyable
<anon>:11 let arc = Arc::new(Mutex::new(m));
^
error: aborting due to previous error

The fix is rather easy: since ownership moved into the Mutex, we just need to go through the mutex for the last step in the code. Therefore the last few lines of code change from this:

    for (k, v) in m.iter() {
        println!("{}, {}", k, v);
    }

To this:

    let a = arc.lock().unwrap();
    for (k, v) in a.iter() {
        println!("{}, {}", k, v);
    }

At last, for completeness, we arrive at the code below, which compiles just fine.

use std::sync::mpsc::channel;
use std::collections::HashMap;
use std::thread;
use std::sync::{Arc, Mutex};

fn main() {
    let (c_tx, c_rx) = channel();
    let m = HashMap::new();

    // Wrap m with a Mutex, wrap the mutex with Arc
    let arc = Arc::new(Mutex::new(m));
    let t_arc = arc.clone();
    thread::spawn(move || {
        let mut z = t_arc.lock().unwrap();
        z.insert("1", "a");
        c_tx.send(true).unwrap();
    });
    // Extra scope is needed to avoid the deadlock
    {
        let mut x = arc.lock().unwrap();
        x.insert("2", "b");
    }

    c_rx.recv().unwrap();
    let a = arc.lock().unwrap();
    for (k, v) in a.iter() {
        println!("{}, {}", k, v);
    }
}


If you are still with me, we covered a lot of material to consider. These three paradigms really offer an alternative way of thinking:

  • Rust goes a long way to helping you avoid data-races
  • Rust requires you to opt-in to shared memory which removes many problems associated with sharing state
  • Rust locks data and not code, leveraging the ownership model to ensure proper synchronization is occurring at all times
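
The third point can also be seen in isolation from the channel plumbing. In the sketch below, the `Cache` type and the `hits_after` helper are hypothetical names chosen for illustration; the point is only that a later contributor has no way to reach the map except through lock().

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical shared cache: the map lives inside the Mutex, so there is
// no way to read or write it without going through `lock()`.
struct Cache {
    entries: Mutex<HashMap<String, u32>>,
}

impl Cache {
    fn new() -> Self {
        Cache { entries: Mutex::new(HashMap::new()) }
    }

    // Writers must take the lock to get a mutable view of the map.
    fn record(&self, key: &str) {
        let mut entries = self.entries.lock().unwrap();
        *entries.entry(key.to_string()).or_insert(0) += 1;
    }

    // Readers must take the lock too; forgetting is not an option.
    fn snapshot(&self) -> Vec<(String, u32)> {
        let entries = self.entries.lock().unwrap();
        let mut items: Vec<_> = entries.iter().map(|(k, v)| (k.clone(), *v)).collect();
        items.sort();
        items
    }
}

// Hypothetical driver: `threads` workers each record one hit for "home".
fn hits_after(threads: usize) -> Vec<(String, u32)> {
    let cache = Arc::new(Cache::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || cache.record("home"))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    cache.snapshot()
}

fn main() {
    println!("{:?}", hits_after(4));
}
```

Compare this with the Go scenario described earlier: the equivalent of "looping over the map without a read-lock" simply has nothing to loop over until the lock is taken.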

This concludes this blog post on the Paradigms of Rust for the Go developer. I want to be clear about a few things. The examples in this essay are a bit contrived, but they are simple and useful for illustrating the mechanics in play.

Also, this post was not meant in any way to bash one language over another. In fact, the things I learned in my research with Rust I think will ultimately make me a better software engineer no matter what language I use. Go is where it’s at for me today due to the nature of the systems that I build for a living. The lightweight-threading model that Go brings affords me a significant advantage over a language like Rust currently. Conversely, I’m sure there will be a time to reach for a language like Rust. When that time comes, I’m sure I’ll have a blast working in such a powerful lower-level language that avoids the many pitfalls that typically plague such languages.

One more thing! If you enjoyed this article, I thrive on feedback. Please share it and follow me on Twitter. Also, if this article contains subject matter that excites you, come join me at SendGrid where we get to play with distributed systems all day. (There will be some real work involved too.)

And if you enjoyed this article see my post titled: Dancing with Go’s Mutexes

Happy coding!

@deckarep
