
String types in Rust

This seems to be a common issue, so let’s talk about the different string types in the Rust programming language. In this post I’m going to explain the organization of Rust’s string types with String and str as examples, then get into the lesser-used string types (CString, CStr, OsString, OsStr, PathBuf, and Path) and how the Cow container can make working with Rust strings easier.

The most important thing to understand is that string types in Rust come in pairs, which I’ll call the “owned” sort and “slice” sort (the term “sort” is being used here to mean roughly “set of types”, to make clear that “owned” and “slice” are not actual Rust types, but a notion of organization for Rust’s string types). The “owned” sort of strings (String, CString, OsString, and PathBuf) have ownership over their contents (hence the name!), and can grow or shrink. The “slice” sort of strings (str, CStr, OsStr, and Path) are borrowed views into a buffer that something else owns. There’s also the Cow wrapper type, which can make working with the two sorts of strings easier while retaining good performance characteristics.

“Owned” vs. “Slice” Sorts of Strings

String and str are, by far, the most common string types in Rust, so I will use them to illustrate the difference between the two sorts of string types. String (the “owned” sort of string type) is a wrapper for a heap-allocated buffer of UTF-8 encoded bytes. str (the “slice” sort of string type) is a view of a UTF-8 encoded byte buffer that may be on the stack, on the heap, or in the program memory itself.
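As a minimal sketch of those three locations, the example below builds the same text as a literal, as a String, and as a stack-allocated byte array viewed as a str. The helper name byte_len is hypothetical and exists only for illustration.

```rust
// Hypothetical helper: works on any string slice, wherever its buffer lives.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn main() {
    // In program memory: the literal's bytes are stored in the compiled binary.
    let in_binary: &'static str = "héllo";

    // On the heap: String owns a heap-allocated UTF-8 buffer.
    let owned: String = String::from("héllo");
    let on_heap: &str = owned.as_str();

    // On the stack: a local byte array, viewed as a str after UTF-8 validation.
    let bytes: [u8; 6] = [b'h', 0xC3, 0xA9, b'l', b'l', b'o'];
    let on_stack: &str = std::str::from_utf8(&bytes).expect("valid UTF-8");

    assert_eq!(in_binary, on_heap);
    assert_eq!(on_heap, on_stack);

    // 'é' takes two bytes in UTF-8, so five characters occupy six bytes.
    assert_eq!(byte_len(on_stack), 6);
    assert_eq!(on_stack.chars().count(), 5);
}
```

All three values have the same type, &str, which is why code that takes a &str does not need to care where the buffer lives.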

A string literal in Rust has type &'static str. More generally, a &str can refer to a buffer in one of three places (the heap, program memory, or the stack), depending on how that buffer was created.

“Slice” on the Heap

If the reference is taken from a String, it will be a reference to the contents of the String’s internal buffer, which is on the heap. Handing out these references is a common pattern for effectively using “owned” sort strings in Rust. For convenience’s sake, a reference to an “owned” sort string coerces to a reference to its “slice” sort equivalent (through the Deref trait). That is, &String becomes &str wherever a &str is expected. It is considered good practice to use the latter type for function parameters taking a reference to a string.
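A small sketch of why &str is the preferred parameter type: one function accepts a borrowed String, a literal, and a sub-slice alike. The function name shout is an assumption made up for this example.

```rust
// Hypothetical helper: taking &str accepts literals and borrowed Strings alike.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let owned = String::from("hello");

    // &String coerces to &str at the call site.
    assert_eq!(shout(&owned), "HELLO");

    // A string literal is already a &str.
    assert_eq!(shout("world"), "WORLD");

    // A sub-slice of the String's heap buffer works too.
    assert_eq!(shout(&owned[1..3]), "EL");

    println!("{}", shout(&owned));
}
```

Had shout taken &String instead, the second and third calls would need to allocate a String first.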

“Slice” in Program Memory

If the buffer is a string literal, with or without an explicit lifetime, then the buffer will be located in program memory. In fact, all string literals have the 'static lifetime by default, meaning they can be safely referenced from anywhere in the program. Note though that having a buffer with a 'static lifetime is not the same as declaring a static variable. A buffer with a 'static lifetime may be safely referenced from anywhere, but must still be passed to functions explicitly; it is not globally visible. A variable must be declared with the static keyword to be globally visible.
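To illustrate the difference between a 'static lifetime and a static variable, here is a short sketch. The names GREETING and describe are hypothetical.

```rust
// A static item: visible by name from any function in this module.
static GREETING: &str = "hello";

// Hypothetical helper: it can name GREETING directly, but it only sees
// other 'static data if that data is passed in explicitly.
fn describe(s: &'static str) -> String {
    format!("{} / {}", GREETING, s)
}

fn main() {
    // The literal has the 'static lifetime, but the binding `local` is an
    // ordinary local variable and is not visible inside `describe`.
    let local: &'static str = "world";
    assert_eq!(describe(local), "hello / world");
    println!("{}", describe(local));
}
```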

There is some nuance here though. When a function returns a &str without an explicit lifetime annotation, the compiler has to infer that lifetime through lifetime elision. This works in two cases: the function has exactly one input lifetime, or it is a method taking either &self or &mut self. In the first case, the lifetime of the returned reference will be the lifetime of the single input reference. In the second case, the lifetime will be the lifetime of the reference to self.
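Both cases can be sketched as follows. The function first_word and the Person type are made up for illustration; neither signature needs an explicit lifetime.

```rust
// Case 1: exactly one input lifetime, so the result borrows from `text`.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

// Hypothetical type used to show the method case.
struct Person {
    name: String,
}

impl Person {
    // Case 2: a method taking &self. The result borrows from `self`,
    // even though there is a second reference parameter.
    fn name_or_panic(&self, _context: &str) -> &str {
        &self.name
    }
}

fn main() {
    let sentence = String::from("hello world");
    assert_eq!(first_word(&sentence), "hello");

    let person = Person { name: String::from("Ferris") };
    assert_eq!(person.name_or_panic("example"), "Ferris");
}
```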

If your function does not fit either of these cases, rustc will report an error, as in the following example:

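    // Does not compile: there is no input lifetime to elide from,
    // so rustc reports a missing lifetime specifier.
    fn get_string() -> &str {
        "A string!"
    }

    fn main() {
        let string = get_string();
        println!("{}", string);
    }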

You can avoid the lifetime elision problems entirely by explicitly annotating the function in one of the following ways:

    fn get_string_static() -> &'static str {
        "A string!"
    }

    fn get_string_function_call<'a>() -> &'a str {
        "A string!"
    }

    fn main() {
        let string_1 = get_string_static();
        let string_2 = get_string_function_call();

        println!("{}", string_1);
        println!("{}", string_2);
    }

So what do you do if you want to return a string from a function without giving it a 'static lifetime? You use the owned sort of string type, in this case String.
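For instance, a function that builds its result at runtime can return a String. This is a minimal sketch, and make_greeting is a hypothetical name.

```rust
// Hypothetical helper: the result is built at runtime, so it cannot be a
// 'static literal. Returning String hands ownership to the caller.
fn make_greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let greeting = {
        let name = String::from("Rust");
        // `name` is dropped at the end of this block, but the returned
        // String owns its own buffer and outlives it.
        make_greeting(&name)
    };
    assert_eq!(greeting, "Hello, Rust!");
    println!("{}", greeting);
}
```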

Why “Owned” Strings Exist

The “owned” sorts of strings are allocated on the heap, and do not suffer the same limitations as the “slice” sorts. They may be freely moved around without safety issues, handing out slices of their internal buffer as needed. The “slice” sorts do not have the same guarantees, and so can’t be used as freely.

If you want to avoid dealing with ownership and borrowing for strings, you can just turn every “slice” sort into an “owned” sort via the to_owned() method (provided by the ToOwned trait, which all “slice” sorts of string types implement), and only work with the “owned” sorts. Doing so incurs a performance penalty whenever you unnecessarily allocate a string buffer on the heap, and would be considered poor Rust style. Instead, it’s best to develop an understanding of when the two sorts of strings are needed.
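A short sketch of the trade-off, using hypothetical helpers keep and is_long: the first needs ownership and so allocates, while the second only reads and can borrow.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper that needs ownership, for example to store the
// value somewhere that outlives the caller's borrow.
fn keep(name: &str) -> String {
    name.to_owned() // allocates a new heap buffer and copies the bytes
}

// Hypothetical helper that only reads, so borrowing is enough.
fn is_long(name: &str) -> bool {
    name.len() > 8
}

fn main() {
    let literal: &str = "config";
    let stored: String = keep(literal);
    assert_eq!(stored, "config");
    assert!(!is_long(literal)); // no allocation needed here

    // The other pairs work the same way: Path::to_owned gives a PathBuf.
    let path: &Path = Path::new("dir/file.txt");
    let owned_path: PathBuf = path.to_owned();
    assert_eq!(owned_path.as_path(), path);
}
```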

Another important thing to note is that because the “owned” sorts of strings abstract away the underlying buffer, they can grow or shrink, possibly allocating a new underlying buffer and copying their contents to this new buffer. The “slice” sorts of strings cannot be resized, as they may not even be on the heap.
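The resizing behavior can be sketched like this. The helper append is hypothetical, and the exact capacity the allocator chooses is not specified, so the example only checks what is guaranteed.

```rust
// Hypothetical helper: appends `extra` to an owned copy of `base`.
fn append(base: &str, extra: &str) -> String {
    // Reserve only enough room for `base`, so the second push_str may
    // force the String to allocate a larger buffer and copy into it.
    let mut owned = String::with_capacity(base.len());
    owned.push_str(base);
    owned.push_str(extra);
    owned
}

fn main() {
    let mut s = append("abcd", "efgh");
    assert_eq!(s, "abcdefgh");
    assert!(s.capacity() >= s.len());

    // Shrinking: drop the tail, then release unused capacity.
    s.truncate(2);
    s.shrink_to_fit();
    assert_eq!(s, "ab");

    // A &str has no push_str or truncate. A shorter view is simply a new
    // slice over the same buffer.
    let view: &str = &s[..1];
    assert_eq!(view, "a");
}
```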

The “slice” sort strings can only be accessed via what’s called a “fat pointer.” This is because slices are “dynamically-sized types,” meaning their size is not known at compile time and the data itself does not record its own length. They are simply some region of contiguous memory. A “fat pointer” to a slice stores both a pointer to the memory in question and the length of the data stored at that memory location. This is all handled automatically by Rust, but it means that the “slice” sort of strings are interacted with via references, rather than being handled directly. For more detail about dynamically-sized types, check out “The Rustonomicon,” which covers them in detail.
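One way to observe the “fat pointer” is to compare reference sizes. This sketch assumes the pointer-plus-length layout that current Rust compilers use for slice references; the helper is_fat is hypothetical.

```rust
use std::mem::size_of;

// Hypothetical helper: true when a reference to T is two machine words wide.
fn is_fat<T: ?Sized>() -> bool {
    size_of::<&T>() == 2 * size_of::<usize>()
}

fn main() {
    // A reference to a sized type is a single pointer.
    assert!(!is_fat::<u8>());
    assert!(!is_fat::<String>());

    // References to dynamically-sized types also carry a length.
    assert!(is_fat::<str>());
    assert!(is_fat::<[u8]>());

    // The length lives in the reference, not in the string data.
    let s: &str = "hello";
    assert_eq!(s.len(), 5);
}
```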

The String Type Pairs

The pairs of string types are differentiated from each other by the guarantees provided by the underlying buffer.

String and str
String and str are guaranteed to be valid UTF-8 encoded Unicode strings. If you’re wondering why UTF-8 is the standard encoding for Rust strings, check out the Rust FAQ’s answer to that question.
CString and CStr
CString and CStr are guaranteed to be compatible with C strings (nul-terminated, with no interior nul bytes), and are usually used in FFI code.
OsString and OsStr
OsString and OsStr are guaranteed to be platform-native strings (that is, they use the string representation of the current platform). A String or str can be cheaply converted into them; converting back can fail, because a platform string is not guaranteed to be valid UTF-8. They are usually used when interacting directly with the operating system.
PathBuf and Path
PathBuf and Path are wrappers around OsString and OsStr that provide convenient methods for operating on paths according to the rules of the current system. They are usually used when interacting with system paths.
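The following sketch touches each of the three lesser-used pairs once, using only standard library calls. The helper file_extension is a hypothetical name introduced for the example.

```rust
use std::ffi::{CStr, CString, OsStr, OsString};
use std::path::{Path, PathBuf};

// Hypothetical helper: returns the extension only if it is valid UTF-8.
fn file_extension(path: &Path) -> Option<&str> {
    path.extension().and_then(OsStr::to_str)
}

fn main() {
    // CString / CStr: nul-terminated, with no interior nul bytes.
    let c_owned = CString::new("hello").expect("no interior nul bytes");
    let c_slice: &CStr = c_owned.as_c_str();
    assert_eq!(c_slice.to_bytes_with_nul(), &b"hello\0"[..]);
    assert!(CString::new("he\0llo").is_err());

    // OsString / OsStr: converting from a Rust string always succeeds,
    // converting back returns an Option because it may not be UTF-8.
    let os_owned = OsString::from("file.txt");
    let os_slice: &OsStr = os_owned.as_os_str();
    assert_eq!(os_slice.to_str(), Some("file.txt"));

    // PathBuf / Path: path-aware methods on top of OsString / OsStr.
    let mut path_owned = PathBuf::from("dir");
    path_owned.push("file.txt");
    let path_slice: &Path = path_owned.as_path();
    assert_eq!(path_slice.file_name(), Some(OsStr::new("file.txt")));
    assert_eq!(file_extension(path_slice), Some("txt"));
}
```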

Cow

There are times when working with strings in Rust that you want to cleanly abstract over the two sorts of string types. Maybe you want a container that may hold either “owned” sort or “slice” sort strings, or you want to use slices except for cases where an “owned” sort is absolutely necessary. In these situations, use Cow.

Cow (which stands for “Clone on Write”) is a container that can take in a “slice” sort of string type, and only convert that “slice” sort into an “owned” sort when absolutely necessary (when something attempts to write to the slice). This has the advantage of reducing the complexity of reasoning about ownership, while also helping keep unnecessary allocations down. It is not a panacea, but it can be very helpful sometimes.
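As a minimal sketch of that behavior, the hypothetical function normalize below returns its input unchanged (borrowed) when no edit is needed, and allocates an owned String only when it has to change something.

```rust
use std::borrow::Cow;

// Hypothetical helper: replaces spaces with underscores, allocating only
// when the input actually contains a space.
fn normalize(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    // Nothing to change: the Cow just borrows the original slice.
    let untouched = normalize("no_spaces");
    assert!(matches!(untouched, Cow::Borrowed(_)));

    // A change was needed: the Cow holds a freshly allocated String.
    let changed = normalize("has spaces");
    assert!(matches!(changed, Cow::Owned(_)));
    assert_eq!(changed, "has_spaces");

    // to_mut() clones the borrowed data into an owned String on first write.
    let mut cow: Cow<str> = Cow::Borrowed("abc");
    cow.to_mut().push('d');
    assert!(matches!(cow, Cow::Owned(_)));
    assert_eq!(cow, "abcd");
}
```

Callers can treat the result as a &str in either case, since Cow<str> dereferences to str.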

Conclusion

That was a relatively quick explanation of string types in Rust. Hopefully that helped to clarify a bit of why the different string types exist, and why they are useful. As a final reference, here is a table of the different types:

Guarantees       “Slice” sort    “Owned” sort
UTF-8            str             String
C-compatible     CStr            CString
OS-compatible    OsStr           OsString
System path      Path            PathBuf

Updated (3/28/2016): This post has been updated based on corrections from CryZe92 and Ilogiq on the Rust subreddit. Thanks to both of them for the help!
