String Type
The type String represents an immutable sequence of bytes, conventionally UTF-8 encoded.
A String value is a fat pointer consisting of a pointer to the string data and the length in bytes.
String literals are stored in read-only memory and have static lifetime.
fn main() -> i32 {
    let s = "hello";
    0
}
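The pointer-plus-length layout described above can be modeled in Rust. This is a minimal sketch under stated assumptions: ByteStr, from_static, and fat_pointer_size are hypothetical names chosen for illustration, and nothing here is claimed to match the compiler's actual representation.

```rust
use std::mem::size_of;

/// Hypothetical model of a String value: a pointer to the string data
/// plus the length in bytes. Not the compiler's real layout.
#[derive(Clone, Copy)]
struct ByteStr {
    ptr: *const u8,
    len: usize,
}

impl ByteStr {
    /// Wrap literal data; `'static` mirrors the static lifetime of literals.
    fn from_static(bytes: &'static [u8]) -> ByteStr {
        ByteStr { ptr: bytes.as_ptr(), len: bytes.len() }
    }

    /// Length in bytes, read directly from the fat pointer.
    fn len(&self) -> usize {
        self.len
    }

    /// View the underlying bytes.
    fn as_bytes(&self) -> &'static [u8] {
        // SAFETY: `ptr` and `len` always come from a valid `&'static [u8]`.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

/// Size of the model value: two machine words.
fn fat_pointer_size() -> usize {
    size_of::<ByteStr>()
}
```

In this model, copying a value copies only the two words, never the string data, which is consistent with literals living in read-only memory.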
String Literals
A string literal is a sequence of characters enclosed in double quotes (").
String literals support the following escape sequences:
| Escape | Meaning |
|---|---|
| \\ | Backslash |
| \" | Double quote |
| \n | Newline (line feed, U+000A) |
| \t | Horizontal tab (U+0009) |
| \r | Carriage return (U+000D) |
| \0 | Null character (U+0000) |
An invalid escape sequence in a string literal is a compile-time error.
fn main() -> i32 {
    let a = "hello world";
    let b = "with \"quotes\"";
    let c = "with \\ backslash";
    let d = "line1\nline2"; // newline
    let e = "col1\tcol2"; // tab
    0
}
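The escape rules can be illustrated with a small decoder written in Rust. This is a sketch only: unescape is a hypothetical helper, the error strings are invented for illustration, and a real compiler would report a diagnostic at compile time rather than return a runtime error.

```rust
/// Decode the body of a string literal (the text between the quotes).
/// Only the six escapes from the table are accepted; anything else is an error,
/// standing in for the compile-time error the specification requires.
fn unescape(src: &str) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut chars = src.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            // Ordinary character: copy its UTF-8 bytes unchanged.
            let mut buf = [0u8; 4];
            out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            continue;
        }
        // A backslash must be followed by one of the recognized characters.
        let byte = match chars.next() {
            Some('\\') => b'\\',
            Some('"') => b'"',
            Some('n') => b'\n',
            Some('t') => b'\t',
            Some('r') => b'\r',
            Some('0') => 0u8,
            Some(other) => return Err(format!("invalid escape sequence: \\{}", other)),
            None => return Err("unterminated escape sequence".to_string()),
        };
        out.push(byte);
    }
    Ok(out)
}
```

Each accepted escape decodes to exactly one byte, so the decoded string is never longer than its source text.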
String Equality
Strings support the equality operators == and !=.
Two strings are equal if they have the same length and identical byte content.
fn main() -> i32 {
    let a = "hello";
    let b = "hello";
    let c = "world";
    if a == b && a != c {
        0
    } else {
        1
    }
}
String Debugging
The @dbg intrinsic accepts a String argument and prints its content followed by a newline.
fn main() -> i32 {
    let msg = "Hello, world!";
    @dbg(msg);
    0
}
Byte Access
A String is a byte string: its contents are conventionally UTF-8 but are not required to be valid UTF-8 (see ADR-0035). Byte access therefore operates on the raw bytes and never inspects UTF-8 character boundaries.
Indexing a String with an integer, s[i], evaluates to the byte at byte offset i as a value of type u8. The operation is O(1).
If the index i is greater than or equal to s.len(), evaluating s[i] traps (index out of bounds), terminating the program the same way an out-of-bounds array index does.
fn main() -> i32 {
    let s = "café"; // 5 bytes: 'c' 'a' 'f' 0xC3 0xA9
    @dbg(s[0]); // 99 ('c')
    @dbg(s[3]); // 195 (0xC3)
    @dbg(s[4]); // 169 (0xA9)
    0
}
The method s.substring(start, len) returns a new String containing the byte range [start, start + len) copied from s. Because String is a byte string, any byte range is permitted; the range need not fall on UTF-8 character boundaries. The receiver s is borrowed, not consumed.
If start + len is greater than s.len() (or the addition overflows), s.substring(start, len) traps (index out of bounds).
fn main() -> i32 {
    let s = "café";
    let tail = s.substring(3, 2); // the two bytes of 'é'
    @dbg(tail.len()); // 2
    0
}
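The bounds rules for s[i] and s.substring(start, len) can be written out as a Rust model. This is a sketch under assumptions: byte_at and substring are hypothetical stand-ins, usize is assumed for the index type, and None stands in for the index-out-of-bounds trap, which in the real language terminates the program.

```rust
/// Model of `s[i]`: the byte at offset `i`, or `None` where the
/// language would trap (i >= s.len()).
fn byte_at(s: &[u8], i: usize) -> Option<u8> {
    if i >= s.len() {
        None
    } else {
        Some(s[i])
    }
}

/// Model of `s.substring(start, len)`: a copy of the byte range
/// [start, start + len), or `None` where the language would trap.
fn substring(s: &[u8], start: usize, len: usize) -> Option<Vec<u8>> {
    // An overflowing addition is treated the same as an out-of-range end.
    let end = start.checked_add(len)?;
    if end > s.len() {
        return None;
    }
    // Any byte range is allowed; UTF-8 boundaries are never inspected.
    Some(s[start..end].to_vec())
}
```

Checking the end of the range with an overflow-aware addition is enough: if start + len fits within s.len(), then start does too.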
Integer Formatting
The intrinsic @to_string(n) takes an i64 and returns a new, heap-allocated String containing the base-10 decimal representation of n (see ADR-0035). A bare integer literal argument is inferred to be i64.
@to_string(n) formats the entire range of i64, including i64::MIN. A negative value is prefixed with a single -; a zero value formats as 0.
fn main() -> i32 {
    @dbg(@to_string(42)); // 42
    @dbg(@to_string(-5)); // -5
    0
}
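Formatting i64::MIN is the case that needs care, because its magnitude does not fit in an i64 and a naive negation overflows. The Rust sketch below shows one way to cover the full range; to_decimal is a hypothetical name and this is not claimed to be how @to_string is implemented.

```rust
/// Format `n` in base 10: a single leading '-' for negatives, "0" for zero.
fn to_decimal(n: i64) -> String {
    // `unsigned_abs` yields the magnitude as a u64, so i64::MIN
    // (whose magnitude is 2^63) is handled without overflow.
    let mut magnitude = n.unsigned_abs();
    let mut digits = Vec::new();
    // Emit digits least-significant first; the loop runs at least once,
    // which makes zero format as "0".
    loop {
        digits.push(b'0' + (magnitude % 10) as u8);
        magnitude /= 10;
        if magnitude == 0 {
            break;
        }
    }
    if n < 0 {
        digits.push(b'-');
    }
    digits.reverse();
    String::from_utf8(digits).expect("ASCII digits and '-' are valid UTF-8")
}
```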
Concatenation
When both operands of the + operator are String, s1 + s2 evaluates to a new, heap-allocated String whose bytes are the bytes of s1 followed by the bytes of s2 (see ADR-0035). Both operands are borrowed, not consumed, and remain usable afterwards.
The + operator requires both operands to have the same type. Mixing a String and an integer (for example s + 1) is a type error; there is no implicit conversion between String and integers.
fn main() -> i32 {
    let greeting = "Hello, " + "world!";
    @dbg(greeting); // Hello, world!
    0
}
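The borrow-and-allocate behavior of + can be modeled in Rust as a function that reads both operands and builds a fresh buffer. This is a sketch: concat is a hypothetical name, and Vec<u8> merely stands in for a heap-allocated String.

```rust
/// Model of `s1 + s2`: both operands are only borrowed, and the result
/// is a newly allocated buffer holding s1's bytes followed by s2's bytes.
fn concat(s1: &[u8], s2: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(s1.len() + s2.len());
    out.extend_from_slice(s1);
    out.extend_from_slice(s2);
    out
}
```

Because the operands are borrowed, both remain usable after the call, and the result's length is always the sum of the operand lengths.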
Output
The free function print(s) takes a String and writes its raw bytes to standard output, adding nothing. Unlike @dbg, it does not append a newline and does not apply any debug formatting. The argument s is borrowed, not consumed, and remains usable afterwards.
The free function println(s) takes a String and writes its raw bytes to standard output followed by a single newline (U+000A). The argument s is borrowed, not consumed. Together with @to_string and +, println composes line-oriented output; there is no formatting or interpolation syntax.
print(s) and println(s) write exactly the bytes of s, in order, without transformation: because a String is a byte string, the output is byte-for-byte identical to the string's contents (the only difference between the two is the single trailing newline println adds). Writing an empty String writes no bytes (for print) or a lone newline (for println).
fn main() -> i32 {
    print("hello"); // hello
    print(" world"); // hello world (no newline yet)
    println(""); // hello world\n
    println("value is " + @to_string(42)); // value is 42\n
    0
}
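The byte-for-byte guarantee can be checked against a Rust model that writes into any sink instead of standard output. This is a sketch: print_to and println_to are hypothetical names, and the explicit writer parameter exists only so the output can be inspected.

```rust
use std::io::{self, Write};

/// Model of `print(s)`: write exactly the bytes of `s`, adding nothing.
fn print_to<W: Write>(out: &mut W, s: &[u8]) -> io::Result<()> {
    out.write_all(s)
}

/// Model of `println(s)`: the bytes of `s`, then a single U+000A byte.
fn println_to<W: Write>(out: &mut W, s: &[u8]) -> io::Result<()> {
    out.write_all(s)?;
    out.write_all(b"\n")
}
```

Passing a Vec<u8> as the writer captures the output in memory, which makes it easy to confirm that bytes that are not valid UTF-8 pass through unchanged.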
Search
The method s.contains(needle) returns true if and only if the bytes of the String needle occur as a contiguous subsequence of the bytes of s. The comparison is byte-level and does not inspect UTF-8 character boundaries. The empty needle is contained in every string. The receiver s is borrowed, not consumed.
The method s.starts_with(prefix) returns true if and only if the bytes of the String prefix are a prefix of the bytes of s. The comparison is byte-level. The empty prefix matches every string. The receiver s is borrowed, not consumed.
fn main() -> i32 {
    let h = "hello";
    @dbg(h.contains("ell")); // true
    @dbg(h.starts_with("he")); // true
    @dbg(h.starts_with("lo")); // false
    0
}
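The byte-level matching rules, including the empty-needle and empty-prefix cases, can be spelled out in Rust. This is a sketch: the functions are hypothetical models of the two methods, and the naive window scan says nothing about the search algorithm an implementation actually uses.

```rust
/// Model of `s.starts_with(prefix)`: byte-wise prefix test.
/// An empty prefix matches every string.
fn starts_with(s: &[u8], prefix: &[u8]) -> bool {
    prefix.len() <= s.len() && &s[..prefix.len()] == prefix
}

/// Model of `s.contains(needle)`: true if `needle` occurs as a
/// contiguous run of bytes in `s`. An empty needle is always contained.
fn contains(s: &[u8], needle: &[u8]) -> bool {
    if needle.is_empty() {
        return true;
    }
    // Compare every window of needle.len() bytes; a needle longer
    // than `s` produces no windows and therefore no match.
    s.windows(needle.len()).any(|w| w == needle)
}
```

Because the comparison never looks at character boundaries, a needle consisting of a single continuation byte such as 0xA9 is found inside "café".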
Character Iteration
The character view s.chars() yields the Unicode scalar values of a String, decoding its bytes as UTF-8. It is used as the iterable of a for loop (see Loop Expressions; a preview feature, --preview for_loops), which binds each scalar value as a u32 in ascending byte order.
Decoding through s.chars() is strict: a byte sequence that is not well-formed UTF-8 (an ill-formed, truncated, overlong, or surrogate sequence) traps at runtime when it is decoded. Because a String is a byte string that may hold arbitrary bytes, the decode boundary is where invalid UTF-8 is caught, and the rule there is to trap rather than yield corrupted values.
The lossy character view s.chars_lossy() yields the same Unicode scalar values as s.chars() for well-formed UTF-8, but instead of trapping it substitutes the Unicode replacement scalar U+FFFD (decimal 65533) for each maximal subpart of an ill-formed subsequence and continues. Lossiness is explicit: chars_lossy is the only way to decode without trapping, so silent corruption is never the default. Like chars, it is used as the iterable of a for loop and binds each scalar value as a u32.
fn main() -> i32 {
    let s = "café";
    let mut count = 0;
    for c in s.chars() {
        @dbg(c); // 99, 97, 102, 233 (the last is é = U+00E9)
        count = count + 1;
    }
    count // 4 scalar values (though the string is 5 bytes)
}
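The difference between the strict and lossy views can be modeled with Rust's standard UTF-8 routines. This is a sketch under assumptions: chars_strict and chars_lossy here are hypothetical free functions, None stands in for the runtime trap, and the strict model validates the whole string up front, whereas the text above says the trap happens when the ill-formed sequence is decoded.

```rust
/// Model of `s.chars()`: the scalar values as u32, or `None` where the
/// language would trap on ill-formed UTF-8. (Eager validation is a
/// simplification of this model.)
fn chars_strict(bytes: &[u8]) -> Option<Vec<u32>> {
    let text = std::str::from_utf8(bytes).ok()?;
    Some(text.chars().map(|c| c as u32).collect())
}

/// Model of `s.chars_lossy()`: never fails; ill-formed input is
/// replaced by U+FFFD (65533) by the standard library's lossy decoder.
fn chars_lossy(bytes: &[u8]) -> Vec<u32> {
    String::from_utf8_lossy(bytes)
        .chars()
        .map(|c| c as u32)
        .collect()
}
```

For well-formed input the two models agree; they differ only on input such as a stray 0xFF byte or a truncated trailing sequence, where the strict model reports failure and the lossy one yields 65533.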
Limitations
The current implementation does not support:
- Slicing with range syntax (s[a..b]); use s.substring(start, len) instead
- Pattern matching on strings
These features may be added in future versions.