Ruta graveolens  ·  notes from a language experiment  ·  cultivated since 2025

String Type

The type String represents an immutable sequence of bytes. The bytes are conventionally UTF-8 encoded but are not required to be valid UTF-8 (see Byte Access).

A String value is a fat pointer consisting of a pointer to the string data and the length in bytes.

String literals are stored in read-only memory and have static lifetime.

fn main() -> i32 {
    let s = "hello";
    0
}

String Literals

A string literal is a sequence of characters enclosed in double quotes (").

String literals support the following escape sequences:

Escape   Meaning
\\       Backslash
\"       Double quote
\n       Newline (line feed, U+000A)
\t       Horizontal tab (U+0009)
\r       Carriage return (U+000D)
\0       Null character (U+0000)

An invalid escape sequence in a string literal is a compile-time error.

fn main() -> i32 {
    let a = "hello world";
    let b = "with \"quotes\"";
    let c = "with \\ backslash";
    let d = "line1\nline2";   // newline
    let e = "col1\tcol2";     // tab
    0
}
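
The example above leaves \r and \0 unexercised. A minimal sketch that covers them, together with a rejected literal, follows. Here \q is only an illustrative stand-in for any sequence missing from the table, and the offending line is commented out so that the program still compiles.

```
fn main() -> i32 {
    let f = "line1\r\nline2";   // carriage return, then newline
    let g = "nul\0byte";        // embedded null character (U+0000)
    // let bad = "oops\q";      // compile-time error: \q is not a listed escape
    0
}
```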

String Equality

Strings support the equality operators == and !=.

Two strings are equal if they have the same length and identical byte content.

fn main() -> i32 {
    let a = "hello";
    let b = "hello";
    let c = "world";
    if a == b && a != c {
        0
    } else {
        1
    }
}
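
Equality compares contents, not storage. As a sketch (the names built and literal are illustrative), a heap-allocated String produced by + (see Concatenation) should compare equal to a literal holding the same bytes:

```
fn main() -> i32 {
    let built = "he" + "llo";   // new, heap-allocated String
    let literal = "hello";      // stored in read-only memory
    if built == literal {       // same length, identical bytes
        0
    } else {
        1
    }
}
```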

String Debugging

The @dbg intrinsic accepts a String argument and prints its content followed by a newline.

fn main() -> i32 {
    let msg = "Hello, world!";
    @dbg(msg);
    0
}

Byte Access

A String is a byte string: its contents are conventionally UTF-8 but are not required to be valid UTF-8 (see ADR-0035). Byte access therefore operates on the raw bytes and never inspects UTF-8 character boundaries.

Indexing a String with an integer, s[i], evaluates to the byte at byte offset i as a value of type u8. The operation is O(1).

If the index i is greater than or equal to s.len(), evaluating s[i] traps (index out of bounds), terminating the program the same way an out-of-bounds array index does.

fn main() -> i32 {
    let s = "café";   // 5 bytes: 'c' 'a' 'f' 0xC3 0xA9
    @dbg(s[0]);        // 99  ('c')
    @dbg(s[3]);        // 195 (0xC3)
    @dbg(s[4]);        // 169 (0xA9)
    0
}
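
The bounds rule can be seen at the edge of the same string. In this sketch the out-of-bounds access is commented out so that the program runs to completion; whether a constant out-of-range index is also diagnosed before run time is not stated in this section.

```
fn main() -> i32 {
    let s = "café";       // 5 bytes, so valid offsets are 0 through 4
    @dbg(s[4]);           // 169 (0xA9), the last byte
    // @dbg(s[5]);        // 5 >= s.len(): would trap (index out of bounds)
    @dbg(s.len());        // 5
    0
}
```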

The method s.substring(start, len) returns a new String containing the byte range [start, start + len) copied from s. Because String is a byte string, any in-bounds byte range is permitted; the range need not fall on UTF-8 character boundaries. The receiver s is borrowed, not consumed.

If start + len is greater than s.len() (or the addition overflows), s.substring(start, len) traps (index out of bounds).

fn main() -> i32 {
    let s = "café";
    let tail = s.substring(3, 2);   // the two bytes of 'é'
    @dbg(tail.len());               // 2
    0
}
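
The boundary cases follow directly from the trap condition start + len > s.len(). The sketch below works through them with literal arguments; the variable names are illustrative, and the trapping call is commented out so that the program runs to completion.

```
fn main() -> i32 {
    let s = "café";                    // 5 bytes
    let whole = s.substring(0, 5);     // 0 + 5 == s.len(): permitted
    let empty = s.substring(5, 0);     // 5 + 0 == s.len(): permitted, empty result
    let split = s.substring(0, 4);     // ends in the lone byte 0xC3: still permitted
    // let bad = s.substring(4, 2);    // 4 + 2 > 5: would trap (index out of bounds)
    @dbg(whole == s);                  // true
    @dbg(empty.len());                 // 0
    @dbg(split.len());                 // 4
    0
}
```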

Integer Formatting

The intrinsic @to_string(n) takes an i64 and returns a new, heap-allocated String containing the base-10 decimal representation of n (see ADR-0035). A bare integer literal argument is inferred to be i64.

@to_string(n) formats the entire range of i64, including i64::MIN. A negative value is prefixed with a single -; a zero value formats as 0.

fn main() -> i32 {
    @dbg(@to_string(42));    // 42
    @dbg(@to_string(-5));    // -5
    0
}
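
Because the result is an ordinary String, it can be inspected with the byte operations described under Byte Access. A small sketch (the name t is illustrative) covering zero and the sign prefix:

```
fn main() -> i32 {
    @dbg(@to_string(0));        // 0
    let t = @to_string(-120);
    @dbg(t[0]);                 // 45 ('-'), the single sign prefix
    @dbg(t.len());              // 4  ('-' '1' '2' '0')
    0
}
```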

Concatenation

When both operands of the + operator are String, s1 + s2 evaluates to a new, heap-allocated String whose bytes are the bytes of s1 followed by the bytes of s2 (see ADR-0035). Both operands are borrowed, not consumed, and remain usable afterwards.

The + operator requires both operands to have the same type. Mixing a String and an integer (for example s + 1) is a type error; there is no implicit conversion between String and integers.

fn main() -> i32 {
    let greeting = "Hello, " + "world!";
    @dbg(greeting);   // Hello, world!
    0
}
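
The borrowing rule and the type rule can be sketched together. The names below are illustrative, and the ill-typed line is commented out so that the program compiles; @to_string (see Integer Formatting) is the explicit conversion.

```
fn main() -> i32 {
    let a = "ab";
    let b = "cd";
    let joined = a + b;              // a and b are borrowed
    let again = a + b;               // so both can be used a second time
    @dbg(joined == again);           // true
    // let bad = a + 1;              // type error: String + integer
    let ok = a + @to_string(1);      // convert explicitly instead
    @dbg(ok);                        // ab1
    0
}
```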

Output

The free function print(s) takes a String and writes its raw bytes to standard output, adding nothing. Unlike @dbg, it does not append a newline and does not apply any debug formatting. The argument s is borrowed, not consumed, and remains usable afterwards.

The free function println(s) takes a String and writes its raw bytes to standard output followed by a single newline (U+000A). The argument s is borrowed, not consumed. Together with @to_string and +, println composes line-oriented output; there is no formatting or interpolation syntax.

print(s) and println(s) write exactly the bytes of s, in order and without transformation. Because a String is a byte string, the output is byte-for-byte identical to the string's contents; the two functions differ only in the single trailing newline that println adds. Writing an empty String writes no bytes with print and a lone newline with println.

fn main() -> i32 {
    print("hello");                       // hello
    print(" world");                      // hello world   (no newline yet)
    println("");                          // hello world\n
    println("value is " + @to_string(42)); // value is 42\n
    0
}

The method s.contains(needle) returns true if and only if the bytes of the String needle occur as a contiguous subsequence of the bytes of s. The comparison is byte-level and does not inspect UTF-8 character boundaries. The empty needle is contained in every string. The receiver s is borrowed, not consumed.

The method s.starts_with(prefix) returns true if and only if the bytes of the String prefix are a prefix of the bytes of s. The comparison is byte-level. The empty prefix matches every string. The receiver s is borrowed, not consumed.

fn main() -> i32 {
    let h = "hello";
    @dbg(h.contains("ell"));      // true
    @dbg(h.starts_with("he"));    // true
    @dbg(h.starts_with("lo"));    // false
    0
}
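
The edge cases stated above can be sketched as follows. The needle lead is built with s.substring (see Byte Access) to show that matching is byte-level; the name is illustrative.

```
fn main() -> i32 {
    let s = "café";
    let lead = s.substring(3, 1);    // the lone byte 0xC3, not valid UTF-8 on its own
    @dbg(s.contains(""));            // true: the empty needle is always contained
    @dbg(s.starts_with(""));         // true: the empty prefix always matches
    @dbg(s.contains(lead));          // true: byte-level match inside 'é'
    @dbg(s.starts_with("café!"));    // false: the prefix is longer than s
    0
}
```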

Character Iteration

The character view s.chars() yields the Unicode scalar values of a String, decoding its bytes as UTF-8. It is used as the iterable of a for loop, which binds each scalar value as a u32, in ascending byte order. For loops are a preview feature (--preview for_loops); see Loop Expressions.

Decoding through s.chars() is strict: a byte sequence that is not well-formed UTF-8 (an ill-formed, truncated, overlong, or surrogate sequence) traps at runtime when it is decoded. Because a String is a byte string that may hold arbitrary bytes, the decode boundary is where invalid UTF-8 is caught, and the policy there is "trap, don't corrupt".

The lossy character view s.chars_lossy() yields the same Unicode scalar values as s.chars() for well-formed UTF-8, but instead of trapping it substitutes the Unicode replacement scalar U+FFFD (decimal 65533) for each maximal subpart of an ill-formed subsequence and continues. Lossiness is explicit: chars_lossy is the only way to decode without trapping, so silent corruption is never the default. Like chars, it is used as the iterable of a for loop and binds each scalar value as a u32.

fn main() -> i32 {
    let s = "café";
    let mut count = 0;
    for c in s.chars() {
        @dbg(c);          // 99, 97, 102, 233 (the last is é = U+00E9)
        count = count + 1;
    }
    count  // 4 scalar values (though the string is 5 bytes)
}
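
The lossy view can be sketched on a deliberately truncated string. Here s.substring (see Byte Access) cuts 'é' in half, leaving a lone 0xC3 that forms a single ill-formed maximal subpart; the name broken is illustrative, and the loop needs --preview for_loops like the example above.

```
fn main() -> i32 {
    let s = "café";
    let broken = s.substring(0, 4);   // 'c' 'a' 'f' 0xC3: the 'é' is truncated
    for c in broken.chars_lossy() {
        @dbg(c);                      // 99, 97, 102, 65533 (U+FFFD for the lone 0xC3)
    }
    0
}
```

Replacing chars_lossy with chars in this loop would trap instead, because the truncated sequence is not well-formed UTF-8.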

Limitations

The current implementation does not support:

  • Slicing with range syntax (s[a..b]); use s.substring(start, len) instead
  • Pattern matching on strings

These features may be added in future versions.