Ruta graveolens  ·  notes from a language experiment  ·  cultivated since 2025

Tokens

Tokens are the atomic units of syntax in a Rue program. The lexer processes source text and produces a sequence of tokens.

Token Categories

Rue tokens fall into the following categories:

Category          Examples
Keywords          fn, let, mut, if, else, while, match, return, break, continue, true, false
Identifiers       main, x, my_var, _unused
Integer literals  0, 42, 1_000_000, 0xFF, 0o17, 0b1010
String literals   "hello", "world", "with \"escapes\""
Operators         +, -, *, /, %, ==, !=, <, >, <=, >=, &&, ||, !, &, |, ^, ~, <<, >>
Delimiters        (, ), {, }, [, ], ,, ;, :, ->, =>

Integer Literals

An integer literal is a decimal literal, a hexadecimal literal (prefix 0x), an octal literal (prefix 0o), or a binary literal (prefix 0b). A decimal literal begins with a decimal digit; a based literal begins with its lowercase base prefix and contains at least one digit of that base. Hexadecimal digits are case-insensitive: 0xff, 0xFF, and 0xfF denote the same value.

integer_literal = dec_literal | hex_literal | oct_literal | bin_literal ;
dec_literal = dec_digit { dec_digit | "_" } ;
hex_literal = "0x" { hex_digit | "_" } ;   (* at least one hex_digit *)
oct_literal = "0o" { oct_digit | "_" } ;   (* at least one oct_digit *)
bin_literal = "0b" { bin_digit | "_" } ;   (* at least one bin_digit *)
dec_digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
oct_digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" ;
bin_digit = "0" | "1" ;
hex_digit = dec_digit | "a" | ... | "f" | "A" | ... | "F" ;

Integer literals MUST be representable in their target type. An unadorned integer literal defaults to type i32.

Underscores (_) may appear as digit separators anywhere among the digits of an integer literal, including immediately after a base prefix and at the end of the literal, and have no effect on the literal's value. An integer literal cannot begin with an underscore: a token such as _1 is an identifier, not a literal.

A base prefix with no digits after it (e.g. 0x, 0b_) is a compile-time error.

A digit that is not valid in the literal's base (e.g. 0b2, 0o9, 0xG) is a compile-time error.

Base prefixes are lowercase. An uppercase base prefix (0X, 0O, 0B) is a compile-time error.

fn main() -> i32 {
    let a = 0;            // zero
    let b = 42;           // decimal integer
    let c = 255;          // maximum u8 value
    let d = 1_000_000;    // underscore separators
    let e = 0xFF;         // hexadecimal, value 255
    let f = 0x_FF_;       // underscores legal after the prefix and trailing
    let g = 0o17;         // octal, value 15
    let h = 0b1010;       // binary, value 10
    0
}
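
The integer literal rules above can be sketched as a small evaluation routine. This is a minimal Rust sketch, not the Rue compiler's implementation: the names parse_int_literal and IntLitError are hypothetical, the input is assumed to be the text of a single candidate token, and the range check assumes the unadorned default type i32.

```rust
/// Reasons a candidate integer literal is rejected (illustrative names).
#[derive(Debug, PartialEq)]
pub enum IntLitError {
    NotALiteral,        // does not begin with a decimal digit, e.g. `_1`
    UppercasePrefix,    // 0X, 0O, 0B
    NoDigits,           // 0x, 0b_
    InvalidDigit(char), // 0b2, 0o9, 0xG
    OutOfRange,         // not representable in i32 (assumed target type)
}

/// Evaluates the text of one integer literal as an i32.
pub fn parse_int_literal(text: &str) -> Result<i32, IntLitError> {
    // Every integer literal begins with a decimal digit.
    match text.chars().next() {
        Some(c) if c.is_ascii_digit() => {}
        _ => return Err(IntLitError::NotALiteral),
    }

    // Split off the lowercase base prefix, if there is one.
    let (radix, digits) = match text.get(..2) {
        Some("0x") => (16, &text[2..]),
        Some("0o") => (8, &text[2..]),
        Some("0b") => (2, &text[2..]),
        Some("0X") | Some("0O") | Some("0B") => return Err(IntLitError::UppercasePrefix),
        _ => (10, text),
    };

    let mut value: i64 = 0;
    let mut seen_digit = false;
    for c in digits.chars() {
        if c == '_' {
            continue; // separators have no effect on the value
        }
        // to_digit accepts both cases for hexadecimal digits.
        let d = c.to_digit(radix).ok_or(IntLitError::InvalidDigit(c))?;
        seen_digit = true;
        value = value * i64::from(radix) + i64::from(d);
        if value > i64::from(i32::MAX) {
            return Err(IntLitError::OutOfRange);
        }
    }
    if !seen_digit {
        return Err(IntLitError::NoDigits);
    }
    Ok(value as i32)
}
```

Under these assumptions, parse_int_literal("0x_FF_") yields Ok(255), while "0b_" is rejected with NoDigits and "0XFF" with UppercasePrefix. Accumulating in i64 keeps the overflow check simple, since the running value is compared against i32::MAX after every digit.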

String Literals

A string literal is a sequence of characters enclosed in double quotes (").

string_literal = '"' { string_char } '"' ;
string_char = any_char_except_quote_or_backslash | escape_sequence ;
escape_sequence = "\" ( "\" | '"' | "n" | "t" | "r" | "0" ) ;

String literals support the following escape sequences:

Escape  Character
\\      Backslash
\"      Double quote
\n      Newline (line feed, U+000A)
\t      Horizontal tab (U+0009)
\r      Carriage return (U+000D)
\0      Null character (U+0000)

An invalid escape sequence in a string literal is a compile-time error.

An unterminated string literal (reaching end-of-file or end-of-line without a closing quote) is a compile-time error.

fn main() -> i32 {
    let a = "hello world";
    let b = "with \"quotes\"";
    let c = "with \\ backslash";
    let d = "line1\nline2";   // newline
    let e = "col1\tcol2";     // tab
    0
}
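
The escape rules above amount to a single pass over the characters between the quotes. The following Rust sketch shows one way to decode them; unescape and StrLitError are hypothetical names, and the function assumes the lexer has already located the closing quote and passes only the body of the literal.

```rust
/// Reasons a string literal body is rejected (illustrative names).
#[derive(Debug, PartialEq)]
pub enum StrLitError {
    InvalidEscape(char), // backslash followed by an unsupported character
    DanglingBackslash,   // backslash with nothing after it
}

/// Decodes the text between the quotes of a string literal.
pub fn unescape(body: &str) -> Result<String, StrLitError> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c); // ordinary character, copied through unchanged
            continue;
        }
        // A backslash must be followed by one of the six supported characters.
        let decoded = match chars.next() {
            Some('\\') => '\\',
            Some('"') => '"',
            Some('n') => '\n',
            Some('t') => '\t',
            Some('r') => '\r',
            Some('0') => '\0',
            Some(other) => return Err(StrLitError::InvalidEscape(other)),
            None => return Err(StrLitError::DanglingBackslash),
        };
        out.push(decoded);
    }
    Ok(out)
}
```

With this sketch, the body line1\nline2 decodes to two lines separated by U+000A, and a body containing \q is rejected with InvalidEscape('q'). Detecting an unterminated literal is left to the caller, since it depends on how the lexer scans for the closing quote.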

Identifiers

An identifier starts with a letter or underscore, followed by any number of letters, digits, or underscores.

identifier = (letter | "_") { letter | dec_digit | "_" } ;
letter = "a" | ... | "z" | "A" | ... | "Z" ;

Identifiers cannot be keywords.
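
As an illustration, the identifier grammar and the keyword exclusion can be checked with a short predicate. This is a Rust sketch under stated assumptions: is_identifier and KEYWORDS are hypothetical names, and the keyword list is copied from the token category table.

```rust
/// Keywords from the token category table; none may be used as an identifier.
const KEYWORDS: [&str; 12] = [
    "fn", "let", "mut", "if", "else", "while", "match", "return", "break", "continue", "true",
    "false",
];

/// Returns true if `word` matches the identifier grammar and is not a keyword.
pub fn is_identifier(word: &str) -> bool {
    let mut chars = word.chars();
    // First character: an ASCII letter or an underscore.
    let starts_ok = match chars.next() {
        Some(c) => c.is_ascii_alphabetic() || c == '_',
        None => false, // the empty string is not an identifier
    };
    // Remaining characters: ASCII letters, decimal digits, or underscores.
    starts_ok
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        && !KEYWORDS.iter().any(|k| *k == word)
}
```

Here is_identifier("_unused") and is_identifier("x1") hold, while "1x" fails on its first character and "while" fails the keyword check.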

Underscore Identifier

The identifier _ (single underscore) is a wildcard that discards its value without creating a binding. When used in a let statement, the initializer expression is evaluated for its side effects, but no variable is created and no storage is allocated.

A reference to _ as an expression is a compile-time error. The wildcard identifier cannot be used to retrieve a previously discarded value.

Multiple occurrences of _ are permitted in the same scope. Each occurrence independently discards its value.

fn main() -> i32 {
    let _ = 42;       // discards 42, no binding created
    let _ = 100;      // discards 100, no conflict with previous _
    0
}

Underscore-Prefixed Identifiers

An identifier that begins with an underscore followed by one or more characters (e.g., _unused, _x) is a normal identifier that creates a binding. Such identifiers suppress unused variable warnings but can otherwise be used like any other identifier.

fn main() -> i32 {
    let x = 1;
    let my_variable = 2;
    let _unused = 3;      // suppresses unused warning, but is a normal variable
    let x1 = 4;
    x + my_variable + _unused + x1
}
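
The distinction between the wildcard _ and underscore-prefixed identifiers can be summarised as a three-way classification of binding names. The Rust sketch below is illustrative only: BindingKind, classify_binding, and warns_if_unused are hypothetical names, the input is assumed to be a valid identifier already, and the assumption that plain identifiers are the only ones subject to the unused-variable warning is inferred from the text above.

```rust
/// How a name in a let statement is treated (illustrative names).
#[derive(Debug, PartialEq)]
pub enum BindingKind {
    Wildcard,         // `_`: value discarded, no binding, no storage
    UnusedSuppressed, // `_name`: normal binding, unused warning suppressed
    Normal,           // any other identifier
}

/// Classifies a name that is already known to be a valid identifier.
pub fn classify_binding(name: &str) -> BindingKind {
    if name == "_" {
        BindingKind::Wildcard
    } else if name.starts_with('_') {
        BindingKind::UnusedSuppressed
    } else {
        BindingKind::Normal
    }
}

/// Whether a binding is created, so the name can be referenced later.
pub fn creates_binding(name: &str) -> bool {
    classify_binding(name) != BindingKind::Wildcard
}

/// Whether an unused binding with this name would be reported (assumed behaviour).
pub fn warns_if_unused(name: &str) -> bool {
    classify_binding(name) == BindingKind::Normal
}
```

Under this sketch, _unused creates a binding but is exempt from the warning, whereas _ creates no binding at all, which is why several let _ statements can share a scope without conflict.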