Function utf16_to_byte_offset

pub fn utf16_to_byte_offset(s: &str, utf16_offset: u32) -> Option<usize>

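Converts a UTF-16 code unit offset into a byte offset within a string.

LSP expresses character positions in UTF-16 code units by default (for compatibility with JavaScript and other languages whose strings are UTF-16 based). Rust strings are UTF-8 and are indexed by byte, so this function converts a UTF-16 offset into the byte offset needed for Rust string indexing.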
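To see why the two offsets diverge, it can help to compare the length of a string in bytes with its length in UTF-16 code units. The helper below is a minimal sketch using only the standard library; the name `byte_and_utf16_len` is hypothetical and not part of this crate.

```rust
/// Hypothetical helper (not part of this crate): returns
/// (length in UTF-8 bytes, length in UTF-16 code units).
fn byte_and_utf16_len(s: &str) -> (usize, usize) {
    // `str::len` counts UTF-8 bytes; `encode_utf16` yields UTF-16 code units.
    (s.len(), s.encode_utf16().count())
}
```

For ASCII text the two lengths match, while "日本語" is 9 bytes but 3 code units, and "😀" is 4 bytes but 2 code units.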

§Arguments

s - The string to index into
utf16_offset - UTF-16 code unit offset (from LSP Position.character)

§Returns

Some(byte_offset) if the UTF-16 offset is valid, or None if it is out of bounds.
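One way such a conversion could be written is to walk the string's characters while counting UTF-16 code units. The following is a sketch for illustration only, not the crate's actual source: the name `utf16_to_byte_offset_sketch` is hypothetical, and two behaviours are assumptions that the documentation above does not confirm: an offset equal to the string's UTF-16 length maps to `s.len()`, and an offset that lands inside a surrogate pair yields `None`.

```rust
/// Illustrative sketch only; not the crate's implementation.
fn utf16_to_byte_offset_sketch(s: &str, utf16_offset: u32) -> Option<usize> {
    let target = utf16_offset as usize;
    let mut units = 0usize; // UTF-16 code units consumed so far

    for (byte_idx, ch) in s.char_indices() {
        if units == target {
            // The target falls exactly on this character's boundary.
            return Some(byte_idx);
        }
        if units > target {
            // Assumption: an offset inside a surrogate pair is invalid.
            return None;
        }
        // 1 for BMP characters, 2 for characters needing a surrogate pair.
        units += ch.len_utf16();
    }

    // Assumption: an offset equal to the total length points at end of string.
    if units == target {
        Some(s.len())
    } else {
        None
    }
}
```

Under these assumptions, the sketch returns `Some(4)` for `("😀test", 2)` and `None` for `("hello", 6)`.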

§Examples

// ASCII: UTF-16 offset equals byte offset
assert_eq!(utf16_to_byte_offset("hello", 2), Some(2));

// Unicode: "日本語" - each char is 3 bytes but 1 UTF-16 code unit
assert_eq!(utf16_to_byte_offset("日本語", 0), Some(0));
assert_eq!(utf16_to_byte_offset("日本語", 1), Some(3));
assert_eq!(utf16_to_byte_offset("日本語", 2), Some(6));

// Emoji: "😀" is 4 bytes but 2 UTF-16 code units (surrogate pair)
assert_eq!(utf16_to_byte_offset("😀test", 2), Some(4));
