pub fn utf16_to_byte_offset(s: &str, utf16_offset: u32) -> Option<usize>
Converts a UTF-16 offset to a byte offset in a string.
LSP uses UTF-16 code units for character positions (for compatibility with JavaScript and other languages). Rust strings are UTF-8 encoded, so this function converts such a UTF-16 offset into a byte offset that can be used to index a Rust string.
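One way to perform this conversion is to walk the string's chars and sum each char's UTF-16 length (char::len_utf16) until the requested offset is reached. The sketch below is an assumption, not necessarily this crate's actual implementation. Two of its behaviors are illustrative choices: an offset equal to the string's total UTF-16 length maps to Some(s.len()), and an offset that lands between the two halves of a surrogate pair returns None.

```rust
/// Sketch (assumed implementation): map a UTF-16 code unit offset to a
/// UTF-8 byte offset by walking the string char by char.
pub fn utf16_to_byte_offset(s: &str, utf16_offset: u32) -> Option<usize> {
    let target = utf16_offset as usize;
    // Number of UTF-16 code units consumed so far.
    let mut utf16_pos = 0usize;
    for (byte_idx, ch) in s.char_indices() {
        if utf16_pos == target {
            // The target offset is exactly at the start of this char.
            return Some(byte_idx);
        }
        // 1 code unit for BMP chars, 2 for chars encoded as a surrogate pair.
        utf16_pos += ch.len_utf16();
        if utf16_pos > target {
            // Assumed choice: an offset inside a surrogate pair is invalid.
            return None;
        }
    }
    // Assumed choice: an offset equal to the total length maps to the end.
    if utf16_pos == target {
        Some(s.len())
    } else {
        None
    }
}
```

Walking char by char costs O(n) in the length of the string, which is usually acceptable because LSP character offsets are relative to a single line.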
Arguments
s - The string to index into
utf16_offset - UTF-16 code unit offset (from LSP Position.character)
Returns
The byte offset if valid, or None if the UTF-16 offset is out of bounds.
Examples
// ASCII: UTF-16 offset equals byte offset
assert_eq!(utf16_to_byte_offset("hello", 2), Some(2));
// Non-ASCII: each char of "日本語" is 3 bytes in UTF-8 but 1 UTF-16 code unit
assert_eq!(utf16_to_byte_offset("日本語", 0), Some(0));
assert_eq!(utf16_to_byte_offset("日本語", 1), Some(3));
assert_eq!(utf16_to_byte_offset("日本語", 2), Some(6));
// Emoji: "😀" is 4 bytes in UTF-8 but 2 UTF-16 code units (a surrogate pair)
assert_eq!(utf16_to_byte_offset("😀test", 2), Some(4));
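Because Position.character counts UTF-16 code units within a single line, a caller typically finds the start of the requested line first and then applies the conversion to that line only. The helper position_to_byte_offset below is hypothetical and assumes lines are separated by '\n'. So that the block compiles on its own, it also repeats a sketch of utf16_to_byte_offset, which may differ from the real implementation.

```rust
/// Sketch of the conversion (assumed; see the function documented above).
pub fn utf16_to_byte_offset(s: &str, utf16_offset: u32) -> Option<usize> {
    let target = utf16_offset as usize;
    let mut utf16_pos = 0usize;
    for (byte_idx, ch) in s.char_indices() {
        if utf16_pos == target {
            return Some(byte_idx);
        }
        utf16_pos += ch.len_utf16();
        if utf16_pos > target {
            return None; // offset falls inside a surrogate pair
        }
    }
    if utf16_pos == target { Some(s.len()) } else { None }
}

/// Hypothetical helper: convert an LSP-style (line, character) position into
/// a byte offset into the whole text. Assumes '\n' line separators.
pub fn position_to_byte_offset(text: &str, line: u32, character: u32) -> Option<usize> {
    // Byte offset where the requested line starts.
    let mut line_start = 0usize;
    for _ in 0..line {
        // If there is no further '\n', the requested line does not exist.
        let nl = text[line_start..].find('\n')?;
        line_start += nl + 1;
    }
    // The line ends at the next '\n', or at the end of the text.
    let line_end = text[line_start..]
        .find('\n')
        .map_or(text.len(), |i| line_start + i);
    let line_text = &text[line_start..line_end];
    // Convert within the line, then shift by the line's starting byte offset.
    utf16_to_byte_offset(line_text, character).map(|b| line_start + b)
}
```

The result is a byte offset into the whole text, so it can be used directly for slicing: for text = "ab\n日本\n😀x", position (1, 1) maps to byte 6, and &text[6..9] is "本".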