@tomekb234 tomekb234 commented Oct 28, 2025

This library immediately converts its &str input to &[u8] with as_bytes() and, from that point on, does not appear to rely on the input being valid UTF-8: non-UTF-8 input does not cause undefined behavior, panics, or an invalid generated DOM. Because there is no parse() variant that accepts &[u8], a caller who already holds the input as bytes has to awkwardly use the unsafe str::from_utf8_unchecked to avoid the overhead of an unnecessary UTF-8 check.
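To make the trade-off concrete, here is a minimal sketch using only the standard library. `count_open_angle` is a hypothetical stand-in for a &str-only entry point that looks at bytes right away; it is not this library's API, and the helper names are made up for illustration.

```rust
/// Hypothetical stand-in for a `&str`-only parser entry point (not the
/// library's real API). Like the behavior described above, it views the
/// text as raw bytes immediately.
fn count_open_angle(input: &str) -> usize {
    input.as_bytes().iter().filter(|&&b| b == b'<').count()
}

/// Safe route: `str::from_utf8` scans the whole buffer to validate it,
/// and non-UTF-8 input is rejected outright.
fn count_checked(bytes: &[u8]) -> Option<usize> {
    std::str::from_utf8(bytes).ok().map(count_open_angle)
}

/// Workaround route: skips the validation pass, at the cost of `unsafe`.
///
/// # Safety
/// `str::from_utf8_unchecked` documents that `bytes` must be valid UTF-8,
/// so the caller takes on that obligation here.
unsafe fn count_unchecked(bytes: &[u8]) -> usize {
    // SAFETY: forwarded to the caller, see the function-level contract.
    let s = unsafe { std::str::from_utf8_unchecked(bytes) };
    count_open_angle(s)
}
```

With this sketch, `count_checked(b"<p>\xFF</p>")` returns `None` because of the stray `0xFF` byte, while the unchecked variant forces every caller to write `unsafe`. A bytes-based entry point avoids both problems.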

This pull request:

  • changes the type of the input parameter of Parser::new from &str to &[u8],
  • changes the type of the input parameter of VDomGuard::new from String to Vec<u8> (since String is backed by a Vec<u8>, converting the former to the latter is cheap),
  • changes the type of the pointer stored in VDomGuard from *mut str to *mut [u8],
  • adds new public parse_bytes() and parse_bytes_owned() functions,
  • adds a test for non-UTF-8 input.
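The ownership side of these changes can be sketched as follows. `BytesGuard` is a hypothetical, simplified type written for illustration only; it is not the actual VDomGuard. It assumes the guard owns its buffer through a raw `*mut [u8]` and frees it on drop.

```rust
/// Hypothetical, simplified owner of a byte buffer held through a raw
/// `*mut [u8]` (illustration only, not the library's VDomGuard).
struct BytesGuard {
    ptr: *mut [u8],
}

impl BytesGuard {
    /// Takes ownership of the bytes and leaks them into a raw pointer.
    fn new(input: Vec<u8>) -> Self {
        Self { ptr: Box::into_raw(input.into_boxed_slice()) }
    }

    /// Borrows the stored bytes for as long as the guard is alive.
    fn as_bytes(&self) -> &[u8] {
        // SAFETY: `ptr` came from `Box::into_raw` in `new` and is only
        // freed in `Drop`, so it is valid while `self` exists.
        unsafe { &*self.ptr }
    }
}

impl Drop for BytesGuard {
    fn drop(&mut self) {
        // SAFETY: `ptr` was produced by `Box::into_raw` and is reclaimed
        // exactly once, here.
        unsafe { drop(Box::from_raw(self.ptr)) };
    }
}

/// `String::into_bytes` hands over the string's existing buffer, so an
/// owned string converts to the bytes-based input without copying.
fn guard_from_string(s: String) -> BytesGuard {
    BytesGuard::new(s.into_bytes())
}
```

A guard built this way accepts `vec![b'<', 0xFF, b'>']` just as readily as the bytes of a String, which is what a non-UTF-8 test would exercise.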

Fixes #61.

Development

Successfully merging this pull request may close these issues.

Bytes input instead of string input
