Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This pull request adds support for 8-bit ECMA-48 control sequences by introducing a new ControlSequences8Bit option. This complements the existing ControlSequences option (which handles 7-bit sequences) and leverages the recently added 8-bit control sequence support in the uax29 v2.7.0 library.
Changes:
- Added
ControlSequences8Bitfield to theOptionsstruct to enable zero-width treatment of 8-bit C1 control sequences (0x80-0x9F) - Updated width calculation logic to treat graphemes starting with C1 control bytes as zero-width when the option is enabled
- Enhanced truncate functions to preserve both 7-bit and 8-bit trailing escape sequences based on the respective options
- Added comprehensive test coverage for 8-bit sequences, independence of 7-bit/8-bit options, and mixed scenarios
- Created separate
options.gofile to consolidate option definitions (refactoring) - Created separate
truncate.gofile for truncation functions (refactoring)
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| options.go | New file containing the Options struct with the new ControlSequences8Bit field and DefaultOptions |
| width.go | Added C1 control byte detection for zero-width treatment; removed Options definition and truncate functions (moved to separate files) |
| truncate.go | New file containing truncation functions; updated to handle both 7-bit and 8-bit escape sequences with the new isEscapeLeader helper |
| graphemes.go | Updated to configure the grapheme iterator with AnsiEscapeSequences8Bit option |
| width_test.go | Added comprehensive tests for 8-bit sequences, independence tests, and regression test for truncation edge case |
| fuzz_test.go | Added 8-bit control sequences to fuzz test seeds and option combinations |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Confusing at best, incorrect / dangerous at worst
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
uax29added an option for 8-bit control sequences. This PR adds support for that option in width calculations.It turns out that
Truncatewith 8-bit escapes can lead to confusing outcomes, by concatenating bytes that, when combined, form a new UTF-8 codepoint. That’s bad. This was discovered by fuzzing. I’ve decided to haveTruncateonly respect the 7-bit option (which was the status quo). Better for the package to have a limitation than confusion.