Add analyzer and fix to escape non-ASCII literal values#450
Add analyzer and fix to escape non-ASCII literal values#450j3parker wants to merge 1 commit intoBrightspace:mainfrom
Conversation
|
|
||
| var newDoc = doc.WithSyntaxRoot( newRoot ); | ||
|
|
||
| return Task.FromResult( newDoc ); |
There was a problem hiding this comment.
string foo = "αβγ";-->
string foo = /* unencoded: "αβγ" */ "\u03B1\u03B2\u03B3";The comment could get out of sync. Options:
- Don't do this
- Write an analyzer to verify they are in sync (not hard, but probably not worth it.)
- It's fine, it's not likely to get through code review
Thoughts?
There was a problem hiding this comment.
There is no chance that I'll notice an out-of-sync comment in a review. I'd lean towards #1 unless there lots of instances where the specific value is user-facing (seems unlikely since user-facing values should be langTerms, not string constants).
Alternately, is there a visualizer that would work here? E.g. it'd be handy if VS could overlay the "rendered" value of the string.
| description: "The parameter {0} has a default value of {1} here, but {2} in its original definition in {3}. This causes inconsistent behaviour. Please use the same defualt value everywhere." | ||
| ); | ||
|
|
||
| public static readonly DiagnosticDescriptor EscapeNonAsciiCharsInLiteral = new DiagnosticDescriptor( |
There was a problem hiding this comment.
Consider providing a description of (or link to) what can go wrong without this.
| var literalExpr = (LiteralExpressionSyntax)ctx.Node; | ||
|
|
||
| // We can't handle verbatim strings for the same reason as these | ||
| // folks: https://github.com/dotnet/codeformatter/issues/39 (TODO) |
There was a problem hiding this comment.
I found this quite interesting.
| var copyStartIdx = 0; | ||
|
|
||
| // invariant: copyStartIdx < idx | ||
| // Note: the enclosing quotes are included in val; don't bother looking at them. |
There was a problem hiding this comment.
Thanks, variable rename
| // invariant: copyStartIdx < idx | ||
| // Note: the enclosing quotes are included in val; don't bother looking at them. | ||
| for( int idx = 1; idx < token.Length - 1; idx++ ) { | ||
| if( token[idx] < 0x80 ) { |
|
|
||
| // copy all the ascii chars we've seen between the last copy | ||
| // and now (not inclusive) into sb | ||
| sb.Append( token, copyStartIdx, idx - copyStartIdx ); |
There was a problem hiding this comment.
This seems more complicated than copying char-at-a-time, and I'm not sure I see the advantage given that you're using StringBuffer.
There was a problem hiding this comment.
It's "basically a no-op" when the string is ASCII. Microsoft did it with two passes (one to detect nonASCIIness, one to do the copying) which did char-by-char copying in the second step. It's technically more work to do that (two passes, and char by char is still worse than chunks at a time) but its so neglible and only happens during errors anyway so really not interesting.
Honestly I just wrote it this way because it came out of my head this way. I'll look at it again.
| return false; | ||
| } | ||
|
|
||
| private static bool IsSurrogatePair( string str, int idx ) { |
There was a problem hiding this comment.
Why not use char.IsSurrogatePair?
| /* EscapeNonAsciiCharsInLiteral(string,"\u284C\u2801\u2827\u2811 \u283C\u2801\u2812 \u284D\u281C\u2807\u2811\u2839\u2830\u280E \u2863\u2815\u280C") */ "⡌⠁⠧⠑ ⠼⠁⠒ ⡍⠜⠇⠑⠹⠰⠎ ⡣⠕⠌" /**/; | ||
|
|
||
| // This one hits the branch for surrogate pairs | ||
| const string MormonTwinkleTwinkleLittleStar = |
No description provided.