ensure upper() doesn't increase string length #9
Is this good to review?
Looks good, nice catch! Can you add a link to this PR as a comment in the function? And can similar stuff happen for
Hm, this isn't really enough to match stdlib's re behavior: I am currently trying to figure out if Python exposes case folding information somewhere that I could use.
I can make that change. And yes there is a single character for which this is true for
Actually, don't worry about it, I will implement this in some other way. Thank you for making me aware of this issue, and thank you for providing a fix! I will properly implement the official Unicode
Thanks so much for ensuring there is a proper fix @MegaIng!!
Fixes dottxt-ai/outlines#773
Problem
In master, interegular uses char.upper(), which can convert one character into two, so accepts() and strings() can disagree about which strings match.
accepts() and strings() inconsistency:
This change ensures that if an uppercased character isn't of length 1, the original character is used.
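As a rough illustration of the rule described above (not the actual diff in this PR), the fallback could look something like the following. The helper name upper_if_single is hypothetical and does not come from interegular.

```python
def upper_if_single(char: str) -> str:
    """Hypothetical helper: uppercase char only if the result is still one character."""
    upper = char.upper()
    # Some characters expand when uppercased, e.g. "ß" becomes "SS".
    # In that case keep the original character instead.
    return upper if len(upper) == 1 else char


expanded = "ß".upper()          # two characters: "SS"
kept = upper_if_single("ß")     # falls back to "ß"
normal = upper_if_single("a")   # ordinary case: "A"
print(expanded, kept, normal)
```

Keeping the original character means every case variant stays exactly one character long, which is what a per-character matcher assumes.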
This is consistent with the behavior of re:
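The original snippet is not preserved here, so the following is only a sketch of one way to check the claim with the standard library. It assumes "ß" as the example character, which may not be the one the PR used.

```python
import re

# A one-character pattern cannot match the two-character uppercase expansion,
# even with IGNORECASE set.
expanded_match = re.fullmatch("ß", "SS", re.IGNORECASE)

# The character still matches itself.
same_match = re.fullmatch("ß", "ß", re.IGNORECASE)

print(expanded_match)          # None
print(same_match is not None)  # True
```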