lexers/embedded/robot.xml
<rule pattern="#.*$">
<token type="Comment"/>
</rule>
<rule pattern="\$\{[^\}]+\}|\@{[^\}]+\}|\&\{[^\}]+\}">
I noticed this ampersand and the other one below were causing the lexer not to be loaded. It seems to work as <rule pattern="${[^}]+}|@{[^}]+}|&{[^}]+}">
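A rough illustration of why a bare ampersand can stop the file from loading, assuming the lexer definitions are read with a conforming XML parser. Python's `xml.etree.ElementTree` stands in for whatever parser the loader actually uses, and the two `<rule>` strings are simplified stand-ins for the rule in the diff, not the real file contents.

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins for the rule in the diff; only the ampersand handling matters.
RAW = r'<rule pattern="\&\{[^\}]+\}"/>'         # bare '&': not well-formed XML
ESCAPED = r'<rule pattern="&amp;\{[^\}]+\}"/>'  # '&' written as the &amp; entity


def load_pattern(xml_text):
    """Return the pattern attribute, or None if the XML does not parse."""
    try:
        return ET.fromstring(xml_text).get("pattern")
    except ET.ParseError:
        return None


print(load_pattern(RAW))      # None: the parser rejects the bare '&'
print(load_pattern(ESCAPED))  # &\{[^\}]+\}
```

In XML a backslash does not escape `&`; the entity form `&amp;` is what yields a literal ampersand in the attribute value that reaches the regex engine.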
lexers/embedded/robot.xml
<rule pattern="\$\{[^\}]+\}|\@{[^\}]+\}|\&\{[^\}]+\}">
<token type="NameVariable"/>
</rule>
<rule pattern="(True|False|None|null|on|off)\b">
One thing I noticed here is that this pattern will match portions of words. For example, when I tried it with a file that had the word Documentation in it, it highlighted the on at the end differently from the rest of the word.
Good catch, I wrapped it in \b so that it matches whole words and not just parts of them.
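A small sketch of the difference the leading `\b` makes when the pattern is searched against a whole string. It uses Python's `re` module as a stand-in, so the lexer's own regex engine may differ in details; the helper `first_hit` is a made-up name for illustration.

```python
import re

TRAILING_ONLY = re.compile(r"(True|False|None|null|on|off)\b")
BOTH_SIDES = re.compile(r"\b(True|False|None|null|on|off)\b")


def first_hit(pattern, text):
    """Return the first substring the pattern matches, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(first_hit(TRAILING_ONLY, "Documentation"))  # on
print(first_hit(BOTH_SIDES, "Documentation"))     # None
print(first_hit(BOTH_SIDES, "light on"))          # on
```

With only the trailing boundary, the `on` at the end of `Documentation` qualifies; with boundaries on both sides, only a standalone `on` does.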
<rule pattern="\$\{[^\}]+\}|\@{[^\}]+\}|&\{[^\}]+\}">
<token type="NameVariable"/>
</rule>
<rule pattern="\b(True|False|None|null|on|off)\b">
I think we're getting closer.
The change to use word boundaries has fixed the issue when the keyword appears at the beginning or in the middle of a word, but there still seems to be an issue when the keyword appears at the end of a word. Here I mean 'word' as a 'space separated' word.
For example, the following document still has the 'on' at the end of the word Documentation highlighted:
*** Settings ***
Documentation A test suite
I think the problem is that there is an ambiguity in the lexer grammar between the 'keyword' rule and the '.' rule in the root state that the initial word-boundary doesn't solve. I temporarily added a debug log to print the tokens and for that document I see:
syntax: token: token: Type: Keyword Value: '*** Settings ***' Start: 0 End: 16
syntax: token: token: Type: TextWhitespace Value: '
' Start: 16 End: 17
syntax: token: token: Type: Text Value: 'Documentati' Start: 17 End: 28
syntax: token: token: Type: KeywordConstant Value: 'on' Start: 28 End: 30
syntax: token: token: Type: TextWhitespace Value: ' ' Start: 30 End: 31
syntax: token: token: Type: Text Value: 'A' Start: 31 End: 32
syntax: token: token: Type: TextWhitespace Value: ' ' Start: 32 End: 33
syntax: token: token: Type: Text Value: 'test' Start: 33 End: 37
syntax: token: token: Type: TextWhitespace Value: ' ' Start: 37 End: 38
syntax: token: token: Type: Text Value: 'suite' Start: 38 End: 43
syntax: token: token: Type: EOFType Value: '' Start: 0 End: 0
It looks like the grammar '.' rule allows parsing each letter in 'Documentati' as text (which is then combined internally), but when it sees the 'on' it matches the keyword rule. I think the word boundary is somehow honoured because the text being passed to match the regexp is the remainder of the unmatched text, which is 'on A test suite', and so the first word-boundary is satisfied.
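The hypothesis can be reproduced with a toy lexer. This is only a sketch under assumptions: the three rules and the `tokenize` function are hypothetical and written with Python's `re`, not the lexer's real implementation. The `slice_remainder` flag switches between handing each rule only the remaining text and matching in place against the full line.

```python
import re

# Hypothetical rule set mirroring the behaviour described above.
RULES = [
    ("KeywordConstant", re.compile(r"\b(True|False|None|null|on|off)\b")),
    ("TextWhitespace", re.compile(r"\s+")),
    ("Text", re.compile(r".")),
]


def tokenize(text, slice_remainder):
    """Toy lexer. If slice_remainder is true, each rule only sees text[pos:]."""
    tokens, pos = [], 0
    while pos < len(text):
        for kind, pattern in RULES:
            if slice_remainder:
                m = pattern.match(text[pos:])  # context before pos is lost
            else:
                m = pattern.match(text, pos)   # \b can still see text[pos - 1]
            if m:
                value = m.group(0)
                # Merge consecutive tokens of the same type, as the log suggests.
                if tokens and tokens[-1][0] == kind:
                    tokens[-1] = (kind, tokens[-1][1] + value)
                else:
                    tokens.append((kind, value))
                pos += len(value)
                break
        else:
            raise ValueError(f"no rule matches at offset {pos}")
    return tokens


line = "Documentation A test suite"
sliced = tokenize(line, slice_remainder=True)
in_context = tokenize(line, slice_remainder=False)
print(sliced[:2])      # [('Text', 'Documentati'), ('KeywordConstant', 'on')]
print(in_context[:1])  # [('Text', 'Documentation')]
```

When the rule sees only `on A test suite`, the start of the slice counts as a word boundary, which matches the token stream in the log. When the preceding `i` stays visible, the keyword rule does not fire.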
Perhaps we should only enter the 'keyword' state under more specific conditions? I don't know the syntax of robot files, but I am assuming that the keywords can only appear in test case sections, only in a specific place in a line, and maybe only after some other syntax element that introduces them. If so, maybe we only need to enter the keyword state under those conditions.
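A different option from the state-based idea, sketched here only as a possibility: let the fallback text rule consume a whole run of word characters, so the lexer never stops in the middle of a word where the keyword rule could fire. The rule set and `tokenize` below are hypothetical, use Python's `re`, and deliberately match against the sliced remainder; whether this fits Robot Framework's actual syntax is untested.

```python
import re

# Hypothetical rule set: keywords first, then whole words, then a one-character fallback.
RULES = [
    ("KeywordConstant", re.compile(r"\b(True|False|None|null|on|off)\b")),
    ("TextWhitespace", re.compile(r"\s+")),
    ("Text", re.compile(r"\w+")),  # take the entire word in one token
    ("Text", re.compile(r".")),    # anything else, one character at a time
]


def tokenize(text):
    """Toy lexer that always matches rules against the remaining text."""
    tokens, pos = [], 0
    while pos < len(text):
        for kind, pattern in RULES:
            m = pattern.match(text[pos:])
            if m:
                value = m.group(0)
                # Merge consecutive tokens of the same type.
                if tokens and tokens[-1][0] == kind:
                    tokens[-1] = (kind, tokens[-1][1] + value)
                else:
                    tokens.append((kind, value))
                pos += len(value)
                break
        else:
            raise ValueError(f"no rule matches at offset {pos}")
    return tokens


print(tokenize("Documentation on"))
# [('Text', 'Documentation'), ('TextWhitespace', ' '), ('KeywordConstant', 'on')]
```

Because `Documentation` is consumed as one token, the lexer is never positioned at the trailing `on`, while a standalone `on` is still recognised.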
If you want I can give you the source-code diff to print those logs from Anvil, or give you a custom compiled binary that prints the logs so you can troubleshoot it.
This PR adds support for the Robot Framework test suite https://robotframework.org/