81 changes: 81 additions & 0 deletions lexers/embedded/robot.xml
@@ -0,0 +1,81 @@
<lexer>
  <config>
    <name>Robot Framework</name>
    <alias>robot</alias>
    <filename>*.robot</filename>
    <mime_type>text/x-robotframework</mime_type>
  </config>
  <rules>
    <state name="root">
      <rule>
        <include state="whitespace"/>
      </rule>
      <rule pattern="^\*{3}\s*Settings\s*\*{3}">
        <token type="Keyword"/>
      </rule>
      <rule pattern="^\*{3}\s*Test Cases\s*\*{3}">
        <token type="Keyword"/>
      </rule>
      <rule pattern="^\*{3}\s*Variables\s*\*{3}">
        <token type="Keyword"/>
      </rule>
      <rule pattern="^\*{3}\s*Keywords\s*\*{3}">
        <token type="Keyword"/>
      </rule>
      <rule pattern="\[Documentation\]|\[Tags\]|\[Setup\]|\[Teardown\]|\[Template\]|\[Timeout\]">
        <token type="Name"/>
      </rule>
      <rule pattern="#.*$">
        <token type="Comment"/>
      </rule>
      <rule pattern="\$\{[^\}]+\}|\@{[^\}]+\}|&amp;\{[^\}]+\}">
        <token type="NameVariable"/>
      </rule>
      <rule pattern="^\s+\[.*\]">
        <token type="LiteralString"/>
      </rule>
      <rule pattern="\|\s*">
        <token type="Punctuation"/>
      </rule>
      <rule>
        <include state="keyword"/>
      </rule>
      <rule pattern=".">
        <token type="Text"/>
      </rule>
    </state>

    <state name="keyword">
      <rule pattern="^\s+\w+(\s+)+">
        <token type="NameFunction"/>
      </rule>
      <rule pattern="\$\{[^\}]+\}|\@{[^\}]+\}|&amp;\{[^\}]+\}">
        <token type="NameVariable"/>
      </rule>
      <rule pattern="\b(True|False|None|null|on|off)\b">
Owner

I think we're getting closer.

The switch to word boundaries has fixed the case where the keyword appears at the beginning or in the middle of a word, but there still seems to be a problem when the keyword appears at the end of a word. By 'word' I mean a space-separated word.

For example, the following document still has the 'on' at the end of the word Documentation highlighted:

*** Settings ***
Documentation A test suite

I think the problem is an ambiguity in the lexer grammar between the 'keyword' rule and the '.' rule in the root state, which the leading word boundary doesn't resolve. I temporarily added a debug log to print the tokens, and for that document I see:

syntax: token: token: Type: Keyword Value: '*** Settings ***' Start: 0 End: 16
syntax: token: token: Type: TextWhitespace Value: '
' Start: 16 End: 17
syntax: token: token: Type: Text Value: 'Documentati' Start: 17 End: 28
syntax: token: token: Type: KeywordConstant Value: 'on' Start: 28 End: 30
syntax: token: token: Type: TextWhitespace Value: ' ' Start: 30 End: 31
syntax: token: token: Type: Text Value: 'A' Start: 31 End: 32
syntax: token: token: Type: TextWhitespace Value: ' ' Start: 32 End: 33
syntax: token: token: Type: Text Value: 'test' Start: 33 End: 37
syntax: token: token: Type: TextWhitespace Value: ' ' Start: 37 End: 38
syntax: token: token: Type: Text Value: 'suite' Start: 38 End: 43
syntax: token: token: Type: EOFType Value: '' Start: 0 End: 0

It looks like the '.' rule in the grammar consumes each letter of 'Documentati' as text (the single-character tokens are then combined internally), but when the lexer reaches the 'on' it matches the keyword rule. I think the word boundary is still satisfied because the text passed to the regexp is only the remainder of the unmatched text, which is 'on A test suite', so the 'on' sits at the start of the input and the first word boundary matches there.
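To illustrate that hypothesis, here is a minimal sketch using Python's `re` module. It is not the regexp engine the lexer uses, and whether the lexer really slices the text this way is an assumption taken from the paragraph above. It only shows that the same pattern behaves differently when it is given the sliced remainder than when it is matched in place with the preceding text still visible:

```python
import re

# Same shape as the rule under review (pattern copied from the diff).
CONSTANT = re.compile(r"\b(True|False|None|null|on|off)\b")

line = "Documentation A test suite"
pos = len("Documentati")  # offset where the '.' rule has left the cursor

# Case 1: match against the sliced remainder "on A test suite".
# The engine cannot see the preceding "i", so the leading \b is satisfied
# by the start of the string.
sliced = CONSTANT.match(line[pos:])

# Case 2: match in place, starting at `pos` in the full line.
# Python's `pos` argument keeps the earlier text visible to \b,
# so the leading \b fails between "i" and "o".
in_place = CONSTANT.match(line, pos)

print("sliced remainder:", sliced.group(0) if sliced else None)
print("matched in place:", in_place.group(0) if in_place else None)
```

If the lexer does match against the remainder only, a leading `\b` cannot rule out a mid-word match by itself, which would be consistent with the token log above.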

Perhaps we should only enter the 'keyword' state under more specific conditions? I don't know the syntax of robot files, but I am assuming that the keywords can only appear in test case sections, only at a specific place in a line, and maybe only after some other syntax element that introduces them. If so, we may only need to enter the keyword state under those conditions.
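A different option from restricting when the state is entered is to add a whole-word fallback rule, so the lexer never resumes matching in the middle of a word. The sketch below is a toy model in Python, not the real lexer: `tokenize`, `merge_adjacent`, and the rule lists are hypothetical names, and matching each rule against the remaining text is the assumed behaviour described above.

```python
import re

def merge_adjacent(tokens):
    """Combine consecutive tokens of the same type, as the log suggests happens."""
    merged = []
    for token_type, value in tokens:
        if merged and merged[-1][0] == token_type:
            merged[-1] = (token_type, merged[-1][1] + value)
        else:
            merged.append((token_type, value))
    return merged

def tokenize(text, rules):
    """Toy lexer: try each rule in order against the remaining text only."""
    tokens, pos = [], 0
    while pos < len(text):
        for token_type, pattern in rules:
            m = re.match(pattern, text[pos:])  # assumption: remainder is sliced
            if m and m.end() > 0:
                tokens.append((token_type, m.group(0)))
                pos += m.end()
                break
        else:
            raise ValueError(f"no rule matches at offset {pos}")
    return merge_adjacent(tokens)

WHITESPACE = ("TextWhitespace", r"\s+")
CONSTANT = ("KeywordConstant", r"\b(True|False|None|null|on|off)\b")

# Rule order modelled on the diff: constants first, then the '.' fallback.
current = [WHITESPACE, CONSTANT, ("Text", r".")]
# Hypothetical variant: consume a whole word before falling back to '.'.
with_word_rule = [WHITESPACE, CONSTANT, ("Text", r"\w+"), ("Text", r".")]

line = "Documentation A test suite"
print(tokenize(line, current))
print(tokenize(line, with_word_rule))
```

In this model the first rule list reproduces the 'Documentati' / 'on' split, while the second keeps 'Documentation' as a single Text token and still tags a standalone 'on' as a constant. Whether an equivalent `\w+` rule is appropriate in the XML grammar depends on how the real lexer orders and merges rules, which this sketch does not verify.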

If you want, I can give you the source-code diff that prints those logs from Anvil, or a custom-compiled binary that prints them, so you can troubleshoot it.

        <token type="KeywordConstant"/>
      </rule>
      <rule pattern="&#34;(?:\\.|[^&#34;])*&#34;">
        <token type="LiteralStringDouble"/>
      </rule>
      <rule pattern="&#39;(?:\\.|[^&#39;])*&#39;">
        <token type="LiteralStringSingle"/>
      </rule>
      <rule pattern="\d+\.\d+|\d+">
        <token type="LiteralNumber"/>
      </rule>
      <rule pattern="[=\-\+\*/%]">
        <token type="Operator"/>
      </rule>
    </state>

    <state name="whitespace">
      <rule pattern="\s+">
        <token type="TextWhitespace"/>
      </rule>
      <rule pattern="\n+">
        <token type="TextWhitespace"/>
      </rule>
    </state>
  </rules>
</lexer>