As described earlier, the parse function can be instructed to generate detailed error messages.
They look like this:
Error: src\parser.apd(27): found ";", expected DCodeBlock
input: ;\n{\n class LexemeSet\n {\n Set!(
lookahead: ;
lexeme: ;
AST node stack:
Prolog
LR stack:
---- State 2 (1) ----
Start -> Prolog . Grammar
Prolog -> Prolog . import FQIdent ;
Prolog -> Prolog . APDProperties \{ Props \}
Prolog -> Prolog . APDLexemes \{ Lexs \}
Prolog -> Prolog . APDDeclaration DCodeBlock
---- State 39 (27) ----
Prolog -> Prolog APDDeclaration . DCodeBlock
The first line is the standard error message; the remaining lines contain the details.
input shows the portion of the input that the parser is currently looking at.
lookahead shows the current lookahead match.
lexeme is the lexeme that matched the lookahead.
AST node stack lists the current stack of AST nodes. In the example, only a Prolog non-terminal has been reduced so far.
Finally, the LR stack is pretty-printed using the rules that the LR states correspond to. An LR parser considers multiple rules at once and narrows down the choice as it reads terminal symbols from the input. A dot in a rule marks how far into that rule the parser has read in that state. In the example, you can see that in state 2 the parser has successfully read a Prolog. In the next state (state 39), it has successfully read an APDDeclaration and expects a DCodeBlock, but found a semicolon instead.
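The dotted-rule notation is the standard way of writing an LR item: a rule plus a position inside its right-hand side. The following is a minimal Python sketch, assuming hypothetical names `Item` and `format_state` that are not part of the parser generator, of how such items can be represented and printed in the same layout as the dump above. The symbol right after the dot is what the parser expects next, which is where "expected DCodeBlock" comes from.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical LR item: a rule plus a dot position in its right-hand side."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int  # number of right-hand-side symbols already read

    def next_symbol(self) -> Optional[str]:
        """Symbol the parser expects next, or None if the rule is complete."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, ".")  # place the dot between read and unread symbols
        return f"{self.lhs} -> {' '.join(symbols)}"


def format_state(index: int, line: int, items: Sequence[Item]) -> str:
    """Render one LR stack entry: a header with state index and input line, then its items."""
    header = f"---- State {index} ({line}) ----"
    return "\n".join([header] + [str(item) for item in items])


state39 = [Item("Prolog", ("Prolog", "APDDeclaration", "DCodeBlock"), 2)]
print(format_state(39, 27, state39))
# ---- State 39 (27) ----
# Prolog -> Prolog APDDeclaration . DCodeBlock
print(state39[0].next_symbol())  # DCodeBlock
```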
The input shows a semicolon in front of the curly brace that opens the DCodeBlock, so the cause of the error is obvious.
The number in parentheses after the state index is the input line the parser was on when it was in that state. The line number of the last (current) state is therefore always the same as in the standard error message.
With this detailed information you can track down parser errors very precisely, because you can see the path the parser took into the error situation. You see all the alternatives (the different rules in each state) the parser had on its way there, but nothing more.
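The LR stack stays short if you use left recursion as much as possible: with a left-recursive rule the parser can reduce after each repeated element, whereas a right-recursive rule keeps every element on the stack until the end of the sequence. A short stack makes debugging the grammar (or the input) more effective.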
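To illustrate the difference, here is a minimal Python sketch that hand-simulates the shift and reduce steps an LR(1) parser would take for two hypothetical toy grammars, `List -> List x | x` (left-recursive) and `List -> x List | x` (right-recursive), and records the peak stack depth. It tracks grammar symbols rather than parser states, is not output of the parser generator, and the function names are made up for illustration.

```python
def left_recursive_peak(n: int) -> int:
    """Peak stack depth for n 'x' tokens with the toy grammar List -> List x | x."""
    if n < 1:
        raise ValueError("the grammar needs at least one 'x'")
    stack, peak = [], 0
    for _ in range(n):
        stack.append("x")              # shift the next token
        peak = max(peak, len(stack))
        if len(stack) == 1:
            stack[-1] = "List"         # reduce: List -> x
        else:
            stack[-2:] = ["List"]      # reduce: List -> List x
    assert stack == ["List"]
    return peak


def right_recursive_peak(n: int) -> int:
    """Peak stack depth for n 'x' tokens with the toy grammar List -> x List | x."""
    if n < 1:
        raise ValueError("the grammar needs at least one 'x'")
    stack, peak = [], 0
    for _ in range(n):
        # While another 'x' follows, no reduction is possible, so every
        # token is shifted before the innermost List can be completed.
        stack.append("x")
        peak = max(peak, len(stack))
    stack[-1] = "List"                 # at end of input: List -> x
    while len(stack) > 1:
        stack[-2:] = ["List"]          # reduce: List -> x List
    assert stack == ["List"]
    return peak


print(left_recursive_peak(1000))   # 2
print(right_recursive_peak(1000))  # 1000
```

For 1000 tokens the left-recursive grammar never holds more than two symbols, while the right-recursive one holds all 1000 before the first reduction, and an error dump would print a correspondingly long LR stack.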