APaGeD includes a regular expression compiler that generates linear-time matchers.
Here is a short overview of the supported operators:
| alternation
(...) non-matching brackets
(?...) matching brackets (usually not used in lexemes)
? zero or one repetition (greedy)
* zero or more repetitions (greedy)
+ one or more repetitions (greedy)
?? zero or one repetition (reluctant)
*? zero or more repetitions (reluctant)
+? one or more repetitions (reluctant)
{x,y} counted occurrence (greedy). At least x and at most y occurrences.
{x,y}? counted occurrence (reluctant)
. any character (precisely: 0x09-0x13, 0x20-0x7e, 0xa0-0xff, 0x0100-0x017f, 0x0180-0x024f, 0x20a3-0x20b5)
[...] [^...] character classes
>x negative, single character lookahead (character classes not supported for
The lookahead is a speciality of the APaGeD implementation. It is much faster than general lookahead
but less powerful, although it is powerful enough for many situations.
For example, \*>/ matches any * that is not followed by a /.
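For readers more familiar with other regex dialects, the lookahead example can be compared with a rough Python analogue. This is only a sketch for comparison: Python's re module has no `>x` operator, so the general negative lookahead `(?!x)` stands in for it, and the helper name star_positions is made up for illustration. It does not run APaGeD itself.

```python
import re

# APaGeD pattern:  \*>/     (a '*' that is not followed by '/')
# Python analogue: \*(?!/)  (general negative lookahead, used here as a stand-in)
STAR_NOT_BEFORE_SLASH = re.compile(r"\*(?!/)")

def star_positions(text):
    """Return the index of every '*' that is not immediately followed by '/'."""
    return [m.start() for m in STAR_NOT_BEFORE_SLASH.finditer(text)]

# The '*' in "*/" is skipped; the other three stars are reported.
positions = star_positions("a * b */ c ** d")
```

A typical use of such a pattern is scanning the body of a C-style comment, where a `*` must not be consumed if it starts the closing `*/`.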
The syntax for matching and non-matching brackets has been switched for convenience, since sub-matches are usually not used in lexemes.
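The difference between greedy and reluctant repetition, and the switched bracket syntax, can be illustrated with Python's re module. This is a comparison sketch only, not APaGeD code. Note that Python uses the conventional notation, where plain `(...)` records a sub-match and `(?:...)` does not, which is the opposite of the APaGeD convention described above.

```python
import re

text = "<a><b>"

# Greedy '+' consumes as many characters as possible before the final '>'.
greedy = re.match(r"<.+>", text).group(0)

# Reluctant '+?' consumes as few characters as possible.
reluctant = re.match(r"<.+?>", text).group(0)

# In Python, (?:...) groups without recording a sub-match.
# The APaGeD spelling of this kind of group is plain (...).
grouped_only = re.match(r"(?:ab)+", "abab")

# In Python, plain (...) records a sub-match.
# The APaGeD spelling of this kind of group is (?...).
with_submatch = re.match(r"(ab)+", "abab")
```

Here greedy holds the whole input `<a><b>`, while reluctant stops at the first closing bracket and holds `<a>`. Both grouped patterns match `abab`, but only with_submatch keeps a recorded group.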