Lexer
When provided with a LanguageDef, this class will produce a large variety of parsers that can be used for
tokenisation of a language. This includes parsing numbers and strings in their various formats and ensuring that
all operations consume whitespace after them (so-called lexeme parsers). These are very useful in parsing
programming languages. This class also has a large number of hand-optimised intrinsic parsers to improve performance!
- Value Params
- lang
The rules that govern the language we are tokenising
- Since
2.2.0
Value members
Concrete methods
Lexeme parser angles(p) parses p enclosed in angle brackets ('<', '>'), returning the value of p.
Lexeme parser braces(p) parses p enclosed in braces ('{', '}'), returning the value of p.
Lexeme parser brackets(p) parses p enclosed in brackets ('[', ']'), returning the value of p.
Lexeme parser commaSep(p) parses zero or more occurrences of p separated by comma.
Returns a list of values returned by p.
Lexeme parser commaSep1(p) parses one or more occurrences of p separated by comma.
Returns a list of values returned by p.
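The difference between commaSep and commaSep1 can be illustrated with a small self-contained sketch. It does not use the library: the Parser type, skipSpaces and digits below are hypothetical stand-ins written for this example, and unlike a real combinator the sketch silently backtracks over a dangling comma.

```scala
object CommaSepSketch {
  // A parser takes the input and an offset; on success it returns a value and the new offset.
  type Parser[A] = (String, Int) => Option[(A, Int)]

  // Advances past any whitespace characters starting at offset i.
  def skipSpaces(in: String, i: Int): Int = {
    var j = i
    while (j < in.length && in.charAt(j).isWhitespace) j += 1
    j
  }

  // Element parser for the example: a run of decimal digits, then trailing white space.
  val digits: Parser[Int] = (in, i) => {
    var j = i
    while (j < in.length && in.charAt(j).isDigit) j += 1
    if (j == i) None else Some((in.substring(i, j).toInt, skipSpaces(in, j)))
  }

  // Lexeme parser for ',': returns "," and skips trailing white space.
  val comma: Parser[String] = (in, i) =>
    if (i < in.length && in.charAt(i) == ',') Some((",", skipSpaces(in, i + 1))) else None

  // One or more occurrences of p separated by comma.
  def commaSep1[A](p: Parser[A]): Parser[List[A]] = (in, i) =>
    p(in, i).map { case (first, afterFirst) =>
      var acc = List(first)
      var pos = afterFirst
      var more = true
      while (more) {
        comma(in, pos).flatMap { case (_, j) => p(in, j) } match {
          case Some((a, k)) => acc = a :: acc; pos = k
          case None         => more = false
        }
      }
      (acc.reverse, pos)
    }

  // Zero or more occurrences: fall back to the empty list without consuming input.
  def commaSep[A](p: Parser[A]): Parser[List[A]] = (in, i) =>
    commaSep1(p)(in, i).orElse(Some((List.empty[A], i)))
}
```

On empty input commaSep1(digits) fails, while commaSep(digits) succeeds with an empty list.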
The lexeme parser keyword(name) parses the symbol name, but it also checks that the name
is not a prefix of a valid identifier. A keyword is treated as a single token using attempt.
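The prefix check matters for inputs such as "iffy", which should lex as an identifier rather than as the keyword "if" followed by "fy". A minimal sketch of the idea, independent of the library, follows. The isIdentChar rule is an assumption made for this example (in the library the legal identifier characters come from the LanguageDef).

```scala
object KeywordSketch {
  // Assumption for this sketch: identifier characters are letters, digits and '_'.
  def isIdentChar(c: Char): Boolean = c.isLetterOrDigit || c == '_'

  // Returns the offset after the keyword and any trailing white space, or None when
  // `name` is absent or is immediately followed by an identifier character.
  def keyword(name: String)(in: String, i: Int): Option[Int] = {
    val end = i + name.length
    if (!in.startsWith(name, i)) None
    else if (end < in.length && isIdentChar(in.charAt(end))) None
    else {
      var j = end
      while (j < in.length && in.charAt(j).isWhitespace) j += 1
      Some(j)
    }
  }
}
```

Returning None leaves the caller's offset unchanged, which loosely mirrors what treating the keyword as a single token with attempt achieves: a failed keyword consumes no input.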
lexeme(p) first applies parser p and then the whiteSpace parser, returning the value of p.
Every lexical token (lexeme) is defined using lexeme, so every parse starts at a point without
white space. The only point where the whiteSpace parser should be called explicitly is the start
of the main parser, in order to skip any leading white space.
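The relationship between a plain parser and its lexeme version can be sketched in a few lines. This is an illustration only, not the library's implementation: the Parser type and the string, whiteSpace, lexeme and symbol definitions below are simplified stand-ins that share names with the documented members.

```scala
object LexemeSketch {
  // A parser takes the input and an offset; on success it returns a value and the new offset.
  type Parser[A] = (String, Int) => Option[(A, Int)]

  // Skips zero or more whitespace characters; always succeeds.
  val whiteSpace: Parser[Unit] = (in, i) => {
    var j = i
    while (j < in.length && in.charAt(j).isWhitespace) j += 1
    Some(((), j))
  }

  // Matches exactly the string s at the current offset and nothing more.
  def string(s: String): Parser[String] = (in, i) =>
    if (in.startsWith(s, i)) Some((s, i + s.length)) else None

  // Runs p, then whiteSpace, and keeps the value produced by p.
  def lexeme[A](p: Parser[A]): Parser[A] = (in, i) =>
    p(in, i).flatMap { case (a, j) => whiteSpace(in, j).map { case (_, k) => (a, k) } }

  // symbol(s) is string(s) turned into a lexeme.
  def symbol(s: String): Parser[String] = lexeme(string(s))
}
```

Applied to "let   x" at offset 0, string("let") stops at offset 3, whereas symbol("let") stops at offset 6, directly in front of the next token.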
The lexeme parser maxOp(name) parses the symbol name, but also checks that the name
is not part of a larger reserved operator. An operator is treated as a single token using attempt.
The non-lexeme parser maxOp_(name) parses the symbol name, but also checks that the name
is not part of a larger reserved operator. An operator is treated as a single token using attempt.
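One way to read "not part of a larger reserved operator" is as a maximal-munch check: "<" should not match where "<=" would. The sketch below shows that reading under stated assumptions. It is independent of the library, passes the reserved operators explicitly as a Set (in the library they come from the LanguageDef), and may differ from the real check in edge cases.

```scala
object MaxOpSketch {
  // Non-lexeme variant: match `name` unless a longer reserved operator that
  // starts with `name` also matches at the same offset.
  def maxOp_(reserved: Set[String])(name: String)(in: String, i: Int): Option[Int] = {
    val longerMatches = reserved.exists(op =>
      op.length > name.length && op.startsWith(name) && in.startsWith(op, i))
    if (in.startsWith(name, i) && !longerMatches) Some(i + name.length) else None
  }

  // Lexeme variant: the same check, followed by skipping trailing white space.
  def maxOp(reserved: Set[String])(name: String)(in: String, i: Int): Option[Int] =
    maxOp_(reserved)(name)(in, i).map { end =>
      var j = end
      while (j < in.length && in.charAt(j).isWhitespace) j += 1
      j
    }
}
```

With the reserved operators "<", "<=" and "<-", asking for "<" fails on "<= 1" and succeeds on "< 1".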
The lexeme parser operator(name) parses the symbol name, but also checks that the name
is not the prefix of a valid operator. An operator is treated as a single token using attempt.
The non-lexeme parser operator_(name) parses the symbol name, but also checks that the name
is not the prefix of a valid operator. An operator is treated as a single token using attempt.
Lexeme parser parens(p) parses p enclosed in parentheses, returning the value of p.
Lexeme parser semiSep(p) parses zero or more occurrences of p separated by semi.
Returns a list of values returned by p.
Lexeme parser semiSep1(p) parses one or more occurrences of p separated by semi.
Returns a list of values returned by p.
Lexeme parser symbol(s) parses string(s) and skips trailing white space.
Concrete fields
This lexeme parser parses a single literal character. Returns the literal character value. This parser deals correctly with escape sequences. The literal character is parsed according to the grammar rules defined in the Haskell report (which matches most programming languages quite closely).
Lexeme parser colon parses the character ':' and skips any trailing white space. Returns ":"
Lexeme parser comma parses the character ',' and skips any trailing white space. Returns ","
Parses a positive whole number in the decimal system. Returns the value of the number.
Lexeme parser dot parses the character '.' and skips any trailing white space. Returns "."
This lexeme parser parses a floating point value. Returns the value of the number. The number is parsed according to the grammar rules defined in the Haskell report. Accepts an optional '+' or '-' sign.
Parses a positive whole number in the hexadecimal system. The number should be prefixed with "0x" or "0X". Returns the value of the number.
This lexeme parser parses a legal identifier. Returns the identifier string. This parser will
fail on identifiers that are reserved words (i.e. keywords). Legal identifier characters and
keywords are defined in the LanguageDef provided to the lexer. An identifier is treated
as a single token using attempt.
This lexeme parser parses an integer (a whole number). This parser is like natural except
that it can be prefixed with a sign (i.e. '-' or '+'). Returns the value of the number. The
number can be specified in decimal, hexadecimal or octal. The number is parsed
according to the grammar rules in the Haskell report.
This lexeme parser parses a natural number (a positive whole number). Returns the value of
the number. The number can be specified in decimal, hexadecimal or octal. The number is
parsed according to the grammar rules in the Haskell report.
This lexeme parser parses either natural or unsigned float. Returns the value of the number. This
parser deals with any overlap in the grammar rules for naturals and floats. The number is
parsed according to the grammar rules defined in the Haskell report.
This lexeme parser parses either integer or float. Returns the value of the number. This
parser deals with any overlap in the grammar rules for naturals and floats. The number is
parsed according to the grammar rules defined in the Haskell report.
Parses a positive whole number in the octal system. The number should be prefixed with "0o" or "0O". Returns the value of the number.
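The prefix rules described above for the decimal, hexadecimal and octal parsers can be combined into a single dispatch on the first two characters. The following is a rough, library-independent sketch of that dispatch. The names natural and digitsIn are chosen for this example, and the sketch is a non-lexeme version (it does not skip trailing white space).

```scala
object NaturalSketch {
  // Reads the longest run of digits valid in `radix`, starting at offset i.
  private def digitsIn(radix: Int, in: String, i: Int): Option[(BigInt, Int)] = {
    var j = i
    while (j < in.length && Character.digit(in.charAt(j), radix) >= 0) j += 1
    if (j == i) None else Some((BigInt(in.substring(i, j), radix), j))
  }

  // "0x"/"0X" selects hexadecimal, "0o"/"0O" selects octal, anything else is decimal.
  def natural(in: String, i: Int): Option[(BigInt, Int)] = {
    val prefix = in.slice(i, i + 2).toLowerCase
    if (prefix == "0x") digitsIn(16, in, i + 2)
    else if (prefix == "0o") digitsIn(8, in, i + 2)
    else digitsIn(10, in, i)
  }
}
```

For example, "0x1F" yields 31 and "0o17" yields 15, each consuming four characters.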
This non-lexeme parser parses a string in a raw fashion. The escape characters in the string remain untouched. While escaped quotes do not end the string, they remain as \" in the result instead of becoming a quote character. Does not support string gaps.
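The raw behaviour can be pictured with a short sketch: scan from the opening quote to the first unescaped closing quote and return everything in between unchanged. This is an illustration written for this page rather than the library's code, and rawString is a name chosen for the example.

```scala
object RawStringSketch {
  // Returns the text between the quotes exactly as written, plus the offset after the closing quote.
  def rawString(in: String, i: Int): Option[(String, Int)] =
    if (i >= in.length || in.charAt(i) != '"') None
    else {
      var j = i + 1
      // A backslash protects the character after it, so \" does not close the string.
      while (j < in.length && in.charAt(j) != '"') {
        j += (if (in.charAt(j) == '\\' && j + 1 < in.length) 2 else 1)
      }
      // No closing quote before the end of input means the literal is unterminated.
      if (j < in.length) Some((in.substring(i + 1, j), j + 1)) else None
    }
}
```

The backslash and the quote after it both survive in the result, so the returned text for the input "a\"b" is the four characters a, \, ", b.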
This lexeme parser parses a reserved operator. Returns the name of the operator. Legal
operator characters and reserved operators are defined in the LanguageDef provided
to the lexer. A reservedOp is treated as a single token using attempt.
This non-lexeme parser parses a reserved operator. Returns the name of the operator.
Legal operator characters and reserved operators are defined in the LanguageDef
provided to the lexer. A reservedOp_ is treated as a single token using attempt.
Lexeme parser semi parses the character ';' and skips any trailing white space. Returns ";"
Parses any comments and skips them; this includes both line comments and block comments.
This lexeme parser parses a literal string. Returns the literal string value. This parser deals correctly with escape sequences and gaps. The literal string is parsed according to the grammar rules defined in the Haskell report (which matches most programming languages quite closely).
This non-lexeme parser parses a literal string. Returns the literal string value. This parser deals correctly with escape sequences and gaps. The literal string is parsed according to the grammar rules defined in the Haskell report (which matches most programming languages quite closely).
This lexeme parser parses a floating point value. Returns the value of the number. The number is parsed according to the grammar rules defined in the Haskell report.
This lexeme parser parses a legal operator. Returns the name of the operator. This parser
will fail on any operators that are reserved operators. Legal operator characters and
reserved operators are defined in the LanguageDef provided to the lexer. A userOp
is treated as a single token using attempt.
Parses any white space. White space consists of zero or more occurrences of a space (as
provided by the LanguageDef), a line comment or a block (multi-line) comment. Block
comments may be nested. How comments are started and ended is defined in the LanguageDef
that is provided to the lexer.
Parses any white space. White space consists of zero or more occurrences of a space (as
provided by the parameter), a line comment or a block (multi-line) comment. Block
comments may be nested. How comments are started and ended is defined in the LanguageDef
that is provided to the lexer.
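Skipping white space with nested block comments amounts to tracking a nesting depth. A minimal sketch of that idea follows, independent of the library. The delimiters "//", "/*" and "*/" are assumptions made for this example, since in the library they are taken from the LanguageDef.

```scala
object WhiteSpaceSketch {
  // Assumed comment delimiters for this sketch only.
  val lineStart = "//"
  val blockStart = "/*"
  val blockEnd = "*/"

  // Skips one block comment starting at i, honouring nesting.
  // Returns the offset after the comment, or None if there is no comment or it is unterminated.
  def blockComment(in: String, i: Int): Option[Int] =
    if (!in.startsWith(blockStart, i)) None
    else {
      var depth = 1
      var j = i + blockStart.length
      while (depth > 0 && j < in.length) {
        if (in.startsWith(blockStart, j)) { depth += 1; j += blockStart.length }
        else if (in.startsWith(blockEnd, j)) { depth -= 1; j += blockEnd.length }
        else j += 1
      }
      if (depth == 0) Some(j) else None
    }

  // Skips spaces, line comments and block comments until none of them makes progress.
  def whiteSpace(in: String, i: Int): Int = {
    var j = i
    var progressed = true
    while (progressed) {
      val start = j
      while (j < in.length && in.charAt(j).isWhitespace) j += 1
      if (in.startsWith(lineStart, j)) {
        while (j < in.length && in.charAt(j) != '\n') j += 1
      } else {
        blockComment(in, j).foreach(end => j = end)
      }
      progressed = j > start
    }
    j
  }
}
```

In this sketch an unterminated block comment is simply not skipped, so whiteSpace stops in front of it. How the library reports that case is not covered here.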