Lexer

parsley.token.Lexer

class Lexer(desc: LexicalDesc, errConfig: ErrorConfig)

This class provides a large selection of functionality concerned with lexing.

This class provides lexing functionality to parsley; however, everything in this class could be implemented purely using parsley's pre-existing functionality. These are regular parsers, but constructed in such a way that they create a clear and logical separation from the rest of the parser.

The class is broken up into several internal "modules" that group together similar kinds of functionality. Importantly, the lexeme and nonlexeme objects separate the underlying token implementations based on whether or not they consume whitespace. Functionality is broadly duplicated across both of these modules: lexeme should be used by a wider parser, to ensure whitespace is handled uniformly; nonlexeme should be used to define further composite tokens, or in special circumstances where whitespace should not be consumed.

It is possible that some of the implementations of parsers found within this class may have been hand-optimised for performance: care will have been taken to ensure these implementations precisely match the semantics of the originals.

Attributes

Source: Lexer.scala
Supertypes: class Object, trait Matchable, class Any

Members list

Type members

Classlikes

This object is concerned with lexemes: these are tokens that are treated as "words", such that whitespace will be consumed after each has been parsed.

Ideally, a wider parser should not be concerned with handling whitespace, as it is responsible for dealing with a stream of tokens. With parser combinators, however, there is usually no distinction between the parsing phase and the lexing phase. That said, it is good practice to establish a logical separation between the two worlds. As such, this object contains parsers that parse tokens, and these are whitespace-aware: whitespace will be consumed after any of these parsers has succeeded. It is not, however, required that whitespace be present.
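As a minimal sketch of this behaviour, the example below builds two lexeme tokens and combines them. It assumes a parsley 4.x layout in which numeric parsers are reached via `lexeme.numeric.natural.decimal` (later versions may expose them under a different path) and uses `LexicalDesc.plain` as the description; the names `LexemeExample`, `plus`, `nat` and `sum` are illustrative only.

```scala
import parsley.Parsley
import parsley.token.Lexer
import parsley.token.descriptions.LexicalDesc

object LexemeExample {
  // Assumption: the plain description, with its default notion of whitespace
  val lexer = new Lexer(LexicalDesc.plain)

  // Each lexeme parser consumes any whitespace that FOLLOWS the token
  val plus: Parsley[Unit] = lexer.lexeme.symbol("+")
  val nat: Parsley[BigInt] = lexer.lexeme.numeric.natural.decimal

  // The wider parser never mentions whitespace explicitly
  val sum: Parsley[BigInt] = (nat <* plus <~> nat).map { case (x, y) => x + y }
}
```

Under these assumptions, `LexemeExample.sum.parse("1 + 2")` and `LexemeExample.sum.parse("1+2")` should both succeed, since trailing whitespace is consumed when present but is not required. Leading whitespace, as in `" 1+2"`, is not handled by these parsers.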

Attributes

Since: 4.0.0
Source: Lexer.scala
Supertypes: class Object, trait Matchable, class Any
Self type: lexeme.type

This object is concerned with non-lexemes: these are tokens that do not give any special treatment to whitespace.

Whilst the functionality in lexeme is strongly recommended for wider use in a parser, the functionality here may be useful for more specialised use-cases. In particular, these may form the building blocks for more complex tokens (where whitespace is not allowed between them, say), in which case these compound tokens can be turned into lexemes manually. For example, the lexer does not have configuration for trailing specifiers on numeric literals (like 1024L in Scala): the desired numeric literal parser could be extended with this functionality before whitespace is consumed by using the variant found in this object.
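The trailing-specifier case can be sketched as follows. This is an illustration under assumptions, not the library's own recipe: it assumes the parsley 4.x path `nonlexeme.numeric.natural.decimal`, uses `lexeme.apply` to turn the compound token back into a lexeme, and the names `LongLiteralExample` and `longLiteral` are hypothetical.

```scala
import parsley.Parsley
import parsley.character.char
import parsley.token.Lexer
import parsley.token.descriptions.LexicalDesc

object LongLiteralExample {
  val lexer = new Lexer(LexicalDesc.plain)

  // The nonlexeme variant consumes no whitespace after the digits,
  // so the 'L' suffix must follow the number immediately.
  // Wrapping the compound token in `lexer.lexeme(...)` then consumes
  // whitespace once, after the whole literal.
  val longLiteral: Parsley[BigInt] =
    lexer.lexeme(lexer.nonlexeme.numeric.natural.decimal <* char('L'))
}
```

With this sketch, `"1024L  "` should parse to 1024, whereas `"1024 L"` should fail, because no whitespace is permitted between the digits and the suffix.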

Alternatively, these tokens can be used for lexical extraction, which can be performed by the ErrorBuilder typeclass: this can be used to try and extract tokens from the input stream when an error happens, to provide a more informative error. In this case, it is desirable to not consume whitespace after the token to keep the error tight and precise.

Attributes

Since: 4.0.0
Source: Lexer.scala
Supertypes: class Object, trait Matchable, class Any
Self type: nonlexeme.type

This object is concerned with special treatment of whitespace.

For the vast majority of cases, the functionality within this object shouldn't be needed, as whitespace is consistently handled by the lexeme object and the fully combinator. However, for grammars where whitespace is significant (like indentation-sensitive languages), this object provides some more fine-grained control over how whitespace is consumed by the parsers within lexeme.

Attributes

Since: 4.0.0
Source: Lexer.scala
Supertypes: class Object, trait Matchable, class Any
Self type: space.type

Value members

Constructors

Builds a new lexer with a given description for the lexical structure of the language.

Value parameters

desc: the configuration for the lexer, specifying the lexical rules of the grammar/language being parsed.
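A hedged sketch of constructing a lexer from a customised description is shown below. It assumes the parsley 4.x package layout (`parsley.token.predicate.Basic`, and a `space` field on `SpaceDesc`), which may differ in other versions; the choice of treating only spaces and tabs as whitespace, and the names `ConstructorExample` and `ab`, are purely illustrative.

```scala
import parsley.Parsley
import parsley.token.Lexer
import parsley.token.descriptions.{LexicalDesc, SpaceDesc}
import parsley.token.predicate.Basic

object ConstructorExample {
  // Start from the plain description and override only the whitespace rule:
  // here (as an assumption for illustration) newlines are NOT whitespace.
  val desc: LexicalDesc = LexicalDesc.plain.copy(
    spaceDesc = SpaceDesc.plain.copy(space = Basic(c => c == ' ' || c == '\t'))
  )

  val lexer = new Lexer(desc)

  // Two symbols in sequence; whitespace between them follows `desc`
  val ab: Parsley[Unit] = lexer.lexeme.symbol("a") *> lexer.lexeme.symbol("b")
}
```

Under this description, `"a  b"` should be accepted by `ConstructorExample.ab`, while `"a\nb"` should be rejected, because the newline is not skipped after the first symbol.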

Attributes

Since: 4.0.0
Source: Lexer.scala

Concrete methods

This combinator ensures a parser fully parses all available input, and consumes whitespace at the start.

This combinator should be used once as the outermost combinator in a parser. It is the only combinator that should consume leading whitespace, and this must be the first thing a parser does. It will ensure that, after the parser is complete, the end of the input stream has been reached.
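The following sketch shows the intended usage of this combinator, referred to elsewhere on this page as fully. It assumes the parsley 4.x path `lexeme.numeric.natural.decimal` and the plain description; the names `FullyExample`, `number` and `program` are illustrative.

```scala
import parsley.Parsley
import parsley.token.Lexer
import parsley.token.descriptions.LexicalDesc

object FullyExample {
  val lexer = new Lexer(LexicalDesc.plain)

  // A lexeme token: handles trailing whitespace only
  val number: Parsley[BigInt] = lexer.lexeme.numeric.natural.decimal

  // Applied once, at the top level: skips leading whitespace,
  // runs the parser, then demands the end of the input
  val program: Parsley[BigInt] = lexer.fully(number)
}
```

Here `FullyExample.program` should accept `"  42  "`, whereas `"42 x"` should be rejected because input remains after the number. By contrast, `FullyExample.number` on its own should fail on `"  42"`, since lexemes do not consume leading whitespace.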

Attributes

Since: 4.0.0
Source: Lexer.scala
