kantan.parsers

Type members

Classlikes

trait AsTokens[Source, Token]

Type class that describes the capacity to tokenize something.

A tokenized value is an indexed sequence (ideally an array). This allows the parser to navigate input not by consuming it bit by bit and constructing tons of intermediate representations, but by moving a pointer through something array-ish.

Strings, for example, can be turned into an array of characters at very little cost.
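
As a rough sketch, and assuming method names that may not match the actual kantan.parsers API, an instance for String could look like this:

```scala
// Hypothetical shape of the type class; the real method names may differ.
trait AsTokens[Source, Token]:
  def asTokens(source: Source): IndexedSeq[Token]

object AsTokens:
  // Strings tokenize into their characters at very little cost.
  given AsTokens[String, Char] with
    def asTokens(source: String): IndexedSeq[Char] =
      source.toCharArray.toIndexedSeq
```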

Companion: object

object AsTokens

Companion: class

case class Message(offset: Int, pos: Position, input: String, expected: List[String])

Parser error message.

An error message contains:

  • the index of the token at which the error was encountered.
  • the position (line and column) at which the error was encountered.
  • the token that caused the failure, as a string.
  • a list of the values that were expected.
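
For illustration, failing to find a digit after the first character of "1a" might produce a message along these lines (the exact values, and whether positions are 0- or 1-based, are assumptions):

```scala
// Illustrative values only; the position convention is an assumption.
val error = Message(
  offset   = 1,                              // index of the offending token
  pos      = Position(line = 0, column = 1), // where it was encountered
  input    = "a",                            // the token that caused the failure
  expected = List("digit")                   // what was expected instead
)
```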
Companion: object

object Message

Companion: class

case class Parsed[+A](value: A, start: Position, end: Position)

Parsed value, equipped with its start and end position in the original source code.
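
For example, assuming 0-based positions, the keyword let read at the very beginning of a source file might be represented as:

```scala
// Illustrative only: whether positions are 0- or 1-based is an assumption.
val keyword: Parsed[String] =
  Parsed("let", start = Position(0, 0), end = Position(0, 3))
```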

trait Parser[Token, +A]

Parses a sequence of Token into an A.

The companion object provides standard parsers from which to start building larger ones. In particular, it contains the necessary tools to start writing a string parser, such as char and string.

In order to provide better error messages, developers are encouraged to use label to describe the kind of thing a parser will produce - a digit, for example, or an array, or...

An important thing to realise is that parsers are non-backtracking by default. See the | documentation for detailed information on the consequences of this design choice.
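
A minimal sketch using only the combinators mentioned above; the exact signatures of char, string, label and | are assumptions based on this description:

```scala
import kantan.parsers.Parser
import kantan.parsers.Parser.{char, string}

// Either of two keywords; label gives failures a friendlier description.
val keyword: Parser[Char, String] =
  (string("let") | string("var")).label("keyword")

// Because parsers are non-backtracking by default, once string("let") has
// consumed an 'l', the string("var") alternative is no longer attempted.
val sign: Parser[Char, Char] =
  (char('+') | char('-')).label("sign")
```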

Companion: object

object Parser

Companion: class

case class Position(line: Int, column: Int)

Represents a position in a source file.

This is supposed to work in conjunction with SourceMap, to allow a parser to automatically keep track of where in a source file a token was encountered.

Companion: object

object Position

Companion: class

enum Result[Token, +A]

Result of a parsing operation.

This is essentially a very specialised version of Either (and can, in fact, be turned into one through toEither).

A result keeps track of whether or not any data has been consumed when producing it. This is used to decide whether or not to try alternative parsers in a Parser.| call.

Results also store an error message even if they're successful. This might seem a little odd, but is necessary to be able to provide good error messages for combinators such as Parser.filter where we might turn a success into a failure after the fact.
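
The following is a purely hypothetical reconstruction of what the enum's shape could look like, based only on the description above; the actual constructors and fields almost certainly differ:

```scala
// Hypothetical sketch, not the real kantan.parsers definition.
enum Result[Token, +A]:
  // A success still carries a Message so that combinators such as filter
  // can turn it into a useful failure after the fact.
  case Ok(consumed: Boolean, value: A, message: Message)
  case Error(consumed: Boolean, message: Message)

  // The Left type of toEither is an assumption.
  def toEither: Either[Message, A] = this match
    case Ok(_, value, _)   => Right(value)
    case Error(_, message) => Left(message)
```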

Companion: object

object Result

Companion: class

trait SourceMap[Token]

Type class used to keep track of a token's position in a source file.

A source map knows how to compute the following, given the current position in the input:

  • where the token starts.
  • where the token ends.

In the case of characters, for example, the mapping is fairly straightforward. A character:

  • starts at the current position.
  • ends at the beginning of the following line if the character is a line break.
  • ends at the next column otherwise.

One might imagine more complex scenarios, however. Typically, when splitting tokenization and parsing, you'll end up working with tokens that know their position in the original source code.
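
A minimal sketch of an instance for Char, following the rules listed above; the method names and the 0-based column convention are assumptions:

```scala
// Hypothetical method names; only the behaviour follows the description.
trait SourceMap[Token]:
  def startsAt(token: Token, current: Position): Position
  def endsAt(token: Token, current: Position): Position

object SourceMap:
  given SourceMap[Char] with
    // A character starts at the current position...
    def startsAt(token: Char, current: Position): Position = current
    // ...and ends at the start of the next line if it is a line break,
    // or at the next column otherwise.
    def endsAt(token: Char, current: Position): Position =
      if token == '\n' then Position(current.line + 1, 0)
      else Position(current.line, current.column + 1)
```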

Companion: object

object SourceMap

Companion: class

case class State[Token](input: IndexedSeq[Token], offset: Int, pos: Position)(using evidence$1: SourceMap[Token])

State of a parser.

A parser works with:

  • an array of tokens to explore (typically the characters that compose a string).
  • an offset in that array that represents how far we've parsed already.
  • the position of the last parsed token.

TODO: before writing documentation, we need to keep track of a token's START and END position. It makes things far easier to explain. With chars, a token's start position is always the previous token's end position. With more complex tokens, this might not hold - think of space-separated ints: "1 2". You cannot guess the start position of '2' just from '1': this doesn't tell you how many spaces there are before the next token starts.
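
For illustration, the state at the very start of parsing "1 2" might look as follows (this assumes 0-based positions and a SourceMap[Char] given in scope, as suggested by the SourceMap entry above):

```scala
// Illustrative only; the position convention is an assumption.
val start = State[Char](
  input  = "1 2".toCharArray.toIndexedSeq, // tokens still to explore
  offset = 0,                              // nothing parsed yet
  pos    = Position(0, 0)                  // position of the last parsed token
)
```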

Companion: object

object State

Companion: class