Object

colossus.parsing

Combinators

object Combinators

Streaming Parser Combinators

Overview

A Parser[T] is an object that consumes a stream of bytes to produce a result of type T.

A Combinator is a "higher-order" parser that takes one or more parsers and produces a new parser.

The stream parsers are fast and efficient, but that speed comes with tradeoffs: they are mutable, not thread safe, and in general designed for network protocols, which tend to have very deterministic grammars.

The Parser Rules:

1. A parser must greedily consume the data stream until it produces a result.
2. When a parser consumes the last byte necessary to produce a result, it must stop consuming the stream and return the new result while resetting its state.
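
The two rules can be illustrated with a small toy model. This is a sketch only, not the Colossus implementation: the class ToyFixedBytesParser and its parse(chunk, offset) signature are hypothetical names invented for this example. The parser keeps consuming until it has the bytes it needs (rule 1), then stops at exactly the last required byte, returns the result, and resets itself (rule 2).

```scala
// Toy model only: NOT the Colossus implementation. All names are hypothetical.
// A parser that needs exactly `n` bytes and may receive them across several chunks.
final class ToyFixedBytesParser(n: Int) {
  private val buf = new Array[Byte](n)
  private var filled = 0 // mutable state, so an instance is not thread safe

  /** Consumes bytes from `chunk` starting at `offset` (assumed <= chunk.length).
    * Returns the result, if complete, and the number of bytes consumed. */
  def parse(chunk: Array[Byte], offset: Int): (Option[Array[Byte]], Int) = {
    // Rule 1: take as much as is available, but never more than is still needed.
    val take = math.min(n - filled, chunk.length - offset)
    System.arraycopy(chunk, offset, buf, filled, take)
    filled += take
    if (filled == n) {
      // Rule 2: stop at the last required byte, hand back the result, reset state.
      val result = buf.clone()
      filled = 0
      (Some(result), take)
    } else {
      (None, take)
    }
  }
}
```

Feeding this parser the chunk "aa" yields no result; feeding it "aabbbb" next completes "aaaa" after consuming only two bytes, which leaves "bbbb" in the chunk for the following call.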

Examples

Use any parser by itself:

val parser = bytes(4)
val data = DataBuffer(ByteString("aaaabbbbccc"))
parser.parse(data) // Some(ByteString(97, 97, 97, 97))
parser.parse(data) >> {bytes => bytes.utf8String} // Some("bbbb")
parser.parse(data) // None

Combine two parsers:

val parser = bytes(3) ~ bytes(2) >> {case a ~ b => a.utf8String + ":" + b.utf8String}
parser.parse(DataBuffer(ByteString("abc"))) // None
parser.parse(DataBuffer(ByteString("defgh"))) // Some("abc:de")
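
The way a combined parser keeps its partial progress between buffers can be sketched with a toy model. This is an illustration under stated assumptions, not how Colossus implements it: ToyParser, chain, map and chars are hypothetical names, and an Iterator[Byte] stands in for the data buffer.

```scala
// Toy model only: NOT the Colossus implementation. All names are hypothetical.
trait ToyParser[A] { self =>
  /** Pulls bytes from `in`; returns Some(result) once enough bytes have arrived. */
  def parse(in: Iterator[Byte]): Option[A]

  /** Transforms the result of this parser. */
  def map[B](f: A => B): ToyParser[B] = new ToyParser[B] {
    def parse(in: Iterator[Byte]): Option[B] = self.parse(in).map(f)
  }

  /** Runs this parser, then `next`, remembering the first result between calls. */
  def chain[B](next: ToyParser[B]): ToyParser[(A, B)] = new ToyParser[(A, B)] {
    private var first: Option[A] = None
    def parse(in: Iterator[Byte]): Option[(A, B)] = {
      if (first.isEmpty) first = self.parse(in)
      first.flatMap { a =>
        next.parse(in).map { b =>
          first = None // reset so the chained parser can be reused
          (a, b)
        }
      }
    }
  }
}

object ToyParser {
  /** Reads exactly `n` bytes and returns them as a string of single-byte chars. */
  def chars(n: Int): ToyParser[String] = new ToyParser[String] {
    private val sb = new StringBuilder
    def parse(in: Iterator[Byte]): Option[String] = {
      while (sb.length < n && in.hasNext) sb.append(in.next().toChar)
      if (sb.length == n) {
        val s = sb.toString
        sb.clear()
        Some(s)
      } else None
    }
  }
}
```

With `ToyParser.chars(3).chain(ToyParser.chars(2)).map { case (a, b) => a + ":" + b }`, parsing "abc" returns None because the second half is still missing, and parsing "defgh" afterwards returns Some("abc:de"), mirroring the example above.
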
Linear Supertypes
AnyRef, Any

Type Members

  1. implicit final class ByteArrayOps extends AnyVal

  2. class ChainedParser[A, B] extends Parser[~[A, B]]

  3. class FastArrayBuilder extends FastArrayBuilding

  4. trait FastArrayBuilding extends AnyRef

    A very fast dynamically growable array builder. Do not be tempted to replace this with any out-of-the-box Java/Scala class. This is faster.
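
For readers unfamiliar with the idea, a growable array builder can be sketched as follows. This is a minimal illustration, not the FastArrayBuilding code: the class name ToyByteArrayBuilder and its write and complete methods are assumptions made for this example, and no performance claim is made for it.

```scala
// Toy model only: NOT the FastArrayBuilding implementation. Names are hypothetical.
final class ToyByteArrayBuilder(initialSize: Int) {
  private var arr = new Array[Byte](math.max(1, initialSize))
  private var len = 0

  /** Appends one byte, doubling the backing array when it is full. */
  def write(b: Byte): Unit = {
    if (len == arr.length) {
      arr = java.util.Arrays.copyOf(arr, arr.length * 2)
    }
    arr(len) = b
    len += 1
  }

  /** Returns the bytes written so far and resets the builder for reuse. */
  def complete(): Array[Byte] = {
    val out = java.util.Arrays.copyOf(arr, len)
    len = 0
    out
  }
}
```

Reusing the same backing array after complete() avoids reallocating for every parsed value, which is one plausible reason a hand-written builder can beat a general-purpose collection.
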

  5. class FlatMapParser[A, B] extends Parser[B]

  6. class FoldZeroParser[T, U] extends Parser[U]

  7. class LineParser[T] extends Parser[T] with FastArrayBuilding

    Parse a single line of data. A "line" is terminated by \r\n.

    This is quite possibly the fastest line parser in existence. Although it is essentially a specialized version of the bytesUntil parser, it is significantly faster. Part of the speedup comes from folding in the functionality of the MapParser, which avoids a number of function calls. I believe the rest comes from the fact that comparing the next byte to a constant is significantly faster than comparing it to an array member. I have made several attempts to get the bytesUntil parser as fast as this one, to no avail.
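
The "compare against a constant" idea can be sketched with a toy line scanner. This is not the LineParser source: ToyLineParser and feed are hypothetical names, and unlike a real Parser[T] this sketch returns every complete line found in a chunk. It checks each byte against the constants 13 (\r) and 10 (\n) and remembers a pending \r across chunk boundaries.

```scala
// Toy model only: NOT the LineParser implementation. Names are hypothetical.
final class ToyLineParser {
  private val CR: Byte = 13
  private val LF: Byte = 10
  private val current = new java.io.ByteArrayOutputStream()
  private var sawCR = false // true when the previous byte was \r

  /** Feeds one chunk; returns every complete line found, without the \r\n. */
  def feed(chunk: Array[Byte]): List[String] = {
    val lines = List.newBuilder[String]
    var i = 0
    while (i < chunk.length) {
      val b = chunk(i)
      if (sawCR && b == LF) {
        // Terminator complete: emit the line and reset for the next one.
        lines += new String(current.toByteArray, "US-ASCII")
        current.reset()
        sawCR = false
      } else {
        if (sawCR) current.write(CR.toInt) // the pending \r was ordinary data
        if (b == CR) sawCR = true
        else {
          sawCR = false
          current.write(b.toInt)
        }
      }
      i += 1
    }
    lines.result()
  }
}
```

Feeding "ab\r" and then "\ncd\r\nef" produces no lines for the first chunk and the lines "ab" and "cd" for the second, with "ef" held until its terminator arrives.
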

  8. class MapParser[A, B] extends Parser[B]

  9. trait Parser[+T] extends AnyRef

  10. class RepeatZeroParser[T] extends Parser[Array[T]]

    Repeat a parser, accumulating the results until the value returned by the parser matches the type's Zero value. This can be used, for example, to keep parsing lines of data until an empty line is encountered.

  11. case class ~[+A, +B](a: A, b: B) extends Product with Serializable


Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. val byte: Parser[Byte]

    Parse a single byte.

  6. def bytes(num: Int): Parser[Array[Byte]]

  7. def bytes(num: Parser[Int]): Parser[Array[Byte]]

  8. def bytes(num: Int, maxSize: DataSize, maxInitBufferSize: DataSize): Parser[Array[Byte]]

  9. def bytes(num: Parser[Int], maxSize: DataSize, maxInitBufferSize: DataSize): Parser[Array[Byte]]

    Read a fixed number of bytes, prefixed by a length.

  10. def bytesUntil(terminus: Array[Byte], includeTerminusInData: Boolean = false, sizeHint: Int = 32): Parser[Array[Byte]]

    Keep reading bytes until the terminus is encountered. This accounts for a possible partial terminus in the data. The terminus is NOT included in the returned value unless includeTerminusInData is true.

  11. def bytesUntilEOS: Parser[ByteString]

    Read in an unknown number of bytes, ended only when endOfStream is called.

    Be aware that this parser has no maximum size and will read in data forever if endOfStream is never called.

  12. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  13. def const[T](t: T): Parser[T]

    Creates a parser that will always return the same value without consuming any data. Useful when flatMapping parsers.

  14. def delimitedString(delimiter: Byte, terminus: Byte): Parser[Vector[String]]

    Parse a series of ASCII strings separated by a single-byte delimiter and terminated by a single byte.

  15. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  17. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  18. def foldZero[T, U](parser: Parser[T], init: ⇒ U)(folder: (T, U) ⇒ U)(implicit zero: Zero[T]): FoldZeroParser[T, U]

  19. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  20. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  21. def int: Parser[Int]

  22. def intUntil(terminus: Byte, base: Int = 10): Parser[Long]

    Parses the ASCII representation of an integer, reading until the terminus is encountered.
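
The digit accumulation behind such a parser can be sketched in a few lines. This is a non-streaming toy, not the intUntil implementation: ToyIntUntil is a hypothetical name, the whole input is assumed to be available in one array, and sign handling and overflow checks are left out.

```scala
// Toy model only: NOT the intUntil implementation. Names are hypothetical.
object ToyIntUntil {
  /** Returns the parsed value and the index just past the terminus,
    * or None if the terminus does not appear in `data`. */
  def parse(data: Array[Byte], terminus: Byte, base: Int = 10): Option[(Long, Int)] = {
    var i = 0
    var value = 0L
    while (i < data.length && data(i) != terminus) {
      // Character.digit returns -1 when the character is not a digit in `base`.
      val digit = Character.digit(data(i).toChar, base)
      if (digit < 0) {
        throw new NumberFormatException("invalid digit at index " + i)
      }
      value = value * base + digit
      i += 1
    }
    if (i == data.length) None else Some((value, i + 1))
  }
}
```

For example, parsing "123:rest" with terminus ':' gives the value 123 and index 4, the position of the first byte after the terminus.
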

  23. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  24. def line[T](constructor: (Array[Byte]) ⇒ T, includeNewLine: Boolean): Parser[T]

  25. def line(includeNewline: Boolean): Parser[Array[Byte]]

  26. def line: Parser[Array[Byte]]

  27. def literal(lit: ByteString): Parser[ByteString]

  28. def long: Parser[Long]

  29. def maxSize[T](size: DataSize, parser: Parser[T]): Parser[T]

    Creates a parser that wraps another parser and will throw an exception if more than size data is required to parse a single object. See the ParserSizeTracker for more details.

  30. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  31. final def notify(): Unit

    Definition Classes
    AnyRef
  32. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  33. def repeat[T](times: Long, parser: Parser[T]): Parser[Vector[T]]

    Repeat a pattern a fixed number of times

    times

    the number of times to parse the pattern

    parser

    the parser for the pattern

    returns

    the parsed sequence

  34. def repeat[T](times: Parser[Long], parser: Parser[T]): Parser[Vector[T]]

    Parse a pattern multiple times based on a numeric prefix

    This is useful for any situation where the repeated pattern is prefixed by the number of repetitions, for example num:[obj1][obj2][obj3]. In situations where the pattern doesn't immediately follow the number, you'll have to do it yourself, something like

    intUntil(':') ~ otherParser |> {case num ~ other => repeat(num, patternParser)}

    times

    parser for the number of times to repeat the pattern

    parser

    the parser that will parse a single instance of the pattern

    returns

    the parsed sequence
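
The num:[obj1][obj2][obj3] layout described above can be shown with a tiny non-streaming sketch. It is an illustration only and does not use the Colossus API: ToyCountPrefixedRepeat is a hypothetical name, each object is assumed to be a fixed-width string, and the whole input is assumed to be available at once.

```scala
// Toy model only: NOT the repeat implementation. Names are hypothetical.
object ToyCountPrefixedRepeat {
  /** Parses "<count>:<item><item>..." where every item is `width` characters. */
  def parse(input: String, width: Int): Vector[String] = {
    val sep = input.indexOf(':')
    require(sep >= 0, "missing ':' after the count")
    val count = input.substring(0, sep).toInt // the numeric prefix
    val body = input.substring(sep + 1)
    require(body.length >= count * width, "not enough data for all items")
    // Read exactly `count` items; anything after them is left unparsed.
    Vector.tabulate(count)(i => body.substring(i * width, (i + 1) * width))
  }
}
```

Calling `ToyCountPrefixedRepeat.parse("3:aabbcc", 2)` yields the three items "aa", "bb" and "cc".
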

  35. def repeatUntil[T](parser: Parser[T], terminus: Byte): Parser[Vector[T]]

    Repeatedly parse a pattern until a terminal byte is reached

    Before calling parser this will examine the next byte. If the byte matches the terminus, it will return the built sequence. Otherwise it will pass control to parser (including the examined byte) until the parser returns a result.

    Notice that the terminal byte is consumed, so if we have

    val parser = repeatUntil(bytes(2), ':')
    parser.parse(DataBuffer(ByteString("aabbcc:ddee")))

    the bytes remaining in the buffer after parsing are just ddee.

    parser

    the parser to repeat

    terminus

    the byte that signals when to stop repeating

    returns

    the parsed sequence
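
The peek-then-parse loop and the consumption of the terminal byte can be sketched as follows. This is a non-streaming toy, not the repeatUntil implementation: ToyRepeatUntil is a hypothetical name, and a fixed-width substring stands in for the inner parser.

```scala
// Toy model only: NOT the repeatUntil implementation. Names are hypothetical.
object ToyRepeatUntil {
  /** Returns the parsed items and whatever input remains after the terminus. */
  def parse(input: String, width: Int, terminus: Char): (Vector[String], String) = {
    val items = Vector.newBuilder[String]
    var pos = 0
    // Examine the next character first; only hand it to the item parser
    // when it is not the terminus.
    while (pos < input.length && input.charAt(pos) != terminus) {
      require(pos + width <= input.length, "incomplete item")
      items += input.substring(pos, pos + width)
      pos += width
    }
    require(pos < input.length, "terminus not found")
    // pos + 1 skips the terminus, so it is consumed and not part of the remainder.
    (items.result(), input.substring(pos + 1))
  }
}
```

With the input "aabbcc:ddee", a width of 2 and ':' as the terminus, the result is the items "aa", "bb", "cc" and the remainder "ddee", matching the behavior described above.
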

  36. def repeatUntilEOS[T](parser: Parser[T]): Parser[Seq[T]]

    Create a parser that will repeat the given parser forever until endOfStream() is called. The results from each call to the given parser are accumulated and returned at the end of the stream.

  37. def repeatZero[T](parser: Parser[T])(implicit arg0: ClassTag[T], zero: Zero[T]): RepeatZeroParser[T]

    Repeat using a parser until it returns a zero value. An array of non-zero values is returned.

  38. def short: Parser[Short]

    Permalink
  39. def skip[T](n: Int): Parser[Unit]

    Creates a parser that will skip over n bytes. You generally only want to do this inside a peek parser.

  40. def stringUntil(terminus: Byte, toLower: Boolean = false, minSize: Option[Int] = None, allowWhiteSpace: Boolean = true, ltrim: Boolean = false): Parser[String]

    Parse a string until a designated byte is encountered

    Limited filtering is currently supported, all of which happens during the reading.

    terminus

    reading will stop when this byte is encountered

    toLower

    if true any characters in the range A-Z will be lowercased before insertion

    minSize

    specify a minimum size

    allowWhiteSpace

    if false, throw a ParseException if any whitespace is encountered before the terminus. If the terminus is a whitespace character, it will not be counted

    ltrim

    trim leading whitespace
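
How the filters might interact while reading can be sketched with a toy version. This is not the stringUntil implementation: ToyStringUntil is a hypothetical name, minSize is omitted, an IllegalArgumentException stands in for ParseException, and the choice that ltrim skips leading whitespace before the allowWhiteSpace check applies is an assumption of this sketch.

```scala
// Toy model only: NOT the stringUntil implementation. Names are hypothetical.
object ToyStringUntil {
  /** Returns the filtered string before `terminus`, or None if it never appears. */
  def parse(input: String, terminus: Char, toLower: Boolean = false,
            allowWhiteSpace: Boolean = true, ltrim: Boolean = false): Option[String] = {
    val sb = new StringBuilder
    var i = 0
    while (i < input.length && input.charAt(i) != terminus) {
      val c = input.charAt(i)
      if (c.isWhitespace) {
        if (ltrim && sb.isEmpty) {
          // Leading whitespace is dropped (assumed to win over allowWhiteSpace).
        } else if (!allowWhiteSpace) {
          throw new IllegalArgumentException("whitespace at index " + i)
        } else {
          sb.append(c)
        }
      } else if (toLower && c >= 'A' && c <= 'Z') {
        sb.append((c + 32).toChar) // lowercase only the ASCII range A-Z
      } else {
        sb.append(c)
      }
      i += 1
    }
    if (i == input.length) None else Some(sb.toString)
  }
}
```

For example, `ToyStringUntil.parse("  Content-Type:x", ':', toLower = true, ltrim = true)` returns Some("content-type"), with all filtering done in the same pass that scans for the terminus.
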

  41. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  42. def toString(): String

    Definition Classes
    AnyRef → Any
  43. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  44. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  45. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
