ra3

package ra3

ra3 provides an embedded query language and its corresponding query engine.

ra3 is built on a distributed task execution library named tasks. Consequently, almost all interactions with ra3 need a handle to a configured tasks runtime environment, represented by a value of type tasks.TaskSystemComponents. You can configure and start the tasks environment with the tasks.withTaskSystem method.
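
A minimal sketch of obtaining that handle. The exact argument list and return type of tasks.withTaskSystem are assumptions here (the None configuration argument is a placeholder); consult the tasks library for the real overloads:

  import cats.effect.IO
  import tasks.*

  // Every ra3 call below needs the implicit TaskSystemComponents handle.
  def myProgram(implicit tsc: TaskSystemComponents): IO[ra3.Table] = ???

  // withTaskSystem starts the runtime, hands the TaskSystemComponents value
  // to the given function and shuts the runtime down afterwards.
  val result: IO[ra3.Table] =
    withTaskSystem(None) { implicit tsc =>
      myProgram
    }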

ra3 can query data on disk (or in object storage) organized into its own intermediate chunked columnar storage. A table in ra3 is represented by a value of type ra3.Table. CSV data can be imported with the ra3.importCsv method and exported back to CSV with the ra3.Table.exportToCsv method. The intermediate data organization is not meant for use outside of ra3, nor for long-term storage.

Each query in ra3 is persisted to secondary storage and checkpointed.

The entry points to the query language are the various methods of the ra3 package and of the ra3.Table class which provide typed references to columns or to tables, e.g.:

  • ra3.let and ra3.let0
  • ra3.Table.in and ra3.Table.in0

The query language builds an expression tree of type ra3.tablelang.TableExpr, which is evaluated into an IO[Table] with ra3.tablelang.TableExpr.evaluate. ra3.tablelang.TableExpr is a serializable description of the query. The expression tree of a query may be printed in human-readable form with ra3.tablelang.TableExpr.render.
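
A sketch of the evaluation step, assuming a query expression has already been assembled with the entry points above. The result type of evaluate is simplified here and the implicit task system handle is assumed to be required:

  import cats.effect.IO
  import tasks.TaskSystemComponents
  import ra3.tablelang.TableExpr

  def runQuery[R](expr: TableExpr[R])(implicit tsc: TaskSystemComponents): IO[Unit] =
    for {
      _     <- IO.println(ra3.render(expr)) // human-readable form of the expression tree
      table <- expr.evaluate                // persisted, checkpointed evaluation
      _     <- IO.println(s"produced: $table")
    } yield ()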

The following table-level operators are available in ra3:

  • simple query, i.e. element-wise filter and projection. Corresponds to SQL queries with SELECT and WHERE.
  • count query, i.e. element-wise filter and count. Corresponds to SQL queries with SELECT count(*) and WHERE (but no GROUP BY).
  • equi-join, i.e. a join whose condition is restricted to equality between two columns.
  • group by, then reduce. Corresponds to the SELECT aggregations, WHERE and GROUP BY clauses of SQL.
  • group by, then count. Corresponds to the SELECT count(*), WHERE and GROUP BY clauses of SQL.
  • approximate selection of the top K elements by a single column

Common features of SQL which are not available:

  • arbitrary join conditions (e.g. joins on inequalities)
  • complete products of tables (Cartesian products)
  • full table sort
  • sub-queries in the filter (WHERE .. IN (SELECT id FROM ..))

Partitioning. ra3 does not maintain random access indexes, but it repartitions (shards / bucketizes) the columns needed by a given group by or join operator so that all keys needed to complete the operation end up in the same partition.

Language imports. You may choose to import everything from the ra3 package. Apart from the implicit conversions of the query language listed at the end of this page, it does not contain implicits.
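
For example:

  // Wildcard import bringing the query-language entry points
  // (let, const, select, query, importCsvUntyped, ...) into scope.
  import ra3.*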

Members list

Packages

package ra3.bufferimpl
package ra3.hashtable
package ra3.join
package ra3.lang
package ra3.tablelang

Type members

Experimental classlikes

case class BufferedTable(columns: Vector[Buffer], colNames: Vector[String])

Attributes

Experimental
true
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
sealed trait CSVColumnDefinition

Enum for predefined column types

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Known subtypes
class F64Column
class I32Column
class I64Column
class StrColumn

object CSVColumnDefinition

Attributes

Companion
trait
Experimental
true
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
CSVColumnDefinition.type
sealed trait CharacterDecoder

Enum for predefined character decoders

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Known subtypes
class ASCII
class ISO88591
class UTF16
class UTF8

object CharacterDecoder

Attributes

Companion
trait
Experimental
true
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
CharacterDecoder.type
sealed trait ColumnTag

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object F64
object I32
object I64
object Instant
object StringTag
Self type
object ColumnTag

Attributes

Companion
trait
Experimental
true
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
ColumnTag.type
sealed trait CompressionFormat

Enum for predefined compression formats

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object Gzip

object CompressionFormat

Attributes

Companion
trait
Experimental
true
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
CompressionFormat.type
object Fnv1

Attributes

Experimental
true
Supertypes
class Object
trait Matchable
class Any
Self type
Fnv1.type
sealed trait InstantFormat

Enum for predefined formats parsing string to Instant

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object ISO
object InstantFormat

Attributes

Companion
trait
Experimental
true
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
InstantFormat.type

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object InstantParser

Attributes

Companion
trait
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Self type
object Murmur3

Attributes

Experimental
true
Supertypes
class Object
trait Matchable
class Any
Self type
Murmur3.type
final class NotNothing[T]

Scala hack to represent generic types which are not Nothing

This is a type class with two ambiguous instances predefined for Nothing

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
object NotNothing

Attributes

Companion
class
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Self type
NotNothing.type
case class Table(columns: Vector[TaggedColumn], colNames: Vector[String], uniqueId: String, partitions: Option[PartitionData])

Reference to a set of aligned columns (i.e. a table) persisted onto secondary storage.

Each table must have a unique identifier, initially given by the importCsv method.

Tables have String column names.

Tables consist of columns. Columns are stored as segments. Segments are the unit of IO operations, i.e. ra3 never reads less than a segment into memory. The in-memory (buffered) counterpart of a segment is a Buffer. The maximum number of elements in a segment is therefore what fits into a single Java array, i.e. slightly below 2^31.

All columns of the same table share the same segmentation, i.e. they have the same number of segments, their segments have the same sizes, and those segments are aligned.

Segments store segment-level statistics, and some operations complete without buffering the segment.
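
As an illustration of the segmentation arithmetic (the numbers are arbitrary examples, not defaults):

  // A column of 10,350,000 elements chunked with maxSegmentLength = 1,000,000
  // is stored as 11 aligned segments; the last segment holds 350,000 elements.
  val maxSegmentLength = 1000000L
  val rows             = 10350000L
  val numSegments      = (rows + maxSegmentLength - 1) / maxSegmentLength // 11
  val lastSegmentSize  = rows - (numSegments - 1) * maxSegmentLength      // 350000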

Attributes

Companion
object
Experimental
true
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object Table

Attributes

Companion
class
Experimental
true
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
Table.type
object csv

Attributes

Experimental
true
Supertypes
class Object
trait Matchable
class Any
Self type
csv.type

Types

type ColumnSpecExpr[T <: Primitives] = Expr[ColumnSpec[T]]
type CsvColumnDefToColumnType[T] = T match {
  case I32Column => I32Var
  case I64Column => I64Var
  case F64Column => F64Var
  case StrColumn => StrVar
  case InstantColumn => InstVar
}
type F64Var = DF64
type I32Var = DI32
type I64Var = DI64
type InstVar = DInst
type Primitives = DI32 | DStr | DInst | DF64 | DI64 | String | Int | Long | Double | String | Instant
type StrVar = DStr
type TableExpr[R] = ra3.tablelang.TableExpr[R]

Value members

Experimental methods

def LitF64S(s: Set[Double]): Expr[Set[Double]]

Attributes

Experimental
true
def LitI32S(s: Set[Int]): Expr[Set[Int]]

Attributes

Experimental
true
def LitI64S(s: Set[Long]): Expr[Set[Long]]

Attributes

Experimental
true
def LitInstS(s: Set[Instant]): Expr[Set[Instant]]

Attributes

Experimental
true
def LitStringS(s: Set[String]): Expr[Set[String]]

Attributes

Experimental
true
def S: Expr[ReturnValueTuple[EmptyTuple]]

Attributes

Experimental
true
def concatenate(others: Table*)(implicit tsc: TaskSystemComponents): IO[Table]

Concatenate the list of rows of multiple tables ('grows downwards')

Attributes

Experimental
true
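
A usage sketch based on the signature above; whether concatenate is also available as a method on Table is not shown on this page, so the package-level form is used:

  // Stack the rows of several tables that share the same schema.
  def stacked(tables: Seq[ra3.Table])(implicit
      tsc: tasks.TaskSystemComponents
  ): cats.effect.IO[ra3.Table] =
    ra3.concatenate(tables*)
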
def const(s: Long): Expr[Long]

Attributes

Experimental
true
def const(s: Int): Expr[Int]

Attributes

Experimental
true
def const(s: String): Expr[String]

Attributes

Experimental
true
def const(s: Double): Expr[Double]

Attributes

Experimental
true
def const(s: Instant): Expr[Instant]

Attributes

Experimental
true
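
The literal constructors above (the Lit...S methods and the const overloads) lift ordinary Scala values, and sets of values, into the expression language. A minimal sketch; the query into which these expressions would be spliced is omitted:

  import java.time.Instant

  val threshold  = ra3.const(0.5)                   // Expr[Double]
  val label      = ra3.const("high")                // Expr[String]
  val allowedIds = ra3.LitI64S(Set(1L, 2L, 3L))     // Expr[Set[Long]]
  val cutoffs    = ra3.LitInstS(Set(Instant.EPOCH)) // Expr[Set[Instant]]
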
def count[T <: Tuple](prg: Expr[ReturnValueTuple[T]]): SimpleQueryCount[Any, T, ReturnValueTuple[T]]

Count query consisting of elementwise (row-wise) filter and counting those rows which pass the filter

Attributes

Experimental
true
def filter(arg0: I32ColumnExpr): Expr[ReturnValueTuple[EmptyTuple.type]]

Attributes

Experimental
true
inline def importCsv[T <: Tuple](file: SharedFile, name: String, columns: T, maxSegmentLength: Int, files: Seq[SharedFile], compression: Option[CompressionFormat], recordSeparator: String, fieldSeparator: Char, header: Boolean, maxLines: Long, bufferSize: Int, characterDecoder: CharacterDecoder)(implicit tsc: TaskSystemComponents): IO[Const[ReturnValueTuple[Map[T, CsvColumnDefToColumnType]]]]

Import CSV data into ra3

Value parameters

columns

Description of columns: at a minimum the 0-based column index in the csv file and the type of the column

maxSegmentLength

Each column will be chunked to this length

name

Name of the table to create, must be unique

Attributes

Experimental
true
def importCsvUntyped(file: SharedFile, name: String, columns: Seq[CSVColumnDefinition], maxSegmentLength: Int, files: Seq[SharedFile], compression: Option[CompressionFormat], recordSeparator: String, fieldSeparator: Char, header: Boolean, maxLines: Long, bufferSize: Int, characterDecoder: CharacterDecoder, parallelism: Int)(implicit tsc: TaskSystemComponents): IO[Table]

Import CSV data into ra3

Value parameters

columns

Description of columns: at a minimum the 0-based column index in the csv file and the type of the column

maxSegmentLength

Each column will be chunked to this length

name

Name of the table to create, must be unique

Attributes

Experimental
true
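
A sketch of a fully named call of importCsvUntyped following the signature above. The column definitions and the character decoder are taken as parameters because their constructors are not documented on this page, the SharedFile type is assumed to be exported by the tasks package, and the remaining argument values are arbitrary examples rather than recommended defaults:

  import cats.effect.IO
  import tasks.*

  def importOne(
      csv: SharedFile,
      columns: Seq[ra3.CSVColumnDefinition],
      decoder: ra3.CharacterDecoder
  )(implicit tsc: TaskSystemComponents): IO[ra3.Table] =
    ra3.importCsvUntyped(
      file = csv,
      name = "my-unique-table",   // must be unique across imports
      columns = columns,          // 0-based column index plus column type
      maxSegmentLength = 1000000, // each column is chunked to this length
      files = Nil,                // relation of `files` to `file` not documented here
      compression = None,
      recordSeparator = "\n",
      fieldSeparator = ',',
      header = true,
      maxLines = Long.MaxValue,
      bufferSize = 8192,
      characterDecoder = decoder,
      parallelism = 1
    )
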
inline def importFromStream[T <: Tuple : ClassTag](stream: Stream[IO, T], uniqueId: String, minimumSegmentSize: Int, maximumSegmentSize: Int)(implicit evidence$1: ClassTag[T], tsc: TaskSystemComponents): IO[Const[ReturnValueTuple[Map[T, DefToColumnType]]]]

Attributes

Experimental
true

Partial reduction

Reduces each segment independently. Returns a single row per segment.

Attributes

Experimental
true
def query[T <: Tuple](prg: Expr[ReturnValueTuple[T]]): SimpleQuery[Any, T, ReturnValueTuple[T]]

Simple query consisting of elementwise (row-wise) projection and filter

Attributes

Experimental
true
def reduce[T <: Tuple](prg: Expr[ReturnValueTuple[T]]): ReduceTable[Any, T, ReturnValueTuple[T]]

Full table reduction

Equivalent to a group by into a single group, then reducing that single group. Returns a single row.

This will read all rows of the needed columns into memory. Consider partialReduce instead if the reduction is distributable.

Attributes

Experimental
true
def render[T](q: TableExpr[T]): String

Attributes

Experimental
true
def select[T1 <: Tuple](arg1: Schema[T1]): Expr[ReturnValueTuple[T1]]

Attributes

Experimental
true
def select[T <: Tuple](prg: Expr[ReturnValueTuple[T]]): SimpleQuery[Any, T, ReturnValueTuple[T]]

Simple query consisting of elementwise (row-wise) projection and filter

Attributes

Experimental
true
def select0: Expr[ReturnValueTuple[EmptyTuple]]

Element-wise or group-wise projection

Attributes

Experimental
true
def where(arg0: I32ColumnExpr): Expr[ReturnValueTuple[EmptyTuple.type]]

Attributes

Experimental
true

Experimental fields

val MissingString: String

The value which encodes a missing string: the string of length 1 whose single character is \u0001.
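
A small illustration of comparing a string cell against this sentinel:

  // MissingString is the one-character "\u0001" sentinel described above.
  def isMissing(cell: String): Boolean = cell == ra3.MissingString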

Attributes

Experimental
true

Implicits

Experimental implicits

implicit def conversionDF64(a: Expr[DF64]): ColumnSpecExpr[DF64]

Attributes

Experimental
true
implicit def conversionDI32(a: Expr[DI32]): ColumnSpecExpr[DI32]

Attributes

Experimental
true
implicit def conversionDI64(a: Expr[DI64]): ColumnSpecExpr[DI64]

Attributes

Experimental
true
implicit def conversionDInst(a: Expr[DInst]): ColumnSpecExpr[DInst]

Attributes

Experimental
true
implicit def conversionStr(a: Expr[DStr]): ColumnSpecExpr[DStr]

Attributes

Experimental
true