ra3

package ra3

ra3 provides an embedded query language and its corresponding query engine.

ra3 is built on a distributed task execution library named tasks. Consequently, almost all interactions with ra3 need a handle to a configured tasks runtime environment, represented by a value of type tasks.TaskSystemComponents. You can configure and start the tasks environment with the tasks.withTaskSystem method.
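
A minimal sketch of obtaining that handle. The exact argument list and return type of tasks.withTaskSystem are assumptions here (the None configuration argument is a placeholder); consult the tasks library for the real overloads:

  import cats.effect.IO
  import tasks.*

  // Every ra3 call below needs the implicit TaskSystemComponents handle.
  def myProgram(implicit tsc: TaskSystemComponents): IO[ra3.Table] = ???

  // withTaskSystem starts the runtime, hands the TaskSystemComponents value
  // to the given function and shuts the runtime down afterwards.
  val result: IO[ra3.Table] =
    withTaskSystem(None) { implicit tsc =>
      myProgram
    }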

ra3 can query data on disk (or in object storage) organized into its own intermediate chunked columnar storage. A table in ra3 is represented by a value of type ra3.Table. CSV data can be imported with the ra3.importCsv method and exported back to CSV with the ra3.Table.exportToCsv method. The intermediate data organization is not meant for use outside of ra3, nor for long-term storage.

Each query in ra3 is persisted to secondary storage and checkpointed.

The entry points to the query language are the various methods of the ra3 package and of the ra3.Table class which provide typed references to columns or to tables, e.g.:

  • ra3.let and ra3.let0
  • ra3.Table.in and ra3.Table.in0

The query language builds an expression tree of type ra3.tablelang.TableExpr, which is evaluated into an IO[Table] with ra3.tablelang.TableExpr.evaluate. ra3.tablelang.TableExpr is a serializable description of the query. The expression tree of a query may be printed in human-readable form with ra3.tablelang.TableExpr.render.
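
A sketch of the evaluation step, assuming a query expression has already been assembled with the entry points above. The result type of evaluate is simplified here and the implicit task system handle is assumed to be required:

  import cats.effect.IO
  import tasks.TaskSystemComponents
  import ra3.tablelang.TableExpr

  def runQuery[R](expr: TableExpr[R])(implicit tsc: TaskSystemComponents): IO[Unit] =
    for {
      _     <- IO.println(ra3.render(expr)) // human-readable form of the expression tree
      table <- expr.evaluate                // persisted, checkpointed evaluation
      _     <- IO.println(s"produced: $table")
    } yield ()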

The following table-level operators are available in ra3:

  • simple query, i.e. element-wise filter and projection. Corresponds to SQL queries with SELECT and WHERE.
  • count query, i.e. element-wise filter and count. Corresponds to SQL queries with SELECT count(*) and WHERE (but no GROUP BY).
  • equi-join, i.e. a join whose condition is restricted to equality between two columns.
  • group by, then reduce. Corresponds to the SELECT aggregations, WHERE and GROUP BY clauses of SQL.
  • group by, then count. Corresponds to the SELECT count(*), WHERE and GROUP BY clauses of SQL.
  • approximate selection of the top K elements by a single column

Common features of SQL which are not available:

  • arbitrary join conditions (e.g. joins on inequalities)
  • complete products of tables (Cartesian products)
  • full table sort
  • sub-queries in the filter (WHERE .. IN (SELECT id FROM ..))

Partitioning. ra3 does not maintain random access indexes, but it repartitions (shards / bucketizes) the columns needed by a given group by or join operator so that all keys needed to complete the operation end up in the same partition.

Language imports. You may choose to import everything from the ra3 package. Apart from the implicit conversions of the query language listed at the end of this page, it does not contain implicits.
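
For example:

  // Wildcard import bringing the query-language entry points
  // (let, const, select, query, importCsvUntyped, ...) into scope.
  import ra3.*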

Members list

Packages

package ra3.bufferimpl
package ra3.hashtable
package ra3.join
package ra3.lang
package ra3.tablelang

Type members

Experimental classlikes

case class BufferedTable(columns: Vector[Buffer], colNames: Vector[String])

Attributes

Experimental
true
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
sealed trait CSVColumnDefinition

Enum for predefined column types

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Known subtypes
class F64Column
class I32Column
class I64Column
class StrColumn

object CSVColumnDefinition

Attributes

Companion
trait
Experimental
true
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
CSVColumnDefinition.type
sealed trait CharacterDecoder

Enum for predefined character decoders

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Known subtypes
class ASCII
class ISO88591
class UTF16
class UTF8

object CharacterDecoder

Attributes

Companion
trait
Experimental
true
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
CharacterDecoder.type
sealed trait ColumnTag

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object F64
object I32
object I64
object Instant
object StringTag
Self type
object ColumnTag

Attributes

Companion
trait
Experimental
true
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
ColumnTag.type
sealed trait CompressionFormat

Enum for predefined compression formats

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object Gzip

object CompressionFormat

Attributes

Companion
trait
Experimental
true
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
CompressionFormat.type
object Fnv1

Attributes

Experimental
true
Supertypes
class Object
trait Matchable
class Any
Self type
Fnv1.type
sealed trait InstantFormat

Enum for predefined formats parsing string to Instant

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object ISO
object InstantFormat

Attributes

Companion
trait
Experimental
true
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
InstantFormat.type

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object InstantParser

Attributes

Companion
trait
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Self type
object Murmur3

Attributes

Experimental
true
Supertypes
class Object
trait Matchable
class Any
Self type
Murmur3.type
final class NotNothing[T]

Scala hack to represent generic types which are not Nothing

This is a type class with two ambiguous instances predefined for Nothing

Attributes

Companion
object
Experimental
true
Supertypes
class Object
trait Matchable
class Any
object NotNothing

Attributes

Companion
class
Experimental
true
Supertypes
class Object
trait Matchable
class Any
Self type
NotNothing.type
case class Table(columns: Vector[TaggedColumn], colNames: Vector[String], uniqueId: String, partitions: Option[PartitionData])

Reference to a set of aligned columns (i.e. a table) persisted onto secondary storage.

Each table must have a unique identifier, initially given by the importCsv method.

Tables have String column names.

Tables consist of columns. Columns are stored as segments. Segments are the unit of IO operations, i.e. ra3 never reads less than a segment into memory. The in-memory (buffered) counterpart of a segment is a Buffer. The maximum number of elements in a segment is therefore what fits into a single Java array, i.e. slightly below 2^31.

All columns of the same table share the same segmentation, i.e. they have the same number of segments, their segments have the same sizes, and those segments are aligned.

Segments store segment-level statistics, and some operations complete without buffering the segment.
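
As an illustration of the segmentation arithmetic (the numbers are arbitrary examples, not defaults):

  // A column of 10,350,000 elements chunked with maxSegmentLength = 1,000,000
  // is stored as 11 aligned segments; the last segment holds 350,000 elements.
  val maxSegmentLength = 1000000L
  val rows             = 10350000L
  val numSegments      = (rows + maxSegmentLength - 1) / maxSegmentLength // 11
  val lastSegmentSize  = rows - (numSegments - 1) * maxSegmentLength      // 350000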

Attributes

Companion
object
Experimental
true
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object Table

Attributes

Companion
class
Experimental
true
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
Table.type
object csv

Attributes

Experimental
true
Supertypes
class Object
trait Matchable
class Any
Self type
csv.type

Types

type ColumnSpecExpr[T <: Primitives] = Expr[ColumnSpec[T]]
type CsvColumnDefToColumnType[T] = T match {
  case I32Column => I32Var
  case I64Column => I64Var
  case F64Column => F64Var
  case StrColumn => StrVar
  case InstantColumn => InstVar
}
type F64Var = DF64
type I32Var = DI32
type I64Var = DI64
type InstVar = DInst
type Primitives = DI32 | DStr | DInst | DF64 | DI64 | String | Int | Long | Double | String | Instant
type StrVar = DStr
type TableExpr[R] = ra3.tablelang.TableExpr[R]

Value members

Experimental methods

def LitF64S(s: Set[Double]): Expr[Set[Double]]

Attributes

Experimental
true
def LitI32S(s: Set[Int]): Expr[Set[Int]]

Attributes

Experimental
true
def LitI64S(s: Set[Long]): Expr[Set[Long]]

Attributes

Experimental
true
def LitInstS(s: Set[Instant]): Expr[Set[Instant]]

Attributes

Experimental
true
def LitStringS(s: Set[String]): Expr[Set[String]]

Attributes

Experimental
true
def S: Expr[ReturnValueTuple[EmptyTuple]]

Attributes

Experimental
true
def concatenate(others: Table*)(implicit tsc: TaskSystemComponents): IO[Table]

Concatenate the list of rows of multiple tables ('grows downwards')

Attributes

Experimental
true
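
A usage sketch based on the signature above; whether concatenate is also available as a method on Table is not shown on this page, so the package-level form is used:

  // Stack the rows of several tables that share the same schema.
  def stacked(tables: Seq[ra3.Table])(implicit
      tsc: tasks.TaskSystemComponents
  ): cats.effect.IO[ra3.Table] =
    ra3.concatenate(tables*)
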
def const(s: Long): Expr[Long]

Attributes

Experimental
true
def const(s: Int): Expr[Int]

Attributes

Experimental
true
def const(s: String): Expr[String]

Attributes

Experimental
true
def const(s: Double): Expr[Double]

Attributes

Experimental
true
def const(s: Instant): Expr[Instant]

Attributes

Experimental
true
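
The literal constructors above (the Lit...S methods and the const overloads) lift ordinary Scala values, and sets of values, into the expression language. A minimal sketch; the query into which these expressions would be spliced is omitted:

  import java.time.Instant

  val threshold  = ra3.const(0.5)                   // Expr[Double]
  val label      = ra3.const("high")                // Expr[String]
  val allowedIds = ra3.LitI64S(Set(1L, 2L, 3L))     // Expr[Set[Long]]
  val cutoffs    = ra3.LitInstS(Set(Instant.EPOCH)) // Expr[Set[Instant]]
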
def count[T <: Tuple](prg: Expr[ReturnValueTuple[T]]): SimpleQueryCount[Any, T, ReturnValueTuple[T]]

Count query consisting of elementwise (row-wise) filter and counting those rows which pass the filter

Attributes

Experimental
true
def filter(arg0: I32ColumnExpr): Expr[ReturnValueTuple[EmptyTuple.type]]

Attributes

Experimental
true
inline def importCsv[T <: Tuple](file: SharedFile, name: String, columns: T, maxSegmentLength: Int, files: Seq[SharedFile], compression: Option[CompressionFormat], recordSeparator: String, fieldSeparator: Char, header: Boolean, maxLines: Long, bufferSize: Int, characterDecoder: CharacterDecoder)(implicit tsc: TaskSystemComponents): IO[Const[ReturnValueTuple[Map[T, CsvColumnDefToColumnType]]]]

Import CSV data into ra3

Value parameters

columns

Description of columns: at a minimum the 0-based column index in the csv file and the type of the column

maxSegmentLength

Each column will be chunked to this length

name

Name of the table to create, must be unique

Attributes

Experimental
true
def importCsvUntyped(file: SharedFile, name: String, columns: Seq[CSVColumnDefinition], maxSegmentLength: Int, files: Seq[SharedFile], compression: Option[CompressionFormat], recordSeparator: String, fieldSeparator: Char, header: Boolean, maxLines: Long, bufferSize: Int, characterDecoder: CharacterDecoder, parallelism: Int)(implicit tsc: TaskSystemComponents): IO[Table]

Import CSV data into ra3

Value parameters

columns

Description of columns: at a minimum the 0-based column index in the csv file and the type of the column

maxSegmentLength

Each column will be chunked to this length

name

Name of the table to create, must be unique

Attributes

Experimental
true
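
A sketch of a fully named call of importCsvUntyped following the signature above. The column definitions and the character decoder are taken as parameters because their constructors are not documented on this page, the SharedFile type is assumed to be exported by the tasks package, and the remaining argument values are arbitrary examples rather than recommended defaults:

  import cats.effect.IO
  import tasks.*

  def importOne(
      csv: SharedFile,
      columns: Seq[ra3.CSVColumnDefinition],
      decoder: ra3.CharacterDecoder
  )(implicit tsc: TaskSystemComponents): IO[ra3.Table] =
    ra3.importCsvUntyped(
      file = csv,
      name = "my-unique-table",   // must be unique across imports
      columns = columns,          // 0-based column index plus column type
      maxSegmentLength = 1000000, // each column is chunked to this length
      files = Nil,                // relation of `files` to `file` not documented here
      compression = None,
      recordSeparator = "\n",
      fieldSeparator = ',',
      header = true,
      maxLines = Long.MaxValue,
      bufferSize = 8192,
      characterDecoder = decoder,
      parallelism = 1
    )
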
inline def importFromStream[T <: Tuple : ClassTag](stream: Stream[IO, T], uniqueId: String, minimumSegmentSize: Int, maximumSegmentSize: Int)(implicit evidence$1: ClassTag[T], tsc: TaskSystemComponents): IO[Const[ReturnValueTuple[Map[T, DefToColumnType]]]]

Attributes

Experimental
true

Partial reduction

Reduces each segment independently. Returns a single row per segment.

Attributes

Experimental
true
def query[T <: Tuple](prg: Expr[ReturnValueTuple[T]]): SimpleQuery[Any, T, ReturnValueTuple[T]]

Simple query consisting of elementwise (row-wise) projection and filter

Attributes

Experimental
true
def reduce[T <: Tuple](prg: Expr[ReturnValueTuple[T]]): ReduceTable[Any, T, ReturnValueTuple[T]]

Full table reduction

Equivalent to a group by into a single group, then reducing that single group. Returns a single row.

This will read all rows of the needed columns into memory. Consider partialReduce instead if the reduction is distributable.

Attributes

Experimental
true
def render[T](q: TableExpr[T]): String

Attributes

Experimental
true
def select[T1 <: Tuple](arg1: Schema[T1]): Expr[ReturnValueTuple[T1]]

Attributes

Experimental
true
def select[T <: Tuple](prg: Expr[ReturnValueTuple[T]]): SimpleQuery[Any, T, ReturnValueTuple[T]]

Simple query consisting of elementwise (row-wise) projection and filter

Attributes

Experimental
true
def select0: Expr[ReturnValueTuple[EmptyTuple]]

Element-wise or group-wise projection

Attributes

Experimental
true
def where(arg0: I32ColumnExpr): Expr[ReturnValueTuple[EmptyTuple.type]]

Attributes

Experimental
true

Experimental fields

val MissingString: String

The value which encodes a missing string: the string of length 1 whose single character is \u0001.
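
A small illustration of comparing a string cell against this sentinel:

  // MissingString is the one-character "\u0001" sentinel described above.
  def isMissing(cell: String): Boolean = cell == ra3.MissingString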

Attributes

Experimental
true

Implicits

Experimental implicits

implicit def conversionDF64(a: Expr[DF64]): ColumnSpecExpr[DF64]

Attributes

Experimental
true
implicit def conversionDI32(a: Expr[DI32]): ColumnSpecExpr[DI32]

Attributes

Experimental
true
implicit def conversionDI64(a: Expr[DI64]): ColumnSpecExpr[DI64]

Attributes

Experimental
true
implicit def conversionDInst(a: Expr[DInst]): ColumnSpecExpr[DInst]

Attributes

Experimental
true
implicit def conversionStr(a: Expr[DStr]): ColumnSpecExpr[DStr]

Attributes

Experimental
true