ra3-core

package ra3

ra3 provides an embedded query language and its corresponding query engine.

ra3 is built on a distributed task execution library named tasks. Consequently, almost all interactions with ra3 need a handle to a configured runtime environment, represented by a value of type tasks.TaskSystemComponents. You can configure and start the tasks environment with the tasks.withTaskSystem method.

ra3 can query data on disk (or in object storage) that is organized into its own intermediate chunked columnar storage. A table in ra3 is represented by a value of type ra3.Table. CSV data can be imported with the ra3.importCsv method and exported back to CSV with the ra3.Table.exportToCsv method. The intermediate data organization is not meant for use outside of ra3, nor for long-term storage.

Each query in ra3 is persisted to secondary storage and checkpointed.

The entry points to the query language are the various methods in the ra3 package or in the ra3.tablelang.TableExpr class which provide typed references to columns or references to tables, e.g.:

  • ra3.tablelang.TableExpr.schema

The query language builds an expression tree of type ra3.tablelang.TableExpr, which is evaluated with ra3.tablelang.TableExpr.evaluate into an IO[Table]. A ra3.tablelang.TableExpr is a description of the query. The expression tree of the query can be printed in human-readable form with ra3.tablelang.TableExpr.render.
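
The separation between describing a query and running it can be illustrated with a toy expression tree in plain Scala. This is only a sketch of the general idea: the names Query, Scan, Filter, Project and the functions below are hypothetical and are not ra3's types or API, and a plain thunk stands in for IO.

```scala
object ExprTreeSketch {
  // Hypothetical row representation, not part of ra3.
  type Row = Map[String, Int]

  // A query is plain data: constructing it performs no work.
  sealed trait Query
  final case class Scan(table: String) extends Query
  final case class Filter(input: Query, column: String, min: Int) extends Query
  final case class Project(input: Query, columns: List[String]) extends Query

  // Print the tree in a human-readable form.
  def render(q: Query): String = q match {
    case Scan(t)          => s"scan($t)"
    case Filter(in, c, m) => s"filter(${render(in)}, $c >= $m)"
    case Project(in, cs)  => s"project(${render(in)}, ${cs.mkString(",")})"
  }

  // Interpret the tree. The result is a thunk, so nothing runs
  // until the caller invokes it (a stand-in for an IO value).
  def evaluate(q: Query, tables: Map[String, List[Row]]): () => List[Row] =
    () => q match {
      case Scan(t)          => tables(t)
      case Filter(in, c, m) => evaluate(in, tables)().filter(r => r(c) >= m)
      case Project(in, cs)  =>
        evaluate(in, tables)().map(r => r.filter { case (k, _) => cs.contains(k) })
    }

  def main(args: Array[String]): Unit = {
    val tables = Map("t" -> List(Map("x" -> 1, "y" -> 10), Map("x" -> 3, "y" -> 30)))
    val query  = Project(Filter(Scan("t"), "x", 2), List("y"))
    println(render(query))             // inspect the description
    println(evaluate(query, tables)()) // run it
  }
}
```

Because the tree is ordinary data, it can be inspected or printed before any evaluation takes place.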

The following table level operators are available in ra3:

  • simple query, i.e. element-wise filter and projection. Corresponds to SQL queries of SELECT and WHERE.
  • count query, i.e. element-wise filter and count. Corresponds to SQL queries of SELECT count(*) and WHERE (but no GROUP BY).
  • equi-join, i.e. join with a join condition restricted to be equality among two columns.
  • group by then reduce. Corresponds to SELECT aggregations, WHERE, GROUP BY clauses of SQL.
  • group by then count. Corresponds to SELECT count(*), WHERE, GROUP BY clauses of SQL.
  • approximate selection of the top K number of elements by a single column.
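
For readers more familiar with Scala collections than with SQL, the semantics of the first few operators can be sketched on an in-memory list. This does not use ra3 at all; the Sale type and the function names are hypothetical, and ra3 itself operates on chunked columns rather than on lists of rows.

```scala
object OperatorSemanticsSketch {
  // Hypothetical row type, not part of ra3.
  final case class Sale(region: String, product: String, units: Int)

  val sales: List[Sale] = List(
    Sale("eu", "a", 3), Sale("eu", "b", 5), Sale("us", "a", 2), Sale("us", "a", 7)
  )

  // Simple query: SELECT product, units FROM sales WHERE units > 2
  def simple(rows: List[Sale]): List[(String, Int)] =
    rows.filter(_.units > 2).map(s => (s.product, s.units))

  // Count query: SELECT count(*) FROM sales WHERE units > 2
  def count(rows: List[Sale]): Int =
    rows.count(_.units > 2)

  // Group by then reduce: SELECT region, sum(units) FROM sales GROUP BY region
  def sumByRegion(rows: List[Sale]): Map[String, Int] =
    rows.groupBy(_.region).map { case (region, group) => region -> group.map(_.units).sum }

  // Group by then count: SELECT region, count(*) FROM sales GROUP BY region
  def countByRegion(rows: List[Sale]): Map[String, Int] =
    rows.groupBy(_.region).map { case (region, group) => region -> group.size }

  def main(args: Array[String]): Unit = {
    println(simple(sales))
    println(count(sales))
    println(sumByRegion(sales))
    println(countByRegion(sales))
  }
}
```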

Common features of SQL which are not available:

  • arbitrary join condition (e.g. join by inequalities)
  • complete products of tables (Cartesian products)
  • full table sort
  • sub-query in filter (WHERE in (select id FROM ..))

Partitioning. ra3 does not maintain random access indexes. Instead, it repartitions (shards / bucketizes / shuffles) the columns needed for a given group by or join operator so that all the rows with a given key land in the same partition and the operation can be completed within each partition.
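
The idea behind repartitioning can be sketched with a hash-partitioned equi-join over in-memory sequences. This is a conceptual illustration only, under the assumption that partitions are chosen by hashing the key; it is not ra3's implementation, and the Order and Customer types and all function names are hypothetical.

```scala
object PartitionedJoinSketch {
  // Hypothetical row types, not part of ra3.
  final case class Order(customerId: Int, amountCents: Int)
  final case class Customer(customerId: Int, name: String)

  // Assign a key to one of n partitions; floorMod keeps the index non-negative.
  def partitionOf(key: Int, n: Int): Int =
    java.lang.Math.floorMod(key.hashCode, n)

  // Split rows into buckets by the hash of their key.
  def partition[A](rows: Seq[A], n: Int)(key: A => Int): Map[Int, Seq[A]] =
    rows.groupBy(r => partitionOf(key(r), n))

  // Equi-join: repartition both sides by the join key, then join each
  // pair of buckets independently. Rows with equal keys always share a bucket,
  // so no bucket ever needs to look at another one.
  def join(orders: Seq[Order], customers: Seq[Customer], n: Int): Seq[(String, Int)] = {
    val left  = partition(orders, n)(_.customerId)
    val right = partition(customers, n)(_.customerId)
    (0 until n).flatMap { p =>
      val byKey = right.getOrElse(p, Seq.empty).groupBy(_.customerId)
      for {
        o <- left.getOrElse(p, Seq.empty)
        c <- byKey.getOrElse(o.customerId, Seq.empty)
      } yield (c.name, o.amountCents)
    }
  }

  def main(args: Array[String]): Unit = {
    val orders    = Seq(Order(1, 1000), Order(2, 500), Order(1, 250), Order(3, 700))
    val customers = Seq(Customer(1, "ann"), Customer(2, "bob"))
    join(orders, customers, n = 4).sorted.foreach(println)
  }
}
```

Each bucket pair can be processed on its own, which is what allows such an operation to be distributed without a random access index.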

Language imports. You may choose to import everything from the ra3 package. It does not contain any implicits.
