RelationalAlgebra

trait RelationalAlgebra extends AnyRef

Self Type: Table

Linear Supertypes

AnyRef, Any

Known Subclasses

Table

Value Members

final def !=(arg0: Any): Boolean
Definition Classes
AnyRef → Any
final def ##: Int
Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean
Definition Classes
AnyRef → Any
def addColOfSameSegmentation(c: Column, colName: String)(implicit tsc: TaskSystemComponents): IO[Table]
Concat list of columns
final def asInstanceOf[T0]: T0
Definition Classes
Any
def clone(): AnyRef
Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
def concatenate(others: Table*)(implicit tsc: TaskSystemComponents): IO[Table]
This is almost a no-op: it concatenates the lists of segments
final def eq(arg0: AnyRef): Boolean
Definition Classes
AnyRef
def equals(arg0: AnyRef): Boolean
Definition Classes
AnyRef → Any
def equijoin(other: Table, joinColumnSelf: Int, joinColumnOther: Int, how: String, partitionBase: Int, partitionLimit: Int, maxSegmentsToBufferAtOnce: Int)(query: (TableReference, TableReference) => Expr { type T <: ra3.lang.ReturnValue })(implicit tsc: TaskSystemComponents): IO[Table]
- Partition both tables by join column
- For each partition of both input tables
- Buffer the partition completely (all segments, all columns)
- Join buffered tables in memory, use saddle's Index?
- Concatenate the joined partitions
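The steps above can be sketched on plain in-memory collections. This is only an illustration of the partition, buffer, join, concatenate pattern, not the library's implementation: `EquijoinSketch`, `Row`, `partitionBy` and `innerEquijoin` are hypothetical names, rows are modelled as `Vector[String]`, only an inner join is shown, and a `groupBy` on the join key stands in for whatever in-memory index the real code uses.

```scala
object EquijoinSketch {
  // Hypothetical row model: one Vector[String] per row.
  type Row = Vector[String]

  // Hash-partition rows by the join column so equal keys land in the same partition.
  def partitionBy(rows: Seq[Row], joinColumn: Int, numPartitions: Int): Map[Int, Seq[Row]] =
    rows.groupBy(row => Math.floorMod(row(joinColumn).hashCode, numPartitions))

  // Inner equijoin: partition both inputs, join each partition pair in memory, concatenate.
  def innerEquijoin(
      left: Seq[Row],
      right: Seq[Row],
      leftColumn: Int,
      rightColumn: Int,
      numPartitions: Int
  ): Seq[Row] = {
    val leftParts = partitionBy(left, leftColumn, numPartitions)
    val rightParts = partitionBy(right, rightColumn, numPartitions)
    (0 until numPartitions).flatMap { p =>
      // "Buffer" both sides of the partition completely.
      val bufferedLeft = leftParts.getOrElse(p, Seq.empty)
      val bufferedRight = rightParts.getOrElse(p, Seq.empty)
      // In-memory index over the right side's join key.
      val index = bufferedRight.groupBy(_(rightColumn))
      for {
        l <- bufferedLeft
        r <- index.getOrElse(l(leftColumn), Seq.empty)
      } yield l ++ r
    }
  }
}
```

Because both inputs are partitioned with the same hash function, a key can only find its matches inside its own partition, which is what allows each partition to be joined independently.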
def equijoinMultiple(joinColumnSelf: Int, others: Seq[(Table, Int, String, Int)], partitionBase: Int, partitionLimit: Int)(query: (Seq[TableReference]) => Expr { type T <: ra3.lang.ReturnValue })(implicit tsc: TaskSystemComponents): IO[Table]
def exportToCsv(columnSeparator: Char = ',', quoteChar: Char = '"', recordSeparator: String = "\r\n", compression: Option[CompressionFormat] = Some(ExportCsv.Gzip))(implicit tsc: TaskSystemComponents): IO[List[SharedFile]]
def filterColumnNames(nameSuffix: String)(p: (String) => Boolean): Table
final def getClass(): Class[_ <: AnyRef]
Definition Classes
AnyRef → Any
Annotations
@IntrinsicCandidate() @native()
def groupBy(cols: Seq[Int], partitionBase: Int, partitionLimit: Int, maxSegmentsToBufferAtOnce: Int)(implicit tsc: TaskSystemComponents): IO[GroupedTable]
Group by which returns group locations
Returns a triple for each input segment: group map, number of groups, group sizes
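To make the shape of that triple concrete, here is a minimal sketch for a single in-memory segment. It assumes the group map assigns each row a group id and that ids are handed out in order of first appearance; the source does not state either detail, and `GroupLocationsSketch` and `groupLocations` are hypothetical names.

```scala
object GroupLocationsSketch {
  // For one segment of keys, returns (group map, number of groups, group sizes).
  // groupMap(i) is the group id of row i; ids follow order of first appearance (an assumption).
  def groupLocations[K](segment: Vector[K]): (Vector[Int], Int, Vector[Int]) = {
    val ids = scala.collection.mutable.LinkedHashMap.empty[K, Int]
    // The default argument is by-name, so ids.size is read before the new key is inserted.
    val groupMap = segment.map(key => ids.getOrElseUpdate(key, ids.size))
    val sizes = Array.fill(ids.size)(0)
    groupMap.foreach(g => sizes(g) += 1)
    (groupMap, ids.size, sizes.toVector)
  }
}
```

For the keys `a, b, a, c, b` this yields the group map `0, 1, 0, 2, 1`, three groups, and sizes `2, 2, 1`.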
def groupBySegments(cols: Seq[Int])(implicit tsc: TaskSystemComponents): IO[GroupedTable]
Group by without partitioning
Useful to reduce the segments without partitioning
def hashCode(): Int
Definition Classes
AnyRef → Any
Annotations
@IntrinsicCandidate() @native()
final def isInstanceOf[T0]: Boolean
Definition Classes
Any
final def ne(arg0: AnyRef): Boolean
Definition Classes
AnyRef
final def notify(): Unit
Definition Classes
AnyRef
Annotations
@IntrinsicCandidate() @native()
final def notifyAll(): Unit
Definition Classes
AnyRef
Annotations
@IntrinsicCandidate() @native()
def partition(columnIdx: Seq[Int], partitionBase: Int, numPartitionsIsImportant: Boolean, partitionLimit: Int, maxSegmentsToBufferAtOnce: Int)(implicit tsc: TaskSystemComponents): IO[Vector[PartitionedTable]]
def pivot(columnGroupRows: Int, columnGroupColumns: Int, valueColumn: Int): IO[Table]
Pivot is two nested group-bys followed by an aggregation and a rearrangement of the results into a new table
- Get all distinct elements of columnGroupColumns. Use group by for this. This is the new list of columns.
- Partition by columnGroupRows
- Buffer all three columns of a partition, and pivot it in memory. Use the list of columns, place nulls if needed.
- Concatenate
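The pivot steps can be illustrated on a small in-memory table. This is a sketch under stated assumptions, not the library's code: `PivotSketch` and `pivot` are hypothetical names, the aggregation is assumed to be a sum (the signature above does not say which aggregation is used), and `None` stands in for the nulls placed where a row has no value for a column.

```scala
object PivotSketch {
  // rows: (row key, column key, value).
  // Returns the new list of columns and, per row key, one cell per column.
  def pivot(
      rows: Seq[(String, String, Double)]
  ): (Vector[String], Map[String, Vector[Option[Double]]]) = {
    // Distinct column keys become the new list of columns.
    val columns = rows.map(_._2).distinct.sorted.toVector
    // Outer group by row key, inner group by column key, then aggregate (sum is an assumption).
    val table = rows.groupBy(_._1).map { case (rowKey, group) =>
      val sums = group.groupBy(_._2).map { case (c, vs) => c -> vs.map(_._3).sum }
      // Missing (row, column) combinations become None.
      rowKey -> columns.map(sums.get)
    }
    (columns, table)
  }
}
```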
def prePartition(columnIdx: Seq[Int], partitionBase: Int, partitionLimit: Int, maxSegmentsToBufferAtOnce: Int)(implicit tsc: TaskSystemComponents): IO[Table]
def query(query: (TableReference) => Query)(implicit tsc: TaskSystemComponents): IO[Table]
def reduceTable(query: (TableReference) => Expr { type T <: ra3.lang.ReturnValue })(implicit tsc: TaskSystemComponents): IO[Table]
def rfilter(predicate: Column)(implicit tsc: TaskSystemComponents): IO[Table]
Variant which takes BufferedTable => BufferInt
- Align predicate segment with table segmentation
- For each aligned predicate segment, buffer it
- For each column
- For each segment in the column
- Buffer column segment
- Apply buffered predicate segment to buffered column segment
- Write applied buffer to local segment
- Resegment
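A minimal in-memory sketch of the row filter, assuming the predicate is an integer mask in which a nonzero entry keeps the row. That encoding is an assumption, as are the names `RowFilterSketch`, `Segments`, `filterColumn` and `filterTable`. The final resegmentation step is left out.

```scala
object RowFilterSketch {
  // A column is modelled as a sequence of segments.
  type Segments[A] = Vector[Vector[A]]

  // Applies an aligned 0/1 mask to every segment of one column.
  def filterColumn[A](column: Segments[A], predicate: Segments[Int]): Segments[A] = {
    require(
      column.map(_.size) == predicate.map(_.size),
      "predicate must be aligned with the column segmentation"
    )
    column.zip(predicate).map { case (values, mask) =>
      values.zip(mask).collect { case (v, m) if m != 0 => v }
    }
  }

  // The same mask is applied to each column of the table.
  def filterTable(columns: Vector[Segments[String]], predicate: Segments[Int]): Vector[Segments[String]] =
    columns.map(filterColumn(_, predicate))
}
```

Filtered segments generally end up with uneven sizes, which is presumably why the procedure above ends with a resegmentation.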
def rfilterInEquality(columnIdx: Int, cutoff: Segment, lessThan: Boolean)(implicit tsc: TaskSystemComponents): IO[Table]
def selectColumns(columnIndexes: Int*)(implicit tsc: TaskSystemComponents): IO[Table]
This is almost a no-op: it selects columns
def sort(sortColumn: Int, ascending: Boolean): IO[Table]
Sorting
Sorting is done with a parallel distributed sort
We sort on only one column
- We need an estimate of the CDF (see doc of other method)
- From the approximate CDF we select n values which partition the data evenly into n+1 partitions
- We write those partitions (all columns)
- Sort the partitions (all columns)
- Rearrange the sorted partitions in the correct order
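The procedure above resembles a sample sort. The sketch below shows that idea on a single in-memory vector of keys; it is an illustration, not the library's implementation. `SampleSortSketch`, `pivots` and `sampleSort` are hypothetical names, a sorted sample stands in for the approximate CDF, and the sample must be non-empty.

```scala
object SampleSortSketch {
  // Picks n cut points from a sample, a crude stand-in for the approximate CDF.
  def pivots(sample: Vector[Int], n: Int): Vector[Int] = {
    val sorted = sample.sorted
    (1 to n).map(i => sorted(i * sorted.size / (n + 1))).toVector
  }

  def sampleSort(data: Vector[Int], sample: Vector[Int], n: Int): Vector[Int] = {
    val cuts = pivots(sample, n)
    // Partition id = number of cut points <= the value, so partitions are ordered by range.
    val partitions = data.groupBy(v => cuts.count(_ <= v))
    // Sort each partition independently, then concatenate in partition order.
    (0 to n).flatMap(p => partitions.getOrElse(p, Vector.empty).sorted).toVector
  }
}
```

A poor sample only makes the partitions uneven; the result is still sorted, because every value in partition p is smaller than every value in partition p + 1.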
final def synchronized[T0](arg0: => T0): T0
Definition Classes
AnyRef
def take(indexes: Int32Column)(implicit tsc: TaskSystemComponents): IO[Table]
- For each aligned index segment, buffer it
- For each column
- For each segment in the column
- Buffer column segment
- Apply buffered index segment to buffered column segment
- Write applied buffer to segment and upload
indexes
for each segment
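A minimal in-memory sketch of taking rows by index, assuming that segment i of `indexes` holds row positions local to segment i of each column. That reading of "for each segment" is an assumption, and `TakeSketch`, `Segments` and `takeColumn` are hypothetical names.

```scala
object TakeSketch {
  // A column is modelled as a sequence of segments.
  type Segments[A] = Vector[Vector[A]]

  // indexes(i) lists the positions to take from segment i, in output order.
  def takeColumn[A](column: Segments[A], indexes: Segments[Int]): Segments[A] = {
    require(column.size == indexes.size, "one index segment per column segment")
    // A Vector is also a function from position to element, so idx.map(values) looks up each position.
    column.zip(indexes).map { case (values, idx) => idx.map(values) }
  }
}
```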
def toString(): String
Definition Classes
AnyRef → Any
def topK(sortColumn: Int, ascending: Boolean, k: Int, cdfCoverage: Double, cdfNumberOfSamplesPerSegment: Int)(implicit tsc: TaskSystemComponents): IO[Table]
Top K selection
- We need an estimate of the CDF
- From the approximate CDF we select the V value below which K elements fall
- Scan all segments and find the index set which picks those elements below V. TakeIndex on all columns
- Rearrange into table
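The threshold idea can be sketched in memory as follows. This is an illustration under assumptions, not the library's code: `TopKSketch`, `threshold` and `smallestK` are hypothetical names, only the ascending case (the k smallest values) is shown, and a sorted sample stands in for the approximate CDF. With a sample that underestimates V the sketch can return fewer than k elements.

```scala
object TopKSketch {
  // Estimates a value V from a sample such that roughly k of totalSize elements fall at or below it.
  def threshold(sample: Vector[Double], k: Int, totalSize: Int): Double = {
    val sorted = sample.sorted
    // Rank of V within the sample, using integer ceiling division.
    val rank = ((k.toLong * sorted.size + totalSize - 1) / totalSize).toInt
    sorted(math.min(math.max(rank, 1), sorted.size) - 1)
  }

  def smallestK(segments: Vector[Vector[Double]], sample: Vector[Double], k: Int): Vector[Double] = {
    val v = threshold(sample, k, segments.map(_.size).sum)
    // Scan each segment for the positions at or below V, then take those elements.
    val candidates = segments.flatMap { seg =>
      val indexSet = seg.indices.filter(i => seg(i) <= v)
      indexSet.map(seg)
    }
    // Only the small candidate set is sorted, never the whole input.
    candidates.sorted.take(k)
  }
}
```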
final def wait(arg0: Long, arg1: Int): Unit
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.InterruptedException])
final def wait(arg0: Long): Unit
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.InterruptedException]) @native()
final def wait(): Unit
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

def finalize(): Unit
Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.Throwable]) @Deprecated
Deprecated
(Since version 9)
