trait RelationalAlgebra extends AnyRef
Linear supertypes: AnyRef, Any
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def addColOfSameSegmentation(c: Column, colName: String)(implicit tsc: TaskSystemComponents): IO[Table]
Appends the given column to the table's list of columns; the column must share the table's segmentation.
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- def concatenate(others: Table*)(implicit tsc: TaskSystemComponents): IO[Table]
This is almost a no-op: only the lists of segments are concatenated (see the sketch below).
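A minimal, hypothetical sketch of why this is cheap, using plain Scala collections as stand-ins for ra3's tables and columns (SketchTable and SketchColumn are made up for illustration, not ra3 types): a column is modelled as a list of segments, so concatenation only stitches those lists together and touches no row data.

```scala
// Hypothetical standalone model, not ra3's types.
final case class SketchColumn(segments: Vector[Vector[Any]])
final case class SketchTable(columns: Vector[SketchColumn])

// Concatenating tables with identical column layout only concatenates the
// per-column segment lists.
def concatenateSketch(tables: Seq[SketchTable]): SketchTable =
  SketchTable(
    tables.head.columns.indices.toVector.map { colIdx =>
      SketchColumn(tables.flatMap(_.columns(colIdx).segments).toVector)
    }
  )
```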
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def equijoin(other: Table, joinColumnSelf: Int, joinColumnOther: Int, how: String, partitionBase: Int, partitionLimit: Int, maxSegmentsToBufferAtOnce: Int)(query: (TableReference, TableReference) => Expr { type T <: ra3.lang.ReturnValue })(implicit tsc: TaskSystemComponents): IO[Table]
- Partition both tables by the join column
- For each partition of both input tables:
  - Buffer the partition completely (all segments, all columns)
  - Join the buffered tables in memory (possibly via saddle's Index)
- Concatenate the joined partitions (a sketch of this follows below)
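A hypothetical, single-process sketch of the partitioned join outlined above, using plain Scala collections instead of ra3 tables and a hash map in place of saddle's Index. It covers only the inner-join case and ignores partitionLimit and segment buffering.

```scala
// Illustrative sketch; not ra3's implementation.
object EquijoinSketch {
  type Row = Vector[Any]

  def equijoin(
      left: Seq[Row],
      right: Seq[Row],
      leftKey: Int,
      rightKey: Int,
      numPartitions: Int
  ): Seq[(Row, Row)] = {
    // Partition both inputs by the hash of their join column.
    def partition(rows: Seq[Row], key: Int): Map[Int, Seq[Row]] =
      rows.groupBy(r => math.floorMod(r(key).hashCode, numPartitions))

    val leftParts = partition(left, leftKey)
    val rightParts = partition(right, rightKey)

    // For each partition: buffer both sides fully, join them in memory with a
    // hash index (standing in for saddle's Index), then concatenate the results.
    (0 until numPartitions).flatMap { p =>
      val bufferedLeft = leftParts.getOrElse(p, Nil)
      val rightIndex = rightParts.getOrElse(p, Nil).groupBy(row => row(rightKey))
      bufferedLeft.flatMap { l =>
        rightIndex.getOrElse(l(leftKey), Nil).map(r => (l, r))
      }
    }
  }
}
```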
- def equijoinMultiple(joinColumnSelf: Int, others: Seq[(Table, Int, String, Int)], partitionBase: Int, partitionLimit: Int)(query: (Seq[TableReference]) => Expr { type T <: ra3.lang.ReturnValue })(implicit tsc: TaskSystemComponents): IO[Table]
- def exportToCsv(columnSeparator: Char = ',', quoteChar: Char = '"', recordSeparator: String = "\r\n", compression: Option[CompressionFormat] = Some(ExportCsv.Gzip))(implicit tsc: TaskSystemComponents): IO[List[SharedFile]]
- def filterColumnNames(nameSuffix: String)(p: (String) => Boolean): Table
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def groupBy(cols: Seq[Int], partitionBase: Int, partitionLimit: Int, maxSegmentsToBufferAtOnce: Int)(implicit tsc: TaskSystemComponents): IO[GroupedTable]
Group by which returns group locations.
Returns a triple for each input segment: the group map, the number of groups, and the group sizes (see the sketch below).
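A hypothetical in-memory sketch of that per-segment triple. The key type K stands in for the tuple of grouping columns; this is not ra3's implementation.

```scala
// For one segment, compute (group map, number of groups, group sizes).
def groupSegment[K](keys: Vector[K]): (Vector[Int], Int, Vector[Int]) = {
  val groupIds = scala.collection.mutable.LinkedHashMap.empty[K, Int]
  // Group map: for every row of the segment, the id of the group it belongs to.
  val groupMap = keys.map(k => groupIds.getOrElseUpdate(k, groupIds.size))
  val numberOfGroups = groupIds.size
  // Group sizes: how many rows fall into each group id.
  val groupSizes = Vector.tabulate(numberOfGroups)(g => groupMap.count(_ == g))
  (groupMap, numberOfGroups, groupSizes)
}
```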
- def groupBySegments(cols: Seq[Int])(implicit tsc: TaskSystemComponents): IO[GroupedTable]
Group by without partitioning
Useful to reduce the segments without partitioning
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- def partition(columnIdx: Seq[Int], partitionBase: Int, numPartitionsIsImportant: Boolean, partitionLimit: Int, maxSegmentsToBufferAtOnce: Int)(implicit tsc: TaskSystemComponents): IO[Vector[PartitionedTable]]
- def pivot(columnGroupRows: Int, columnGroupColumns: Int, valueColumn: Int): IO[Table]
Pivot is two nested group bys followed by aggregation and rearranging the results into a new table (see the sketch after this list).
- Get all distinct elements of columnGroupColumns (use group by for this); these form the new list of columns
- Partition by columnGroupRows
- Buffer all three columns of a partition and pivot it in memory, using the list of columns and placing nulls where needed
- Concatenate the pivoted partitions
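A hypothetical in-memory analogue of pivoting one buffered partition, assuming (row key, column key, value) triples. Unlike the real method it does not aggregate multiple values landing in the same cell; it simply keeps the last one.

```scala
// Illustrative sketch; not ra3's implementation.
def pivotInMemory[R, C, V](
    rows: Seq[(R, C, V)] // (row key, column key, value) triples
): (Vector[C], Map[R, Vector[Option[V]]]) = {
  // The new list of columns: all distinct elements of the column-group column.
  val columns = rows.map(_._2).distinct.toVector
  val columnIndex = columns.zipWithIndex.toMap
  val pivoted = rows.groupBy(_._1).map { case (rowKey, cells) =>
    // Start with nulls (None) and fill in the cells present in this partition.
    val row = scala.collection.mutable.ArrayBuffer.fill[Option[V]](columns.size)(None)
    cells.foreach { case (_, c, v) => row(columnIndex(c)) = Some(v) }
    rowKey -> row.toVector
  }
  (columns, pivoted)
}
```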
- def prePartition(columnIdx: Seq[Int], partitionBase: Int, partitionLimit: Int, maxSegmentsToBufferAtOnce: Int)(implicit tsc: TaskSystemComponents): IO[Table]
- def query(query: (TableReference) => Query)(implicit tsc: TaskSystemComponents): IO[Table]
- def reduceTable(query: (TableReference) => Expr { type T <: ra3.lang.ReturnValue })(implicit tsc: TaskSystemComponents): IO[Table]
- def rfilter(predicate: Column)(implicit tsc: TaskSystemComponents): IO[Table]
Variant which takes BufferedTable => BufferInt.
- Align the predicate segments with the table's segmentation
- Buffer each aligned predicate segment
- For each column:
  - For each segment in the column:
    - Buffer the column segment
    - Apply the buffered predicate segment to the buffered column segment
    - Write the applied buffer to a local segment
- Resegment
A sketch of the per-segment filtering step follows below.
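A hypothetical sketch of the inner loop with Vectors standing in for buffered segments: a boolean buffer per table segment is applied to the corresponding segment of every column. This is an illustration, not ra3's buffer types.

```scala
// columns -> segments -> rows; the predicate buffers are aligned with the
// table's segmentation.
def filterSegments(
    columnSegments: Vector[Vector[Vector[Any]]],
    predicateSegments: Vector[Vector[Boolean]]
): Vector[Vector[Vector[Any]]] =
  columnSegments.map { segments =>
    segments.zip(predicateSegments).map { case (segment, predicate) =>
      // Keep only the rows where the aligned predicate buffer is true.
      segment.zip(predicate).collect { case (value, true) => value }
    }
  }
```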
- def rfilterInEquality(columnIdx: Int, cutoff: Segment, lessThan: Boolean)(implicit tsc: TaskSystemComponents): IO[Table]
- def selectColumns(columnIndexes: Int*)(implicit tsc: TaskSystemComponents): IO[Table]
This is almost a no-op: it selects the given columns.
- def sort(sortColumn: Int, ascending: Boolean): IO[Table]
== Sorting ==
We sort by a parallel distributed sort, on a single column only (see the sketch after this list).
- We need an estimate of the CDF (see the doc of the other method)
- From the approximate CDF we select n values which partition the data evenly into n+1 partitions
- We write out those partitions (all columns)
- We sort each partition (all columns)
- We rearrange the sorted partitions in the correct order
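A hypothetical sketch of the boundary-selection step, assuming the approximate CDF is available as (value, cumulative fraction) samples sorted by value; the real method derives this estimate from distributed segment samples.

```scala
// Pick n boundary values that split the data into n + 1 roughly equal partitions.
def selectPartitionBoundaries(
    approximateCdf: Vector[(Double, Double)], // (value, cumulative fraction in [0, 1])
    numBoundaries: Int
): Vector[Double] = {
  require(approximateCdf.nonEmpty, "need at least one CDF sample")
  (1 to numBoundaries).toVector.map { i =>
    val targetFraction = i.toDouble / (numBoundaries + 1)
    // The smallest sampled value whose cumulative fraction reaches the target.
    approximateCdf
      .find { case (_, fraction) => fraction >= targetFraction }
      .map(_._1)
      .getOrElse(approximateCdf.last._1)
  }
}
```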
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def take(indexes: Int32Column)(implicit tsc: TaskSystemComponents): IO[Table]
- Buffer each aligned index segment
- For each column:
  - For each segment in the column:
    - Buffer the column segment
    - Apply the buffered index segment to the buffered column segment
    - Write the applied buffer to a segment and upload it (see the sketch below)
- indexes: for each segment
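A hypothetical sketch of the per-segment step with Vectors standing in for buffered segments: each table segment has an aligned buffer of row indexes that is applied to every column's corresponding segment.

```scala
// columns -> segments -> rows; one index buffer per table segment.
def takeSegments(
    columnSegments: Vector[Vector[Vector[Any]]],
    indexSegments: Vector[Vector[Int]]
): Vector[Vector[Vector[Any]]] =
  columnSegments.map { segments =>
    segments.zip(indexSegments).map { case (segment, indexes) =>
      // Pick the requested rows of this buffered column segment.
      indexes.map(i => segment(i))
    }
  }
```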
- def toString(): String
- Definition Classes
- AnyRef → Any
- def topK(sortColumn: Int, ascending: Boolean, k: Int, cdfCoverage: Double, cdfNumberOfSamplesPerSegment: Int)(implicit tsc: TaskSystemComponents): IO[Table]
= Top K selection =
- We need an estimate of the CDF
- From the approximate CDF we select the value V below which K elements fall
- We scan all segments and find the index set which picks the elements below V, then TakeIndex on all columns
- We rearrange the result into a table (see the sketch below)
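A hypothetical single-column, ascending-order sketch of the threshold step, assuming the same (value, cumulative fraction) CDF representation as in the sorting sketch; the real method still truncates the selected rows to exactly K afterwards.

```scala
// Estimate V (the value below which roughly K elements fall) from the
// approximate CDF, then collect the indexes of those elements.
def topKIndexes(
    values: Vector[Double],
    k: Int,
    approximateCdf: Vector[(Double, Double)] // (value, cumulative fraction)
): Vector[Int] = {
  require(values.nonEmpty && approximateCdf.nonEmpty)
  val targetFraction = k.toDouble / values.size
  // V: the smallest sampled value whose cumulative fraction reaches K / N.
  val v = approximateCdf
    .find { case (_, fraction) => fraction >= targetFraction }
    .map(_._1)
    .getOrElse(approximateCdf.last._1)
  // Index set selecting the elements below (or equal to) V.
  values.zipWithIndex.collect { case (x, i) if x <= v => i }
}
```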
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)