ra3

RelationalAlgebra

trait RelationalAlgebra extends AnyRef

Self Type
Table
Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def addColOfSameSegmentation(c: Column, colName: String)(implicit tsc: TaskSystemComponents): IO[Table]

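    Appends the column c to the table's list of columns under the name colName. As the method name indicates, c is expected to have the same segmentation as the table.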

  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  7. def concatenate(others: Table*)(implicit tsc: TaskSystemComponents): IO[Table]

    Concatenates the segment lists of the tables. This is almost a no-op.

  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  10. def equijoin(other: Table, joinColumnSelf: Int, joinColumnOther: Int, how: String, partitionBase: Int, partitionLimit: Int, maxSegmentsToBufferAtOnce: Int)(query: (TableReference, TableReference) => Expr { type T <: ra3.lang.ReturnValue })(implicit tsc: TaskSystemComponents): IO[Table]

    • Partition both tables by the join column
    • For each pair of corresponding partitions of the two input tables:
    • Buffer the partition completely (all segments, all columns)
    • Join the buffered tables in memory (use saddle's Index?)
    • Concatenate the joined partitions
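
The steps above can be sketched on plain in-memory collections. This is only an illustration of a partitioned hash equijoin, not ra3's implementation: the names PartitionedEquiJoinSketch, partitionBy and joinBuffered are hypothetical, rows are modelled as Vector[Int], the join is an inner join, and a standard-library Map stands in for the index.

```scala
object PartitionedEquiJoinSketch {
  type Row = Vector[Int]

  // Assign every row to one of `numPartitions` buckets by hashing its join key.
  def partitionBy(rows: Seq[Row], keyCol: Int, numPartitions: Int): Map[Int, Seq[Row]] =
    rows.groupBy(r => java.lang.Math.floorMod(r(keyCol).hashCode, numPartitions))

  // Inner equijoin of one pair of fully buffered partitions via a hash index.
  def joinBuffered(left: Seq[Row], right: Seq[Row], lCol: Int, rCol: Int): Seq[Row] = {
    val index: Map[Int, Seq[Row]] = right.groupBy(r => r(rCol))
    for {
      l <- left
      r <- index.getOrElse(l(lCol), Seq.empty)
    } yield l ++ r
  }

  // Partition both sides, join matching partitions, concatenate the results.
  def equijoin(left: Seq[Row], right: Seq[Row], lCol: Int, rCol: Int, numPartitions: Int): Seq[Row] = {
    val lp = partitionBy(left, lCol, numPartitions)
    val rp = partitionBy(right, rCol, numPartitions)
    (0 until numPartitions).flatMap { p =>
      joinBuffered(lp.getOrElse(p, Seq.empty), rp.getOrElse(p, Seq.empty), lCol, rCol)
    }
  }
}
```

Because equal keys always hash to the same partition, each pair of partitions can be joined independently, and only one pair needs to be buffered at a time.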
  11. def equijoinMultiple(joinColumnSelf: Int, others: Seq[(Table, Int, String, Int)], partitionBase: Int, partitionLimit: Int)(query: (Seq[TableReference]) => Expr { type T <: ra3.lang.ReturnValue })(implicit tsc: TaskSystemComponents): IO[Table]
  12. def exportToCsv(columnSeparator: Char = ',', quoteChar: Char = '"', recordSeparator: String = "\r\n", compression: Option[CompressionFormat] = Some(ExportCsv.Gzip))(implicit tsc: TaskSystemComponents): IO[List[SharedFile]]
  13. def filterColumnNames(nameSuffix: String)(p: (String) => Boolean): Table
  14. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  15. def groupBy(cols: Seq[Int], partitionBase: Int, partitionLimit: Int, maxSegmentsToBufferAtOnce: Int)(implicit tsc: TaskSystemComponents): IO[GroupedTable]

    Group by which returns group locations

    Returns a triple for each input segment: group map, number of groups, group sizes
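
As an illustration of what such a triple could look like for a single segment, here is a minimal sketch on a plain Vector. The name groupSegment and the choice to number groups in order of first appearance are assumptions made for the example; ra3's actual encoding may differ.

```scala
object GroupLocationsSketch {
  // For one segment of keys, return (group map, number of groups, group sizes).
  // The group map holds, for every row, the id of the group it belongs to.
  // Group ids are assigned in order of first appearance (an assumption).
  def groupSegment[K](keys: Vector[K]): (Vector[Int], Int, Vector[Int]) = {
    val ids = scala.collection.mutable.LinkedHashMap.empty[K, Int]
    val groupMap = keys.map(k => ids.getOrElseUpdate(k, ids.size))
    val sizes = Array.fill(ids.size)(0)
    groupMap.foreach(g => sizes(g) += 1)
    (groupMap, ids.size, sizes.toVector)
  }
}
```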

  16. def groupBySegments(cols: Seq[Int])(implicit tsc: TaskSystemComponents): IO[GroupedTable]

    Group by without partitioning

    Useful to reduce the segments without partitioning

  17. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  18. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  19. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  20. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  21. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  22. def partition(columnIdx: Seq[Int], partitionBase: Int, numPartitionsIsImportant: Boolean, partitionLimit: Int, maxSegmentsToBufferAtOnce: Int)(implicit tsc: TaskSystemComponents): IO[Vector[PartitionedTable]]
  23. def pivot(columnGroupRows: Int, columnGroupColumns: Int, valueColumn: Int): IO[Table]

    A pivot is two nested group-bys followed by an aggregation and a rearrangement of the results into a new table.

    • Get all distinct elements of columnGroupColumns. Use group by for this. This is the new list of columns.
    • Partition by columnGroupRows
    • Buffer all three columns of a partition, and pivot it in memory. Use the list of columns, place nulls if needed.
    • Concatenate
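
The pivot steps can be sketched in memory as follows. This is an illustration under stated assumptions, not ra3's code: rows are (row key, column key, value) triples, the aggregation is a sum, missing cells are modelled as None rather than null, and the name PivotSketch is hypothetical.

```scala
object PivotSketch {
  // rows: (row key, column key, value). Returns the new header and one pivoted line per row key.
  def pivot(rows: Seq[(String, String, Int)]): (Vector[String], Map[String, Vector[Option[Int]]]) = {
    // Step 1: the distinct column keys form the new list of columns.
    val header = rows.map(_._2).distinct.sorted.toVector
    // Step 2: partition by row key. Step 3: pivot each partition in memory.
    val body = rows.groupBy(_._1).map { case (rowKey, part) =>
      val sums = part.groupBy(_._2).map { case (c, vs) => c -> vs.map(_._3).sum }
      // Cells with no input rows become None.
      rowKey -> header.map(c => sums.get(c))
    }
    (header, body)
  }
}
```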
  24. def prePartition(columnIdx: Seq[Int], partitionBase: Int, partitionLimit: Int, maxSegmentsToBufferAtOnce: Int)(implicit tsc: TaskSystemComponents): IO[Table]
  25. def query(query: (TableReference) => Query)(implicit tsc: TaskSystemComponents): IO[Table]
  26. def reduceTable(query: (TableReference) => Expr { type T <: ra3.lang.ReturnValue })(implicit tsc: TaskSystemComponents): IO[Table]
  27. def rfilter(predicate: Column)(implicit tsc: TaskSystemComponents): IO[Table]

    Variant which takes BufferedTable => BufferInt

    • Align predicate segment with table segmentation
    • For each aligned predicate segment, buffer it
    • For each column
    • For each segment in the column
    • Buffer column segment
    • Apply buffered predicate segment to buffered column segment
    • Write applied buffer to local segment
    • Resegment
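
A minimal in-memory sketch of the steps above, assuming an integer mask in which a non-zero entry means "keep the row". The names SegmentedFilterSketch and applyMask and the segmentSize parameter are hypothetical and only serve the illustration.

```scala
object SegmentedFilterSketch {
  type Segment = Vector[Int]
  type Column = Vector[Segment]

  // Keep the elements of a buffered column segment where the mask is non-zero.
  def applyMask(segment: Segment, mask: Segment): Segment =
    segment.zip(mask).collect { case (v, m) if m != 0 => v }

  def filter(columns: Vector[Column], predicate: Column, segmentSize: Int): Vector[Column] =
    columns.map { column =>
      // The predicate must be segmented exactly like the column.
      require(column.map(_.length) == predicate.map(_.length), "segmentation must be aligned")
      val filtered = column.zip(predicate).map { case (seg, mask) => applyMask(seg, mask) }
      // Resegment: filtered segments have uneven lengths, so regroup them.
      filtered.flatten.grouped(segmentSize).toVector
    }
}
```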

  28. def rfilterInEquality(columnIdx: Int, cutoff: Segment, lessThan: Boolean)(implicit tsc: TaskSystemComponents): IO[Table]
  29. def selectColumns(columnIndexes: Int*)(implicit tsc: TaskSystemComponents): IO[Table]

    Selects the columns at the given indexes. This is almost a no-op.

  30. def sort(sortColumn: Int, ascending: Boolean): IO[Table]

    Sorting

    We sort with a parallel distributed sort, and only on one column.

    • We need an estimate of the CDF (see doc of other method)
    • From the approximate CDF we select n values which partition the data evenly into n+1 partitions
    • We write those partitions (all columns)
    • Sort the partitions (all columns)
    • Rearrange the sorted partitions in the correct order
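
The idea can be sketched on a single in-memory column. In this illustration a sorted sample stands in for the CDF estimate, and the names SamplePartitionSortSketch and cutPoints are hypothetical. The real operation moves all columns of a row together; the sketch only sorts the values of the sort column.

```scala
object SamplePartitionSortSketch {
  // Pick n cut points from a sorted, non-empty sample, approximating evenly spaced quantiles.
  def cutPoints(sample: Seq[Int], n: Int): Vector[Int] = {
    val sorted = sample.sorted.toVector
    (1 to n).map(i => sorted(i * sorted.length / (n + 1))).toVector
  }

  def sort(data: Seq[Int], sample: Seq[Int], n: Int): Vector[Int] = {
    val cuts = cutPoints(sample, n)
    // Partition id = number of cut points strictly below the value, so ids range over 0..n.
    val partitions = data.groupBy(v => cuts.count(_ < v))
    // Sort each partition independently, then concatenate in partition order.
    (0 to n).flatMap(p => partitions.getOrElse(p, Seq.empty).sorted).toVector
  }
}
```

Every value in partition p is less than or equal to every value in partition p + 1, so concatenating the individually sorted partitions yields a globally sorted result. A poor sample only makes the partitions uneven; it does not affect correctness.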
  31. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  32. def take(indexes: Int32Column)(implicit tsc: TaskSystemComponents): IO[Table]

    • For each aligned index segment, buffer it
    • For each column
    • For each segment in the column
    • Buffer column segment
    • Apply buffered index segment to buffered column segment
    • Write applied buffer to segment and upload
    indexes
      for each segment
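
A small sketch of a segment-wise take, assuming that each index segment holds positions relative to its aligned column segment. That reading of "for each segment" is an assumption, and the name SegmentedTakeSketch is hypothetical.

```scala
object SegmentedTakeSketch {
  type Segment = Vector[Int]

  // Each index segment selects positions from the column segment it is aligned with.
  def take(column: Vector[Segment], indexes: Vector[Segment]): Vector[Segment] = {
    require(column.length == indexes.length, "one index segment per column segment")
    column.zip(indexes).map { case (seg, idx) => idx.map(i => seg(i)) }
  }
}
```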

  33. def toString(): String
    Definition Classes
    AnyRef → Any
  34. def topK(sortColumn: Int, ascending: Boolean, k: Int, cdfCoverage: Double, cdfNumberOfSamplesPerSegment: Int)(implicit tsc: TaskSystemComponents): IO[Table]

    Top K selection

    • We need an estimate of the CDF
    • From the approximate CDF we select the V value below which K elements fall
    • Scan all segments and find the index set which picks those elements below V. Apply TakeIndex on all columns
    • Rearrange into table
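
The selection can be sketched in memory for the ascending case. This is an illustration only: a sorted sample stands in for the CDF estimate, the slack parameter is a hypothetical safety margin (loosely analogous to widening the estimate, not a documented ra3 parameter), elements equal to V are kept, and only the sort column is modelled.

```scala
object TopKSketch {
  // Estimate the cutoff V from a non-empty sample: the empirical quantile at k/total, widened by `slack`.
  def estimateCutoff(sample: Seq[Int], k: Int, total: Int, slack: Double): Int = {
    val sorted = sample.sorted.toVector
    val q = math.min(1.0, k.toDouble / total + slack)
    sorted(math.min(sorted.length - 1, (q * sorted.length).toInt))
  }

  // Smallest k values, in ascending order.
  def topK(segments: Vector[Vector[Int]], sample: Seq[Int], k: Int, slack: Double): Vector[Int] = {
    val total = segments.map(_.length).sum
    val v = estimateCutoff(sample, k, total, slack)
    // Scan every segment for the positions whose value is at or below V, then take them.
    val candidates = segments.flatMap { seg =>
      val idx = seg.indices.filter(i => seg(i) <= v)
      idx.map(i => seg(i))
    }
    // The candidate set is small, so it can be sorted in memory.
    candidates.sorted.take(k)
  }
}
```

If the estimated cutoff is too low, fewer than k candidates survive the scan, which is why the sketch widens the quantile by a margin.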
  35. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  36. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  37. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)
