Class BinaryHashTable

  • All Implemented Interfaces:
    org.apache.flink.core.memory.MemorySegmentSource, MemorySegmentPool

    public class BinaryHashTable
    extends BaseHybridHashTable
    An implementation of a Hybrid Hash Join. The join starts operating in memory and gradually spills contents to disk when the memory is not sufficient. It does not need to know a priori how large the input will be.

    The design of this class follows in many parts the design presented in "Hash joins and hash teams in Microsoft SQL Server", by Goetz Graefe et al. In its current state, the implementation lacks features like dynamic role reversal, partition tuning, or histogram guided partitioning.
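
    Taken together, the constructor and the methods documented below imply a three-phase lifecycle: feed the build side, probe it, then re-match spilled data. The following is a minimal usage sketch of that lifecycle, not code taken from the class itself: construction of the table is omitted, the end-of-build step and the collect(...) output are assumptions of the sketch, and the import paths for BinaryHashTable and RowIterator as well as the RowIterator accessors are assumed from the Flink table runtime.

      import org.apache.flink.table.data.RowData;
      import org.apache.flink.table.data.binary.BinaryRowData;
      import org.apache.flink.table.runtime.hashtable.BinaryHashTable; // assumed package
      import org.apache.flink.table.runtime.util.RowIterator;          // assumed package

      import java.io.IOException;

      public class HybridHashJoinSketch {

          static void join(BinaryHashTable table,
                           Iterable<RowData> buildInput,
                           Iterable<RowData> probeInput) throws IOException {
              // 1. Build phase: insert all build-side rows (see putBuildRow below).
              //    The table spills partitions to disk on its own under memory pressure.
              for (RowData buildRow : buildInput) {
                  table.putBuildRow(buildRow);
              }
              // The class is assumed to require an end-of-build call at this point
              // (e.g. an endBuild() method); such a call is not part of this excerpt.

              // 2. Probe phase: match each probe row against the in-memory partitions
              //    (see tryProbe below).
              for (RowData probeRow : probeInput) {
                  if (table.tryProbe(probeRow)) {
                      emitMatches(table);
                  }
                  // On false, the probe row was spilled and is re-matched in phase 3.
              }

              // 3. Rebuild phase: re-match records from spilled partitions
              //    (see nextMatching below).
              while (table.nextMatching()) {
                  emitMatches(table);
              }
          }

          // Emits all build-side matches for the current probe row
          // (see getCurrentProbeRow and getBuildSideIterator below).
          static void emitMatches(BinaryHashTable table) {
              RowData probeRow = table.getCurrentProbeRow();
              RowIterator<BinaryRowData> matches = table.getBuildSideIterator();
              while (matches.advanceNext()) { // advanceNext()/getRow() assumed accessors
                  collect(probeRow, matches.getRow());
              }
          }

          // Hypothetical output step; a real operator would evaluate the JoinCondition
          // and forward the joined row downstream.
          static void collect(RowData probeRow, BinaryRowData buildRow) {
          }
      }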

    • Constructor Detail

      • BinaryHashTable

        public BinaryHashTable​(Object owner,
                               boolean compressionEnabled,
                               int compressionBlockSize,
                               AbstractRowDataSerializer buildSideSerializer,
                               AbstractRowDataSerializer probeSideSerializer,
                               Projection<org.apache.flink.table.data.RowData,​org.apache.flink.table.data.binary.BinaryRowData> buildSideProjection,
                               Projection<org.apache.flink.table.data.RowData,​org.apache.flink.table.data.binary.BinaryRowData> probeSideProjection,
                               org.apache.flink.runtime.memory.MemoryManager memManager,
                               long reservedMemorySize,
                               org.apache.flink.runtime.io.disk.iomanager.IOManager ioManager,
                               int avgRecordLen,
                               long buildRowCount,
                               boolean useBloomFilters,
                               HashJoinType type,
                               JoinCondition condFunc,
                               boolean reverseJoin,
                               boolean[] filterNulls,
                               boolean tryDistinctBuildRow)
    • Method Detail

      • putBuildRow

        public void putBuildRow​(org.apache.flink.table.data.RowData row)
                         throws IOException
        Puts a build-side row into the hash table.
        Throws:
        IOException
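
        In the lifecycle sketch after the class description, this is the build phase; a minimal version, with the table and the build input supplied by the caller, could look like:

          // Build-phase sketch: the table spills partitions internally when the
          // reserved memory is exhausted, which is why this call can throw IOException.
          static void buildPhase(BinaryHashTable table, Iterable<RowData> buildInput)
                  throws IOException {
              for (RowData buildRow : buildInput) {
                  table.putBuildRow(buildRow);
              }
          }
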
      • tryProbe

        public boolean tryProbe​(org.apache.flink.table.data.RowData record)
                         throws IOException
        Finds the matching build-side rows for a probe row.
        Returns:
        false if the target partition has spilled; in that case this probe row is spilled as well and will be re-matched during the rebuild phase.
        Throws:
        IOException
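
        A probe-side caller therefore has to handle both outcomes; a sketch, reusing the hypothetical emitMatches(...) helper from the class-level example:

          static void probeOne(BinaryHashTable table, RowData probeRow) throws IOException {
              if (table.tryProbe(probeRow)) {
                  // Target partition is in memory: matching build rows can be emitted now.
                  emitMatches(table);
              }
              // Otherwise the target partition has spilled; the probe row was spilled too
              // and will be re-matched during the rebuild phase (see nextMatching()).
          }
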
      • nextMatching

        public boolean nextMatching()
                             throws IOException
        Advances to the next record from a rebuilt spilled partition or from a build-side outer partition.
        Throws:
        IOException
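
        Callers typically invoke this in a loop once the probe input is exhausted; a sketch, again reusing the hypothetical emitMatches(...) helper:

          // Rebuild-phase sketch: drains spilled partitions (and, depending on the join
          // type, build-side outer records) after the last call to tryProbe(...).
          static void rebuildPhase(BinaryHashTable table) throws IOException {
              while (table.nextMatching()) {
                  emitMatches(table);
              }
          }
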
      • getCurrentProbeRow

        public org.apache.flink.table.data.RowData getCurrentProbeRow()
      • getBuildSideIterator

        public RowIterator<org.apache.flink.table.data.binary.BinaryRowData> getBuildSideIterator()
      • clearPartitions

        public void clearPartitions()
        This method clears all partitions currently residing (partially) in memory. It releases all memory and deletes all spilled partitions.

        This method is intended for a hard cleanup in the case that the join is aborted.

        Specified by:
        clearPartitions in class BaseHybridHashTable
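
        A hedged sketch of such an abort path; the surrounding operator code and the join(...) call are hypothetical and taken from the class-level example:

          static void runJoinOrAbort(BinaryHashTable table,
                                     Iterable<RowData> buildInput,
                                     Iterable<RowData> probeInput) {
              try {
                  join(table, buildInput, probeInput); // the class-level sketch above
              } catch (Exception e) {
                  // Hard cleanup on abort: releases all memory and deletes spilled partitions.
                  table.clearPartitions();
                  throw new RuntimeException("hash join aborted", e);
              }
          }
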
      • spillPartition

        protected int spillPartition()
                              throws IOException
        Selects a partition and spills it. The number of the spilled partition is returned.
        Specified by:
        spillPartition in class BaseHybridHashTable
        Returns:
        The number of the spilled partition.
        Throws:
        IOException