Class AllToAllVertexInputInfoComputer

java.lang.Object
    org.apache.flink.runtime.scheduler.adaptivebatch.util.AllToAllVertexInputInfoComputer

public class AllToAllVertexInputInfoComputer extends Object
Helper class that computes VertexInputInfo for all-to-all-like inputs.
Constructor Summary
AllToAllVertexInputInfoComputer(double skewedFactor, long defaultSkewedThreshold)
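A minimal construction sketch. The two argument values below are illustrative, not framework defaults; judging by their names, they presumably tune the skew detection applied when slicing subpartitions, but that semantic is an assumption here.

import org.apache.flink.runtime.scheduler.adaptivebatch.util.AllToAllVertexInputInfoComputer;

public class ConstructionExample {
    public static void main(String[] args) {
        // Hypothetical tuning values: a skew factor of 4.0 and a
        // 256 MiB skew threshold (assumed semantics, example values).
        AllToAllVertexInputInfoComputer computer =
                new AllToAllVertexInputInfoComputer(4.0d, 256L * 1024 * 1024);
    }
}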
Method Summary
Map<IntermediateDataSetID,JobVertexInputInfo> compute(JobVertexID jobVertexId, List<BlockingInputInfo> inputInfos, int parallelism, int minParallelism, int maxParallelism, long dataVolumePerTask)
    Decides the parallelism and input infos such that data is evenly distributed to downstream subtasks for ALL_TO_ALL edges, i.e. different downstream subtasks consume roughly the same amount of data.
Method Detail
compute
public Map<IntermediateDataSetID,JobVertexInputInfo> compute(JobVertexID jobVertexId, List<BlockingInputInfo> inputInfos, int parallelism, int minParallelism, int maxParallelism, long dataVolumePerTask)
Decides the parallelism and input infos such that data is evenly distributed to downstream subtasks for ALL_TO_ALL edges, i.e. different downstream subtasks consume roughly the same amount of data.

Assume there are two input infos upstream, each with three partitions and two subpartitions, whose data bytes are: input1: 0->[1,1] 1->[2,2] 2->[3,3], input2: 0->[1,1] 1->[1,1] 2->[1,1]. This method processes the data as follows:
1. Create subpartition slices for inputs with the same type number. Unlike the pointwise computer, this method creates subpartition slices through the following steps (a sketch follows this list):
First, reorganize the data by subpartition index: input1: {0->[1,2,3], 1->[1,2,3]}, input2: {0->[1,1,1], 1->[1,1,1]}.
Second, split subpartitions with the same index into relatively balanced n parts (if possible): {0->[1,2][3], 1->[1,2][3]}, {0->[1,1,1], 1->[1,1,1]}.
Then perform a cartesian product to ensure data correctness: input1: {0->[1,2], 0->[3], 1->[1,2], 1->[3]}, input2: {0->[1,1,1], 0->[1,1,1], 1->[1,1,1], 1->[1,1,1]}.
Finally, create subpartition slices based on the result of the previous step, i.e. each input has four balanced subpartition slices.
2. Based on the above subpartition slices, calculate the subpartition slice range each task needs to subscribe to, considering the data volume and parallelism constraints: [0,0], [1,1], [2,2], [3,3] (a simplified packing is sketched in the second example after this list).
3. Convert the calculated subpartition slice ranges to the form of partition index range -> subpartition index range (the conversion is sketched in the third example after this list):
task0: input1: {[0,1]->[0]} input2: {[0,2]->[0]}
task1: input1: {[2,2]->[0]} input2: {[0,2]->[0]}
task2: input1: {[0,1]->[1]} input2: {[0,2]->[1]}
task3: input1: {[2,2]->[1]} input2: {[0,2]->[1]}
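To make step 1 concrete, the following self-contained sketch replays it on the example data above. The Slice record and the hard-coded split points are simplifications for illustration: the real computer decides the split itself and works with Flink's internal slice types, not the stand-ins shown here.

import java.util.ArrayList;
import java.util.List;

public class SubpartitionSliceExample {

    /** A slice: one subpartition index plus a contiguous range of partitions. */
    record Slice(int subpartition, int firstPartition, int lastPartition, long bytes) {}

    public static void main(String[] args) {
        // bytes[partition][subpartition], taken from the javadoc example.
        long[][] input1 = {{1, 1}, {2, 2}, {3, 3}};
        long[][] input2 = {{1, 1}, {1, 1}, {1, 1}};

        // Step 1a reorganizes by subpartition index; step 1b splits input1's
        // groups [1,2,3] into the balanced parts [1,2] and [3]. The split is
        // hard-coded here to keep the sketch short.
        List<int[]> splitPoints = List.of(new int[]{0, 1}, new int[]{2, 2});

        // Step 1c: cartesian product. input2 is not split, so each of its
        // subpartition groups is duplicated once per input1 part.
        List<Slice> slices1 = new ArrayList<>();
        List<Slice> slices2 = new ArrayList<>();
        for (int sub = 0; sub < 2; sub++) {
            for (int[] part : splitPoints) {
                slices1.add(new Slice(sub, part[0], part[1], sum(input1, sub, part[0], part[1])));
                slices2.add(new Slice(sub, 0, 2, sum(input2, sub, 0, 2)));
            }
        }
        System.out.println(slices1); // 4 balanced slices of 3 bytes each
        System.out.println(slices2); // 4 slices of 3 bytes each
    }

    static long sum(long[][] bytes, int sub, int from, int to) {
        long total = 0;
        for (int p = from; p <= to; p++) total += bytes[p][sub];
        return total;
    }
}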
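Step 2 can be approximated with a simple greedy packing, shown below under the assumption that each slice's size is the combined bytes across both inputs (3 + 3 = 6 in the example) and that dataVolumePerTask is 6; the real implementation also enforces the min/max parallelism bounds and may rescale the target volume.

import java.util.ArrayList;
import java.util.List;

public class SliceRangePackingExample {
    public static void main(String[] args) {
        // Combined bytes of slice i across all inputs (input1 + input2 above).
        long[] sliceBytes = {6, 6, 6, 6};
        long dataVolumePerTask = 6; // illustrative value

        // Pack consecutive slices into ranges without exceeding the target.
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        long acc = 0;
        for (int i = 0; i < sliceBytes.length; i++) {
            if (acc > 0 && acc + sliceBytes[i] > dataVolumePerTask) {
                ranges.add(new int[]{start, i - 1});
                start = i;
                acc = 0;
            }
            acc += sliceBytes[i];
        }
        ranges.add(new int[]{start, sliceBytes.length - 1});
        ranges.forEach(r -> System.out.printf("[%d,%d]%n", r[0], r[1]));
        // -> [0,0] [1,1] [2,2] [3,3], i.e. parallelism 4
    }
}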
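Finally, a sketch of the step-3 conversion, reusing the slices produced in the first sketch. The Slice record is again a stand-in for Flink's internal types; because every task here subscribes to exactly one slice, the conversion is a direct lookup, while wider ranges would merge the partition and subpartition ranges of all covered slices.

public class SliceToIndexRangeExample {
    record Slice(int subpartition, int firstPartition, int lastPartition) {}

    public static void main(String[] args) {
        // Slices from the step-1 sketch, in subscription order.
        Slice[] input1 = {
            new Slice(0, 0, 1), new Slice(0, 2, 2),
            new Slice(1, 0, 1), new Slice(1, 2, 2)
        };
        Slice[] input2 = {
            new Slice(0, 0, 2), new Slice(0, 0, 2),
            new Slice(1, 0, 2), new Slice(1, 0, 2)
        };
        // Each task's slice range is [i,i], so task i maps directly to slice i.
        for (int task = 0; task < 4; task++) {
            Slice s1 = input1[task];
            Slice s2 = input2[task];
            System.out.printf(
                "task%d: input1: {[%d,%d]->[%d]} input2: {[%d,%d]->[%d]}%n",
                task, s1.firstPartition(), s1.lastPartition(), s1.subpartition(),
                s2.firstPartition(), s2.lastPartition(), s2.subpartition());
        }
    }
}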
Parameters:
jobVertexId - the job vertex id
inputInfos - the information of the consumed blocking results
parallelism - the parallelism of the job vertex
minParallelism - the min parallelism
maxParallelism - the max parallelism
dataVolumePerTask - the proposed data volume per task for this set of input infos
Returns:
the parallelism and vertex input infos