Class PointwiseVertexInputInfoComputer
java.lang.Object
    org.apache.flink.runtime.scheduler.adaptivebatch.util.PointwiseVertexInputInfoComputer
public class PointwiseVertexInputInfoComputer extends Object
Helper class that computes VertexInputInfo for pointwise input.
Constructor Summary
Constructor                           Description
PointwiseVertexInputInfoComputer()
Method Summary
Modifier and Type: Map<IntermediateDataSetID,JobVertexInputInfo>
Method: compute(List<BlockingInputInfo> inputInfos, int parallelism, int minParallelism, int maxParallelism, long dataVolumePerTask)
Description: Decides the parallelism and the input infos for POINTWISE inputs so that the data is distributed evenly and each downstream subtask consumes roughly the same amount of data.
Method Detail
compute
public Map<IntermediateDataSetID,JobVertexInputInfo> compute(List<BlockingInputInfo> inputInfos, int parallelism, int minParallelism, int maxParallelism, long dataVolumePerTask)
Decides the parallelism and the input infos for POINTWISE inputs so that the data is distributed evenly and each downstream subtask consumes roughly the same amount of data.
Assume that `inputInfo` has two partitions, each partition has three subpartitions, their data bytes are {0->[1,2,1], 1->[2,1,2]}, and the expected parallelism is 3. The calculation proceeds as follows:
1. Create subpartition slices for the input, which is composed of several subpartitions. The created slice list and its data bytes are: [1,2,1,2,1,2]
2. Distribute the subpartition slice array into n balanced parts (n = 3 here; each part is described by an `IndexRange`, and together they are named SubpartitionSliceRanges) based on data volume: [0,1], [2,3], [4,5]
3. Reorganize the distributed results into a mapping of partition range to subpartition range: {0->[0,1]}, {0->[2,2],1->[0,0]}, {1->[1,2]}.
The final result is the `SubpartitionGroup` that each of the three parallel tasks needs to subscribe to.
Parameters:
inputInfos - The information of consumed blocking results
parallelism - The parallelism of the job vertex
minParallelism - The min parallelism
maxParallelism - The max parallelism
dataVolumePerTask - Proposed data volume per task for this set of inputInfo
Returns:
the parallelism and vertex input infos
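The three-step example in the description of compute can be traced with a small standalone sketch. It is not Flink's implementation: the class PointwiseSplitSketch and its methods are hypothetical names, plain arrays stand in for `BlockingInputInfo`, `IndexRange` and `SubpartitionGroup`, and step 2 uses a simple greedy split that closes a range once it reaches the average volume, which is only an assumption about how "balanced" could be achieved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; none of these names exist in Flink.
class PointwiseSplitSketch {

    // Step 2 (simplified, assumed strategy): cut the slice list into n contiguous,
    // inclusive [start, end] ranges, closing a range once it reaches the average volume.
    static List<int[]> splitBalanced(long[] sliceBytes, int n) {
        if (n < 1 || sliceBytes.length < n) {
            throw new IllegalArgumentException("need at least one slice per range");
        }
        long total = 0;
        for (long b : sliceBytes) {
            total += b;
        }
        long target = (total + n - 1) / n; // ceiling of the average volume per range
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        long sum = 0;
        for (int i = 0; i < sliceBytes.length; i++) {
            sum += sliceBytes[i];
            int rangesLeft = n - ranges.size() - 1;     // ranges to open after the current one
            int slicesLeft = sliceBytes.length - i - 1; // slices not yet visited
            // Close early enough that every remaining range still gets one slice.
            if (rangesLeft > 0 && (sum >= target || slicesLeft == rangesLeft)) {
                ranges.add(new int[] {start, i});
                start = i + 1;
                sum = 0;
            }
        }
        ranges.add(new int[] {start, sliceBytes.length - 1}); // last range takes the rest
        return ranges;
    }

    // Step 3: translate one slice range into partition -> inclusive subpartition range.
    static Map<Integer, int[]> toSubpartitionGroup(
            int[] range, int[] slicePartition, int[] sliceSubpartition) {
        Map<Integer, int[]> group = new LinkedHashMap<>();
        for (int i = range[0]; i <= range[1]; i++) {
            int s = sliceSubpartition[i];
            int[] sub = group.computeIfAbsent(slicePartition[i], key -> new int[] {s, s});
            sub[1] = s; // extend the range to the latest subpartition seen
        }
        return group;
    }

    // Runs steps 1 to 3 and renders the result in the notation used by the description.
    static String describe(long[][] bytes, int parallelism) {
        // Step 1: flatten into slices, remembering where each slice came from.
        int count = 0;
        for (long[] partition : bytes) {
            count += partition.length;
        }
        long[] sliceBytes = new long[count];
        int[] slicePartition = new int[count];
        int[] sliceSubpartition = new int[count];
        int k = 0;
        for (int p = 0; p < bytes.length; p++) {
            for (int s = 0; s < bytes[p].length; s++, k++) {
                sliceBytes[k] = bytes[p][s];
                slicePartition[k] = p;
                sliceSubpartition[k] = s;
            }
        }
        List<String> parts = new ArrayList<>();
        for (int[] range : splitBalanced(sliceBytes, parallelism)) {
            List<String> entries = new ArrayList<>();
            toSubpartitionGroup(range, slicePartition, sliceSubpartition)
                    .forEach((partition, sub) ->
                            entries.add(partition + "->[" + sub[0] + "," + sub[1] + "]"));
            parts.add("{" + String.join(",", entries) + "}");
        }
        return String.join(", ", parts);
    }

    public static void main(String[] args) {
        // Data bytes from the description: {0->[1,2,1], 1->[2,1,2]}, parallelism 3.
        System.out.println(describe(new long[][] {{1, 2, 1}, {2, 1, 2}}, 3));
        // prints {0->[0,1]}, {0->[2,2],1->[0,0]}, {1->[1,2]}
    }
}
```

Each of the three groups carries 3 of the 9 total bytes in this input. For other inputs the greedy cut may be less even than what the real computer produces, and the sketch ignores minParallelism, maxParallelism and dataVolumePerTask entirely.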