java.lang.Object
- org.apache.flink.runtime.scheduler.adaptivebatch.util.VertexParallelismAndInputInfosDeciderUtils

public class VertexParallelismAndInputInfosDeciderUtils
extends Object

Utils class for VertexParallelismAndInputInfosDecider.

Constructor Summary

Constructors
Constructor Description

VertexParallelismAndInputInfosDeciderUtils()

Method Summary

Modifier and Type	Method	Description
`static Optional<List<IndexRange>>`	`adjustToClosestLegalParallelism(long currentDataVolumeLimit, int currentParallelism, int minParallelism, int maxParallelism, long minLimit, long maxLimit, Function<Long,Integer> parallelismComputer, Function<Long,List<IndexRange>> subpartitionRangesComputer)`	Adjusts the parallelism to the closest legal parallelism and returns the computed subpartition ranges.
`static long`	`calculateDataVolumePerTaskForInput(long globalDataVolumePerTask, long inputsGroupBytes, long totalDataBytes)`
`static long`	`calculateDataVolumePerTaskForInputsGroup(long globalDataVolumePerTask, List<BlockingInputInfo> inputsGroup, List<BlockingInputInfo> allInputs)`
`static <T> List<List<T>>`	`cartesianProduct(List<List<T>> lists)`	Computes the Cartesian product of a list of lists.
`static boolean`	`checkAndGetIntraCorrelation(List<BlockingInputInfo> inputInfos)`
`static int`	`checkAndGetParallelism(Collection<JobVertexInputInfo> vertexInputInfos)`
`static int`	`checkAndGetSubpartitionNum(List<BlockingInputInfo> consumedResults)`
`static int`	`checkAndGetSubpartitionNumForAggregatedInputs(Collection<AggregatedBlockingInputInfo> inputInfos)`
`static long`	`computeSkewThreshold(long medianSize, double skewedFactor, long defaultSkewedThreshold)`	Computes the skew threshold based on the given median size and skewed factor.
`static long`	`computeTargetSize(long[] subpartitionBytes, long skewedThreshold, long dataVolumePerTask)`	Computes the target data size for each task based on the sizes of non-skewed subpartitions.
`static JobVertexInputInfo`	`createdJobVertexInputInfoForBroadcast(BlockingInputInfo inputInfo, int parallelism)`
`static JobVertexInputInfo`	`createdJobVertexInputInfoForNonBroadcast(BlockingInputInfo inputInfo, List<IndexRange> subpartitionSliceRanges, List<SubpartitionSlice> subpartitionSlices)`
`static Map<IntermediateDataSetID,JobVertexInputInfo>`	`createJobVertexInputInfos(List<BlockingInputInfo> inputInfos, Map<Integer,List<SubpartitionSlice>> subpartitionSlices, List<IndexRange> subpartitionSliceRanges, Function<Integer,Integer> subpartitionSliceKeyResolver)`
`static int`	`getMaxNumPartitions(List<BlockingInputInfo> consumedResults)`
`static List<BlockingInputInfo>`	`getNonBroadcastInputInfos(List<BlockingInputInfo> consumedResults)`
`static boolean`	`hasSameNumPartitions(List<BlockingInputInfo> inputInfos)`
`static boolean`	`isLegalParallelism(int parallelism, int minParallelism, int maxParallelism)`
`static void`	`logBalancedDataDistributionOptimizationResult(org.slf4j.Logger logger, JobVertexID jobVertexId, BlockingInputInfo inputInfo, JobVertexInputInfo optimizedJobVertexInputInfo)`	Logs the data distribution optimization info when a balanced data distribution algorithm is effectively optimized compared to the num-based data distribution algorithm.
`static long`	`median(long[] nums)`	Calculates the median of a given array of long integers.
`static Optional<List<IndexRange>>`	`tryComputeSubpartitionSliceRange(int minParallelism, int maxParallelism, long maxDataVolumePerTask, Map<Integer,List<SubpartitionSlice>> subpartitionSlices)`	Attempts to compute the subpartition slice ranges to ensure even distribution of data across downstream tasks.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail
- VertexParallelismAndInputInfosDeciderUtils
```
public VertexParallelismAndInputInfosDeciderUtils()
```

Method Detail

adjustToClosestLegalParallelism

public static Optional<List<IndexRange>> adjustToClosestLegalParallelism(long currentDataVolumeLimit,
                                                                         int currentParallelism,
                                                                         int minParallelism,
                                                                         int maxParallelism,
                                                                         long minLimit,
                                                                         long maxLimit,
                                                                         Function<Long,Integer> parallelismComputer,
                                                                         Function<Long,List<IndexRange>> subpartitionRangesComputer)

Adjusts the parallelism to the closest legal parallelism and returns the computed subpartition ranges.

Parameters:

currentDataVolumeLimit - the current data volume limit

currentParallelism - the current parallelism

minParallelism - the minimum parallelism

maxParallelism - the maximum parallelism

minLimit - the minimum data volume limit

maxLimit - the maximum data volume limit

parallelismComputer - a function that computes the parallelism from a data volume limit

subpartitionRangesComputer - a function that computes the subpartition ranges from a data volume limit

Returns:

the computed subpartition ranges, or Optional.empty() if no legal parallelism can be found
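
The following is a minimal sketch of how such an adjustment could work, not the Flink implementation. It assumes that the parallelism never increases as the data volume limit grows, that limits are non-negative, and that a bisection search over the limit is acceptable. The class name, the helper methods, and the choice to return the adjusted limit (rather than subpartition ranges) are all hypothetical simplifications.

```java
import java.util.Optional;
import java.util.function.LongPredicate;
import java.util.function.LongToIntFunction;

// Hypothetical sketch; names and search strategy are assumptions, not Flink's code.
final class ClosestLegalParallelismSketch {

    static boolean isLegal(int parallelism, int min, int max) {
        return min <= parallelism && parallelism <= max;
    }

    // Largest value in [low, high] accepted by the predicate, or -1 if none.
    // Assumes the predicate holds for a prefix of the range.
    static long findMaxLegal(LongPredicate legal, long low, long high) {
        long best = -1;
        while (low <= high) {
            long mid = low + (high - low) / 2;
            if (legal.test(mid)) {
                best = mid;
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return best;
    }

    // Smallest value in [low, high] accepted by the predicate, or -1 if none.
    // Assumes the predicate holds for a suffix of the range.
    static long findMinLegal(LongPredicate legal, long low, long high) {
        long best = -1;
        while (low <= high) {
            long mid = low + (high - low) / 2;
            if (legal.test(mid)) {
                best = mid;
                high = mid - 1;
            } else {
                low = mid + 1;
            }
        }
        return best;
    }

    // Returns the adjusted data volume limit, or empty if no legal parallelism exists.
    static Optional<Long> adjustLimit(
            long currentLimit,
            int minParallelism,
            int maxParallelism,
            long minLimit,
            long maxLimit,
            LongToIntFunction parallelismComputer) {
        int current = parallelismComputer.applyAsInt(currentLimit);
        long adjusted = currentLimit;
        if (current < minParallelism) {
            // Too few tasks: lower the limit as little as possible.
            adjusted = findMaxLegal(
                    v -> parallelismComputer.applyAsInt(v) >= minParallelism,
                    minLimit, currentLimit);
        } else if (current > maxParallelism) {
            // Too many tasks: raise the limit as little as possible.
            adjusted = findMinLegal(
                    v -> parallelismComputer.applyAsInt(v) <= maxParallelism,
                    currentLimit, maxLimit);
        }
        if (adjusted < 0) {
            return Optional.empty();
        }
        int result = parallelismComputer.applyAsInt(adjusted);
        return isLegal(result, minParallelism, maxParallelism)
                ? Optional.of(adjusted)
                : Optional.empty();
    }
}
```

In this sketch a caller would pass the adjusted limit to its own range-computing function, which is the role subpartitionRangesComputer plays in the signature above.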

cartesianProduct
```
public static <T> List<List<T>> cartesianProduct(List<List<T>> lists)
```
Computes the Cartesian product of a list of lists.
The Cartesian product is a set of all possible combinations formed by picking one element from each list. For example, given input lists [[1, 2], [3, 4]], the result will be [[1, 3], [1, 4], [2, 3], [2, 4]].
Note: If the input list is empty or contains an empty list, the result will be an empty list.

Type Parameters:

T - the type of elements in the lists

Parameters:

lists - a list of lists for which the Cartesian product is to be computed

Returns:

a list of lists representing the Cartesian product, where each inner list is a combination
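
The behavior described above can be illustrated with a small standalone sketch. This is not the Flink implementation; the class name is hypothetical, and the iterative prefix-extension approach is only one way to obtain the documented ordering and empty-input behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented behavior, not Flink's code.
final class CartesianProductSketch {

    static <T> List<List<T>> cartesianProduct(List<List<T>> lists) {
        List<List<T>> result = new ArrayList<>();
        if (lists.isEmpty()) {
            // Documented behavior: an empty input yields an empty result.
            return result;
        }
        // Start from a single empty combination and extend it list by list.
        result.add(new ArrayList<>());
        for (List<T> list : lists) {
            List<List<T>> next = new ArrayList<>();
            for (List<T> prefix : result) {
                for (T item : list) {
                    List<T> combination = new ArrayList<>(prefix);
                    combination.add(item);
                    next.add(combination);
                }
            }
            // An empty inner list leaves "next" empty, so the result stays empty.
            result = next;
        }
        return result;
    }
}
```

Called with [[1, 2], [3, 4]], the sketch produces the four combinations from the example above in the same order.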

median
```
public static long median(long[] nums)
```
Calculates the median of a given array of long integers. If the calculated median is less than 1, it returns 1 instead.

Parameters:

nums - an array of long integers for which to calculate the median.

Returns:

the median value, which will be at least 1.
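
As a rough illustration of the clamping described above, here is a minimal sketch. It is not the Flink implementation: the class name is hypothetical, and two details are assumptions the documentation does not state, namely that an even-length array uses the integer average of its two middle elements and that an empty array is rejected.

```java
import java.util.Arrays;

// Hypothetical sketch; even-length and empty-array handling are assumptions.
final class MedianSketch {

    static long median(long[] nums) {
        if (nums.length == 0) {
            throw new IllegalArgumentException("nums must not be empty");
        }
        // Sort a copy so the caller's array is left untouched.
        long[] sorted = nums.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        long median = sorted.length % 2 == 0
                ? (sorted[mid - 1] + sorted[mid]) / 2
                : sorted[mid];
        // Documented behavior: the result is never less than 1.
        return Math.max(median, 1L);
    }
}
```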

computeSkewThreshold
```
public static long computeSkewThreshold(long medianSize,
                                        double skewedFactor,
                                        long defaultSkewedThreshold)
```
Computes the skew threshold based on the given median size and skewed factor.
The skew threshold is calculated as the product of the median size and the skewed factor. To ensure that the computed threshold does not fall below a specified default value, the method returns the larger of the calculated threshold and the default threshold.

Parameters:

medianSize - the size of the median

skewedFactor - a factor indicating the degree of skewness

defaultSkewedThreshold - the default threshold to be used if the calculated threshold is less than this value

Returns:

the computed skew threshold, which is guaranteed to be at least the default skewed threshold.
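
The calculation described above amounts to a single expression. The sketch below is an illustration under assumptions, not the Flink implementation: the class name is hypothetical, and truncating the product to a long after taking the maximum is an assumed rounding choice.

```java
// Hypothetical sketch; the rounding behavior is an assumption.
final class SkewThresholdSketch {

    static long computeSkewThreshold(
            long medianSize, double skewedFactor, long defaultSkewedThreshold) {
        // Product of median size and factor, floored at the default threshold.
        return (long) Math.max(medianSize * skewedFactor, defaultSkewedThreshold);
    }
}
```

For example, with a median size of 100, a skewed factor of 4.0, and a default threshold of 256, the product (400) wins; with a median size of 10 the product (40) is below the default, so 256 is returned.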

computeTargetSize
```
public static long computeTargetSize(long[] subpartitionBytes,
                                     long skewedThreshold,
                                     long dataVolumePerTask)
```
Computes the target data size for each task based on the sizes of non-skewed subpartitions.
The target size is the average size of the non-skewed subpartitions, raised to the specified data volume per task whenever that average is smaller.

Parameters:

subpartitionBytes - an array representing the data size of each subpartition

skewedThreshold - skewed threshold in bytes

dataVolumePerTask - the amount of data that should be allocated per task

Returns:

the computed target size for each task, which is the maximum of the average size of non-skewed subpartitions and the data volume per task.
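
A minimal sketch of this computation follows. It is not the Flink implementation: the class name is hypothetical, treating a subpartition whose size equals the threshold as non-skewed is an assumption, and so is falling back to the data volume per task when every subpartition is skewed.

```java
// Hypothetical sketch; boundary and all-skewed handling are assumptions.
final class TargetSizeSketch {

    static long computeTargetSize(
            long[] subpartitionBytes, long skewedThreshold, long dataVolumePerTask) {
        long nonSkewedSum = 0L;
        int nonSkewedCount = 0;
        for (long size : subpartitionBytes) {
            // Assumption: sizes at or below the threshold count as non-skewed.
            if (size <= skewedThreshold) {
                nonSkewedSum += size;
                nonSkewedCount++;
            }
        }
        if (nonSkewedCount == 0) {
            // Assumption: with no non-skewed subpartitions, use the per-task volume.
            return dataVolumePerTask;
        }
        return Math.max(nonSkewedSum / nonSkewedCount, dataVolumePerTask);
    }
}
```

With sizes {10, 20, 30, 1000} and a threshold of 100, the 1000-byte subpartition is excluded and the average of the rest is 20, so a data volume per task of 15 yields 20 while a data volume per task of 50 yields 50.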

getNonBroadcastInputInfos

public static List<BlockingInputInfo> getNonBroadcastInputInfos(List<BlockingInputInfo> consumedResults)

hasSameNumPartitions

public static boolean hasSameNumPartitions(List<BlockingInputInfo> inputInfos)

getMaxNumPartitions

public static int getMaxNumPartitions(List<BlockingInputInfo> consumedResults)

checkAndGetSubpartitionNum

public static int checkAndGetSubpartitionNum(List<BlockingInputInfo> consumedResults)

checkAndGetSubpartitionNumForAggregatedInputs

public static int checkAndGetSubpartitionNumForAggregatedInputs(Collection<AggregatedBlockingInputInfo> inputInfos)

isLegalParallelism

public static boolean isLegalParallelism(int parallelism,
                                         int minParallelism,
                                         int maxParallelism)

checkAndGetIntraCorrelation

public static boolean checkAndGetIntraCorrelation(List<BlockingInputInfo> inputInfos)

checkAndGetParallelism

public static int checkAndGetParallelism(Collection<JobVertexInputInfo> vertexInputInfos)

tryComputeSubpartitionSliceRange
```
public static Optional<List<IndexRange>> tryComputeSubpartitionSliceRange(int minParallelism,
                                                                          int maxParallelism,
                                                                          long maxDataVolumePerTask,
                                                                          Map<Integer,List<SubpartitionSlice>> subpartitionSlices)
```
Attempts to compute the subpartition slice ranges to ensure even distribution of data across downstream tasks.
This method first tries to compute the subpartition slice ranges by evenly distributing the data volume. If that fails, it attempts to compute the ranges by evenly distributing the number of subpartition slices.

Parameters:

minParallelism - The minimum parallelism.

maxParallelism - The maximum parallelism.

maxDataVolumePerTask - The maximum data volume per task.

subpartitionSlices - A map of lists of subpartition slices grouped by type or index number.

Returns:

An Optional containing a list of index ranges representing the subpartition slice ranges. Returns an empty Optional if no suitable ranges can be computed.
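
The two-stage strategy described above (try a data-volume-based split first, then fall back to a count-based split) can be sketched generically. The helper below is hypothetical and does not reproduce either Flink algorithm; the two suppliers stand in for the real computations and only the fallback order is illustrated.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of the fallback order only; both strategies are placeholders.
final class TwoStageFallbackSketch {

    static <R> Optional<R> firstPresent(
            Supplier<Optional<R>> byDataVolume, Supplier<Optional<R>> bySliceCount) {
        // Stage 1: attempt the split that balances data volume.
        Optional<R> result = byDataVolume.get();
        if (result.isPresent()) {
            return result;
        }
        // Stage 2: fall back to the split that balances the number of slices.
        return bySliceCount.get();
    }
}
```

Using suppliers keeps the second strategy lazy, so it only runs when the first one produces no result.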

createJobVertexInputInfos

public static Map<IntermediateDataSetID,JobVertexInputInfo> createJobVertexInputInfos(List<BlockingInputInfo> inputInfos,
                                                                                            Map<Integer,List<SubpartitionSlice>> subpartitionSlices,
                                                                                            List<IndexRange> subpartitionSliceRanges,
                                                                                            Function<Integer,Integer> subpartitionSliceKeyResolver)

createdJobVertexInputInfoForBroadcast

public static JobVertexInputInfo createdJobVertexInputInfoForBroadcast(BlockingInputInfo inputInfo,
                                                                       int parallelism)

createdJobVertexInputInfoForNonBroadcast

public static JobVertexInputInfo createdJobVertexInputInfoForNonBroadcast(BlockingInputInfo inputInfo,
                                                                          List<IndexRange> subpartitionSliceRanges,
                                                                          List<SubpartitionSlice> subpartitionSlices)

calculateDataVolumePerTaskForInputsGroup

public static long calculateDataVolumePerTaskForInputsGroup(long globalDataVolumePerTask,
                                                            List<BlockingInputInfo> inputsGroup,
                                                            List<BlockingInputInfo> allInputs)

calculateDataVolumePerTaskForInput

public static long calculateDataVolumePerTaskForInput(long globalDataVolumePerTask,
                                                      long inputsGroupBytes,
                                                      long totalDataBytes)

logBalancedDataDistributionOptimizationResult

public static void logBalancedDataDistributionOptimizationResult(org.slf4j.Logger logger,
                                                                 JobVertexID jobVertexId,
                                                                 BlockingInputInfo inputInfo,
                                                                 JobVertexInputInfo optimizedJobVertexInputInfo)

Logs the data distribution optimization info when a balanced data distribution algorithm is effectively optimized compared to the num-based data distribution algorithm.

Parameters:

logger - The logger instance used for logging output.

jobVertexId - The id for the job vertex.

inputInfo - The original input info.

optimizedJobVertexInputInfo - The optimized job vertex input info.
