Class SamplePartitioner
- java.lang.Object
- com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner
-
- All Implemented Interfaces:
Partitioner
@Internal public final class SamplePartitioner extends java.lang.Object
Sample Partitioner
Samples the collection to generate partitions.
Uses the average document size to split the collection into average-sized chunks.
The partitioner samples the collection, then projects and sorts by the partition field. It then uses every samplesPerPartition-th sample as a partition boundary.
- "partition.field": The field to be used for partitioning. Must be a unique field. Defaults to: "_id".
- "partition.size": The average size (MB) for each partition. Note: the number of documents per partition is derived from the average document size, so partitions may not be even in size. Defaults to: 64.
- "samples.per.partition": The number of samples to take per partition. Defaults to: 10. The total number of samples taken is calculated as: samples per partition * (count / number of documents per partition).
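The arithmetic described above can be sketched in a few lines of plain Java. This is a minimal illustration, not the connector's implementation: the class and method names are hypothetical, 1 MB is assumed to be 1024 * 1024 bytes, the partition count is assumed to round up, and the choice of which sample index starts each partition is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the connector API.
public class SamplePartitionSketch {

    // Number of documents per partition, from "partition.size" (MB) and the
    // average document size. Assumes 1 MB = 1024 * 1024 bytes and avgDocSizeBytes > 0.
    static long documentsPerPartition(int partitionSizeMb, long avgDocSizeBytes) {
        long partitionSizeBytes = partitionSizeMb * 1024L * 1024L;
        return Math.max(1L, partitionSizeBytes / avgDocSizeBytes);
    }

    // samples per partition * (count / number of documents per partition).
    // Assumption: the partition count is rounded up.
    static long totalSamples(int samplesPerPartition, long count, long docsPerPartition) {
        long partitions = (count + docsPerPartition - 1) / docsPerPartition;
        return samplesPerPartition * partitions;
    }

    // Keeps every samplesPerPartition-th value of the sorted sample as a boundary.
    // Assumption: boundaries sit at indices n, 2n, 3n, ... of the sorted sample.
    static <T> List<T> boundaries(List<T> sortedSamples, int samplesPerPartition) {
        List<T> result = new ArrayList<>();
        for (int i = samplesPerPartition; i < sortedSamples.size(); i += samplesPerPartition) {
            result.add(sortedSamples.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Defaults from the description: 64 MB partitions, 10 samples per partition.
        long docs = documentsPerPartition(64, 1024L);       // 1 KB average document
        long samples = totalSamples(10, 1_000_000L, docs);  // 1,000,000 documents
        System.out.println(docs + " documents per partition, " + samples + " samples");

        // Sampled partition-field values, already sorted.
        List<Integer> sorted = Arrays.asList(1, 3, 4, 7, 9, 12, 15, 18, 21);
        System.out.println(boundaries(sorted, 3)); // prints [7, 15]
    }
}
```

With the default settings and a 1 KB average document, the sketch yields 65536 documents per partition, so a collection of 1,000,000 documents is sampled 160 times under these assumptions.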
-
Field Summary
Fields
Modifier and Type          Field                       Description
static java.lang.String    ID_FIELD
static java.lang.String    PARTITION_FIELD_CONFIG
static java.lang.String    PARTITION_FIELD_DEFAULT
static java.lang.String    PARTITION_SIZE_MB_CONFIG
-
Fields inherited from interface com.mongodb.spark.sql.connector.read.partitioner.Partitioner
LOGGER
-
Constructor Summary
Constructors
Constructor            Description
SamplePartitioner()    Construct an instance
-
Method Summary
Modifier and Type                      Method                                       Description
java.util.List<MongoInputPartition>    generatePartitions(ReadConfig readConfig)    Generate the partitions for the collection based upon the read configuration
-
Field Detail
-
PARTITION_SIZE_MB_CONFIG
public static final java.lang.String PARTITION_SIZE_MB_CONFIG
- See Also:
- Constant Field Values
-
ID_FIELD
public static final java.lang.String ID_FIELD
- See Also:
- Constant Field Values
-
PARTITION_FIELD_DEFAULT
public static final java.lang.String PARTITION_FIELD_DEFAULT
- See Also:
- Constant Field Values
-
PARTITION_FIELD_CONFIG
public static final java.lang.String PARTITION_FIELD_CONFIG
- See Also:
- Constant Field Values
-
Method Detail
-
generatePartitions
public java.util.List<MongoInputPartition> generatePartitions(ReadConfig readConfig)
Description copied from interface: Partitioner
Generate the partitions for the collection based upon the read configuration
- Parameters:
- readConfig - the read configuration
- Returns:
- the partitions