com.mongodb.spark.rdd.partitioner
The partition key property
The partition size MB property
Calculate the Partitions
the MongoConnector
the pipeline to apply, if any. Note: this pipeline may have been appended to during optimization.
the partitions
The pagination by size partitioner.
Paginates the collection into partitions based on their size. Uses the collStats command and the average document size to estimate the partition boundaries.
Configuration Properties
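The prefix when using sparkConf is spark.mongodb.input.partitionerOptions, followed by the property name (for example, spark.mongodb.input.partitionerOptions.partitionSizeMB):
- partitionKey: the field to partition the collection by. Defaults to _id.
- partitionSizeMB: the size (in MB) of each partition. Defaults to 64.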
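As an illustration of how the prefix and property names combine, the following minimal sketch builds the full configuration keys in plain Scala. The object name PartitionerOptions, the helper key, and the value 32 are hypothetical choices made for this example; the spark.mongodb.input.partitioner key is assumed to select the partitioner in connector versions that use the spark.mongodb.input namespace.

```scala
object PartitionerOptions {
  // Prefix quoted from the documentation above.
  val Prefix = "spark.mongodb.input.partitionerOptions"

  // Hypothetical helper: joins the prefix and a property name with a dot.
  def key(property: String): String = s"$Prefix.$property"

  // Example settings; 32 is an arbitrary value overriding the default of 64.
  val settings: Map[String, String] = Map(
    "spark.mongodb.input.partitioner" -> "MongoPaginateBySizePartitioner", // assumed selector key
    key("partitionKey") -> "_id",
    key("partitionSizeMB") -> "32"
  )
}
```

A map like this could then be handed to a SparkConf, for example with new SparkConf().setAll(PartitionerOptions.settings), assuming Spark is on the classpath.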
*Note:* This can be an expensive operation, as it creates one cursor for every estimated partitionSizeMB's worth of documents.
*Note:* Does not support views. Use MongoPaginateByCountPartitioner or create a custom partitioner.
Since: 1.0
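To get a feel for the cost mentioned in the first note, the rough sketch below estimates how many partitions (and therefore roughly how many cursors) a collection might produce. It assumes the document count and average document size come from collStats, and that documents per partition is simply the partition size divided by the average document size. The object and method names are hypothetical, and this is not necessarily the connector's exact calculation.

```scala
object PartitionEstimate {
  /** Approximate number of documents that fit in one partition of the given size. */
  def docsPerPartition(partitionSizeMB: Int, avgObjSizeBytes: Long): Long = {
    require(partitionSizeMB > 0 && avgObjSizeBytes > 0, "sizes must be positive")
    // Convert MB to bytes, then divide by the average document size (at least 1 document).
    math.max(1L, (partitionSizeMB.toLong * 1024L * 1024L) / avgObjSizeBytes)
  }

  /** Approximate number of partitions, using ceiling division over the document count. */
  def estimatedPartitions(count: Long, partitionSizeMB: Int, avgObjSizeBytes: Long): Long = {
    require(count >= 0, "count must be non-negative")
    val perPartition = docsPerPartition(partitionSizeMB, avgObjSizeBytes)
    (count + perPartition - 1) / perPartition
  }
}
```

Under these assumptions, one million documents averaging 1 KiB with the default 64 MB partition size would give about 16 partitions; lowering partitionSizeMB raises that number and the number of cursors with it.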