Class ReadConfig

java.lang.Object
- com.mongodb.spark.sql.connector.config.ReadConfig

All Implemented Interfaces:
MongoConfig, java.io.Serializable

public final class ReadConfig extends java.lang.Object

The Read Configuration: the MongoConfig for reads.

- See Also:
- Serialized Form
Field Summary

All fields are of type static java.lang.String.

- AGGREGATION_ALLOW_DISK_USE_CONFIG: Allow disk use when running the aggregation.
- AGGREGATION_PIPELINE_CONFIG: Provide a custom aggregation pipeline.
- AGGREGATION_PIPELINE_DEFAULT
- INFER_SCHEMA_MAP_TYPE_ENABLED_CONFIG: Enable MapTypes when inferring the schema.
- INFER_SCHEMA_MAP_TYPE_MINIMUM_KEY_SIZE_CONFIG: The minimum size of a StructType before it is inferred as a MapType instead.
- INFER_SCHEMA_SAMPLE_SIZE_CONFIG: The size of the sample of documents from the collection to use when inferring the schema.
- OUTPUT_EXTENDED_JSON_CONFIG: Output extended JSON for any String types.
- PARTITIONER_CONFIG: The partitioner full class name.
- PARTITIONER_DEFAULT: The default partitioner if none is set: "com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner".
- PARTITIONER_OPTIONS_PREFIX: The prefix for partitioner-specific configuration.
- STREAM_LOOKUP_FULL_DOCUMENT_CONFIG: Streaming full document configuration.
- STREAM_MICRO_BATCH_MAX_PARTITION_COUNT_CONFIG: Configures the maximum number of partitions per micro batch.
- STREAM_PUBLISH_FULL_DOCUMENT_ONLY_CONFIG: Publish the full document only when streaming.
- STREAMING_STARTUP_MODE_CONFIG: The startup behavior when there is no stored offset available.
- STREAMING_STARTUP_MODE_TIMESTAMP_START_AT_OPERATION_TIME_CONFIG: The `startAtOperationTime` configuration.

Fields inherited from interface com.mongodb.spark.sql.connector.config.MongoConfig:
CLIENT_FACTORY_CONFIG, CLIENT_FACTORY_DEFAULT, COLLECTION_NAME_CONFIG, COMMENT_CONFIG, CONNECTION_STRING_CONFIG, CONNECTION_STRING_DEFAULT, DATABASE_NAME_CONFIG, PREFIX, READ_PREFIX, WRITE_PREFIX
-
-
Method Summary

All Methods / Instance Methods / Concrete Methods:

- void doWithClient(java.util.function.Consumer<com.mongodb.client.MongoClient> consumer): Loans a MongoClient to the user; does not return a result.
- void doWithCollection(java.util.function.Consumer<com.mongodb.client.MongoCollection<org.bson.BsonDocument>> consumer): Loans a MongoCollection to the user; does not return a result.
- boolean equals(java.lang.Object o)
- boolean getAggregationAllowDiskUse()
- java.util.List<org.bson.BsonDocument> getAggregationPipeline()
- java.lang.String getCollectionName()
- java.lang.String getDatabaseName()
- int getInferSchemaMapTypeMinimumKeySize()
- int getInferSchemaSampleSize()
- int getMicroBatchMaxPartitionCount()
- com.mongodb.client.MongoClient getMongoClient(): Returns a MongoClient.
- java.util.Map<java.lang.String,java.lang.String> getOptions()
- java.util.Map<java.lang.String,java.lang.String> getOriginals()
- Partitioner getPartitioner()
- MongoConfig getPartitionerOptions()
- com.mongodb.client.model.changestream.FullDocument getStreamFullDocument()
- org.bson.BsonTimestamp getStreamInitialBsonTimestamp(): Returns the initial start at operation time for a stream.
- int hashCode()
- boolean inferSchemaMapType()
- boolean outputExtendedJson()
- boolean streamPublishFullDocumentOnly()
- java.lang.String toString()
- <T> T withClient(java.util.function.Function<com.mongodb.client.MongoClient,T> function): Runs a function against a MongoClient.
- <T> T withCollection(java.util.function.Function<com.mongodb.client.MongoCollection<org.bson.BsonDocument>,T> function): Runs a function against a MongoCollection.
- ReadConfig withOption(java.lang.String key, java.lang.String value): Return a MongoConfig instance with the extra options applied.
- ReadConfig withOptions(java.util.Map<java.lang.String,java.lang.String> options): Return a MongoConfig instance with the extra options applied.

Methods inherited from class java.lang.Object:
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface com.mongodb.spark.sql.connector.config.MongoConfig:
containsKey, get, getBoolean, getComment, getConnectionString, getDouble, getInt, getList, getLong, getNamespace, getOrDefault, subConfiguration, toReadConfig, toWriteConfig
-
-
-
-
Field Detail
-
PARTITIONER_CONFIG

public static final java.lang.String PARTITIONER_CONFIG

The partitioner full class name. Partitioners must implement the Partitioner interface.

Configuration: "partitioner"
Default: "com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner"

- See Also:
- Constant Field Values
-
PARTITIONER_DEFAULT

public static final java.lang.String PARTITIONER_DEFAULT

The default partitioner if none is set: "com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner"

- See Also:
- PARTITIONER_CONFIG, Constant Field Values
-
PARTITIONER_OPTIONS_PREFIX

public static final java.lang.String PARTITIONER_OPTIONS_PREFIX

The prefix for partitioner-specific configuration. Any configuration beginning with this prefix is available via getPartitionerOptions().

Configuration: "partitioner.options."

- See Also:
- Constant Field Values
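The prefix scoping can be illustrated with a plain-Java sketch. The map and the `subOptions` helper below are illustrative only (not connector API), and the "partition.field" and "samples.per.partition" option names are assumptions used for the example:

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionerOptionsSketch {
    // Hypothetical helper: returns entries whose keys start with the prefix,
    // with the prefix stripped, mirroring how prefixed options become
    // available via getPartitionerOptions().
    public static Map<String, String> subOptions(Map<String, String> options, String prefix) {
        Map<String, String> scoped = new HashMap<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                scoped.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return scoped;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("partitioner", "com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner");
        options.put("partitioner.options.partition.field", "_id");
        options.put("partitioner.options.samples.per.partition", "10");

        // Only the prefixed entries survive, with the prefix removed.
        System.out.println(subOptions(options, "partitioner.options."));
    }
}
```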
-
INFER_SCHEMA_SAMPLE_SIZE_CONFIG

public static final java.lang.String INFER_SCHEMA_SAMPLE_SIZE_CONFIG

The size of the sample of documents from the collection to use when inferring the schema.

Configuration: "sampleSize"
Default: 1000

- See Also:
- Constant Field Values
-
INFER_SCHEMA_MAP_TYPE_ENABLED_CONFIG

public static final java.lang.String INFER_SCHEMA_MAP_TYPE_ENABLED_CONFIG

Enable MapTypes when inferring the schema. If enabled, large compatible struct types will be inferred as a MapType instead.

Configuration: "sql.inferSchema.mapTypes.enabled"
Default: true

- See Also:
- Constant Field Values
-
INFER_SCHEMA_MAP_TYPE_MINIMUM_KEY_SIZE_CONFIG

public static final java.lang.String INFER_SCHEMA_MAP_TYPE_MINIMUM_KEY_SIZE_CONFIG

The minimum size of a StructType before it is inferred as a MapType instead.

Configuration: "sql.inferSchema.mapTypes.minimum.key.size"
Default: 250. Requires INFER_SCHEMA_MAP_TYPE_ENABLED_CONFIG.

- See Also:
- Constant Field Values
-
AGGREGATION_PIPELINE_CONFIG

public static final java.lang.String AGGREGATION_PIPELINE_CONFIG

Provide a custom aggregation pipeline. Enables a custom aggregation pipeline to be applied to the collection before sending data to Spark.

When configuring, this should be either an extended JSON representation of a list of documents:

  [{"$match": {"closed": false}}, {"$project": {"status": 1, "name": 1, "description": 1}}]

or the extended JSON syntax of a single document:

  {"$match": {"closed": false}}

Note: Custom aggregation pipelines must work with the partitioner strategy. Some aggregation stages, such as "$group", are not suitable for any partitioner that produces more than one partition.

Configuration: "aggregation.pipeline"
Default: no aggregation pipeline.
- See Also:
- Constant Field Values
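The pipeline above could be wired into the read options as in this minimal sketch using a plain java.util.Map. The fully qualified "spark.mongodb.read." key prefix, the connection URI, and the database/collection names are assumptions for illustration, not taken from this page:

```java
import java.util.HashMap;
import java.util.Map;

public class AggregationPipelineOption {
    // Builds read options carrying a custom pipeline (hypothetical values).
    public static Map<String, String> readOptions() {
        Map<String, String> options = new HashMap<>();
        options.put("spark.mongodb.read.connection.uri", "mongodb://localhost:27017");
        options.put("spark.mongodb.read.database", "shop");      // assumed name
        options.put("spark.mongodb.read.collection", "orders");  // assumed name
        // Extended JSON list of pipeline stages; "$group" is avoided because it
        // is unsafe with partitioners that produce more than one partition.
        options.put("spark.mongodb.read.aggregation.pipeline",
            "[{\"$match\": {\"closed\": false}},"
          + " {\"$project\": {\"status\": 1, \"name\": 1, \"description\": 1}}]");
        return options;
    }

    public static void main(String[] args) {
        System.out.println(readOptions().get("spark.mongodb.read.aggregation.pipeline"));
    }
}
```

Such a map would typically be passed to a Spark reader via its `options(...)` call.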
-
AGGREGATION_PIPELINE_DEFAULT
public static final java.lang.String AGGREGATION_PIPELINE_DEFAULT
- See Also:
- Constant Field Values
-
AGGREGATION_ALLOW_DISK_USE_CONFIG

public static final java.lang.String AGGREGATION_ALLOW_DISK_USE_CONFIG

Allow disk use when running the aggregation.

Configuration: "aggregation.allowDiskUse"
Default: true. Users may set this to false to disable writing to disk.

- See Also:
- Constant Field Values
-
STREAM_PUBLISH_FULL_DOCUMENT_ONLY_CONFIG

public static final java.lang.String STREAM_PUBLISH_FULL_DOCUMENT_ONLY_CONFIG

Publish the full document only when streaming.

Note: Only publishes the actual changed document rather than the full change stream document. Overrides any configured "change.stream.lookup.full.document" values. Also filters the change stream events to include only events with a "fullDocument" field.

Configuration: "change.stream.publish.full.document.only"
Default: false

- See Also:
- Constant Field Values
-
STREAM_LOOKUP_FULL_DOCUMENT_CONFIG

public static final java.lang.String STREAM_LOOKUP_FULL_DOCUMENT_CONFIG

Streaming full document configuration. Determines what to return for update operations when using a change stream. See "Change streams lookup full document for update operations" for further information.

Set to "updateLookup" to look up the most current majority-committed version of the updated document.

Configuration: "change.stream.lookup.full.document"
Default: "default", the server's default value in the fullDocument field.

- See Also:
- Constant Field Values
-
STREAMING_STARTUP_MODE_CONFIG

public static final java.lang.String STREAMING_STARTUP_MODE_CONFIG

The startup behavior when there is no stored offset available. Specifies how the connector should start up when there is no offset available.

Resuming a change stream requires a resume token, which the connector stores as, and reads from, the offset. If no offset is available, the connector may either ignore all existing data, or read a starting point from the configuration.

Possible values are:
- 'latest' (the default): the connector creates a new change stream, processes change events from it, and stores resume tokens from them, thus ignoring all existing source data.
- 'timestamp': actuates the 'change.stream.startup.mode.timestamp.*' properties. If no such properties are configured, then 'timestamp' is equivalent to 'latest'.
- See Also:
- Constant Field Values
-
STREAMING_STARTUP_MODE_TIMESTAMP_START_AT_OPERATION_TIME_CONFIG

public static final java.lang.String STREAMING_STARTUP_MODE_TIMESTAMP_START_AT_OPERATION_TIME_CONFIG

The `startAtOperationTime` configuration. Actuated only if 'change.stream.startup.mode = timestamp'. Specifies the starting point for the change stream.

Must be either an integer number of seconds since the Epoch in decimal format (example: 30), an instant in ISO-8601 format with one-second precision (example: '1970-01-01T00:00:30Z'), or a BSON Timestamp in the canonical extended JSON (v2) format (example: '{"$timestamp": {"t": 30, "i": 0}}').

You may specify '0' to start at the beginning of the oplog.

Note: Requires MongoDB 4.0 or above.
See changeStreams.
- See Also:
- Constant Field Values
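The first two accepted formats can be checked against each other with standard java.time. This small sketch only shows that the decimal-seconds and ISO-8601 spellings denote the same instant; it does not touch the connector:

```java
import java.time.Instant;

public class StartAtOperationTimeFormats {
    public static void main(String[] args) {
        // Three spellings of the same starting point, 30 seconds after the Epoch:
        long seconds = 30;                                        // decimal seconds
        Instant iso = Instant.parse("1970-01-01T00:00:30Z");      // ISO-8601, one-second precision
        String bson = "{\"$timestamp\": {\"t\": 30, \"i\": 0}}";  // canonical extended JSON (v2)

        // The ISO instant resolves to the same epoch second.
        System.out.println(iso.getEpochSecond() == seconds); // prints true
    }
}
```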
-
STREAM_MICRO_BATCH_MAX_PARTITION_COUNT_CONFIG

public static final java.lang.String STREAM_MICRO_BATCH_MAX_PARTITION_COUNT_CONFIG

Configures the maximum number of partitions per micro batch. Divides a micro batch into a maximum number of partitions, based on the seconds-since-epoch part of a BsonTimestamp. The smallest micro batch partition generated is one second.

Actuated only if using micro batch streams.

Default: 1

Warning: Splitting a micro batch into multiple partitions removes any guarantee of processing the change events in the order they occurred. Care should therefore be taken to ensure that partitioning and processing will not cause data inconsistencies downstream.

See bson timestamp.
- See Also:
- Constant Field Values
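A rough sketch of how a batch's time range might be divided, as hypothetical logic that mirrors the description above (at most N partitions, each at least one second), not the connector's actual implementation:

```java
import java.util.Arrays;

public class MicroBatchPartitionSketch {
    // Split the seconds-since-epoch range [start, end) into at most
    // maxPartitionCount partitions, each covering at least one second.
    public static long[] boundaries(long start, long end, int maxPartitionCount) {
        long span = end - start;
        // Never more partitions than whole seconds in the span, and at least one.
        int partitions = (int) Math.max(1, Math.min(maxPartitionCount, span));
        long[] bounds = new long[partitions + 1];
        for (int i = 0; i <= partitions; i++) {
            bounds[i] = start + (span * i) / partitions; // evenly spaced boundaries
        }
        return bounds;
    }

    public static void main(String[] args) {
        long[] b = boundaries(100, 110, 4); // 10 seconds into 4 partitions
        System.out.println(Arrays.toString(b)); // prints [100, 102, 105, 107, 110]
    }
}
```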
-
OUTPUT_EXTENDED_JSON_CONFIG

public static final java.lang.String OUTPUT_EXTENDED_JSON_CONFIG

Output extended JSON for any String types. If true, will produce extended JSON for any fields that have the String datatype.

Configuration: "outputExtendedJson"
Default: false

- Since:
- 10.1
- See Also:
- Constant Field Values
-
-
Method Detail
-
withOption

public ReadConfig withOption(java.lang.String key, java.lang.String value)

Description copied from interface: MongoConfig

Return a MongoConfig instance with the extra options applied. Existing configurations may be overwritten by the new options.

- Parameters:
- key - the key to add
- value - the value to add
- Returns:
- a new MongoConfig
-
withOptions

public ReadConfig withOptions(java.util.Map<java.lang.String,java.lang.String> options)

Description copied from interface: MongoConfig

Return a MongoConfig instance with the extra options applied. Existing configurations may be overwritten by the new options.

- Parameters:
- options - the context specific options
- Returns:
- a new MongoConfig
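The overwrite semantics can be modelled with a plain map merge. This is a hypothetical stand-in for illustration, not the connector's implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class WithOptionsSketch {
    // Models withOptions(): a new options map is produced, the original map
    // is untouched, and new entries overwrite existing keys.
    public static Map<String, String> withOptions(Map<String, String> current,
                                                  Map<String, String> extra) {
        Map<String, String> merged = new HashMap<>(current);
        merged.putAll(extra); // later entries win, so existing keys may be overwritten
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> current = new HashMap<>();
        current.put("sampleSize", "1000");
        Map<String, String> updated = withOptions(current, Map.of("sampleSize", "500"));
        System.out.println(updated.get("sampleSize")); // prints 500
        System.out.println(current.get("sampleSize")); // prints 1000: original unchanged
    }
}
```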
-
getInferSchemaSampleSize
public int getInferSchemaSampleSize()
- Returns:
- the configured infer sample size
-
inferSchemaMapType

public boolean inferSchemaMapType()

- Returns:
- true if MapTypes should be inferred when inferring the schema
-
getInferSchemaMapTypeMinimumKeySize

public int getInferSchemaMapTypeMinimumKeySize()

- Returns:
- the configured minimum key size for inferring a MapType
-
getPartitioner
public Partitioner getPartitioner()
- Returns:
- the partitioner instance
-
getPartitionerOptions
public MongoConfig getPartitionerOptions()
- Returns:
- any partitioner configuration
-
getAggregationPipeline
public java.util.List<org.bson.BsonDocument> getAggregationPipeline()
- Returns:
- the aggregation pipeline to filter the collection with
-
getAggregationAllowDiskUse
public boolean getAggregationAllowDiskUse()
- Returns:
- the aggregation allow disk use value
-
streamPublishFullDocumentOnly
public boolean streamPublishFullDocumentOnly()
- Returns:
- true if the stream should publish the full document only.
-
getStreamFullDocument
public com.mongodb.client.model.changestream.FullDocument getStreamFullDocument()
- Returns:
- the stream full document configuration or null if not set.
-
getStreamInitialBsonTimestamp

public org.bson.BsonTimestamp getStreamInitialBsonTimestamp()

Returns the initial start at operation time for a stream.

Note: This value will be ignored if the timestamp is negative or there is an existing offset present for the stream.

- Returns:
- the start at operation time for a stream
- Since:
- 10.2
-
getMicroBatchMaxPartitionCount
public int getMicroBatchMaxPartitionCount()
- Returns:
- the micro batch max partition count
-
outputExtendedJson

public boolean outputExtendedJson()

- Returns:
- true if the connector should output extended JSON
- Since:
- 10.1
-
getOriginals

public java.util.Map<java.lang.String,java.lang.String> getOriginals()

- Specified by:
- getOriginals in interface MongoConfig
- Returns:
- the original options for this MongoConfig instance

-
getOptions

public java.util.Map<java.lang.String,java.lang.String> getOptions()

- Specified by:
- getOptions in interface MongoConfig
- Returns:
- the options for this MongoConfig instance

-
getDatabaseName

public java.lang.String getDatabaseName()

- Specified by:
- getDatabaseName in interface MongoConfig
- Returns:
- the database name to use for this configuration

-
getCollectionName

public java.lang.String getCollectionName()

- Specified by:
- getCollectionName in interface MongoConfig
- Returns:
- the collection name to use for this configuration
-
getMongoClient

public com.mongodb.client.MongoClient getMongoClient()

Returns a MongoClient. Once the MongoClient is no longer required, it MUST be closed by calling mongoClient.close().

- Returns:
- the MongoClient from the cache, or a new one created using the MongoClientFactory
-
withClient

public <T> T withClient(java.util.function.Function<com.mongodb.client.MongoClient,T> function)

Runs a function against a MongoClient.

- Type Parameters:
- T - the return type
- Parameters:
- function - the function that is passed the MongoClient
- Returns:
- the result of the function
-
doWithClient

public void doWithClient(java.util.function.Consumer<com.mongodb.client.MongoClient> consumer)

Loans a MongoClient to the user; does not return a result.

- Parameters:
- consumer - the consumer of the MongoClient
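The loan pattern behind withClient and doWithClient can be sketched with a self-contained stand-in resource. This is illustrative only; MongoClient is replaced here by a dummy AutoCloseable so the sketch runs without the connector:

```java
import java.util.function.Consumer;
import java.util.function.Function;

public class LoanPatternSketch {
    // Hypothetical stand-in for a closeable resource such as MongoClient.
    public static class Resource implements AutoCloseable {
        public boolean closed = false;
        public String query() { return "result"; }
        @Override public void close() { closed = true; }
    }

    // Mirrors withClient: the resource is acquired, loaned to the function,
    // and reliably released afterwards, so callers never manage its lifecycle.
    public static <T> T withResource(Function<Resource, T> function) {
        try (Resource r = new Resource()) {
            return function.apply(r);
        }
    }

    // Mirrors doWithClient: the same loan, but for side effects with no result.
    public static void doWithResource(Consumer<Resource> consumer) {
        withResource(r -> { consumer.accept(r); return null; });
    }

    public static void main(String[] args) {
        System.out.println(withResource(Resource::query)); // prints result
        doWithResource(r -> System.out.println("side effect with " + r.query()));
    }
}
```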
-
withCollection

public <T> T withCollection(java.util.function.Function<com.mongodb.client.MongoCollection<org.bson.BsonDocument>,T> function)

Runs a function against a MongoCollection.

- Type Parameters:
- T - the return type
- Parameters:
- function - the function that is passed the MongoCollection
- Returns:
- the result of the function
-
doWithCollection

public void doWithCollection(java.util.function.Consumer<com.mongodb.client.MongoCollection<org.bson.BsonDocument>> consumer)

Loans a MongoCollection to the user; does not return a result.

- Parameters:
- consumer - the consumer of the MongoCollection<BsonDocument>
-
toString

public java.lang.String toString()

- Overrides:
- toString in class java.lang.Object

-
equals

@TestOnly public boolean equals(java.lang.Object o)

- Overrides:
- equals in class java.lang.Object

-
hashCode

public int hashCode()

- Overrides:
- hashCode in class java.lang.Object
-
-