Class ReadConfig

  • All Implemented Interfaces:
    MongoConfig, java.io.Serializable

    public final class ReadConfig
    extends java.lang.Object
    The Read Configuration

    The MongoConfig for reads.

    See Also:
    Serialized Form
    • Field Detail

      • PARTITIONER_DEFAULT

        public static final java.lang.String PARTITIONER_DEFAULT
        The default partitioner if none is set: "com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner"
        See Also:
        PARTITIONER_CONFIG, Constant Field Values
      • PARTITIONER_OPTIONS_PREFIX

        public static final java.lang.String PARTITIONER_OPTIONS_PREFIX
        The prefix for specific partitioner based configuration.

        Any configuration beginning with this prefix is available via getPartitionerOptions().

        Configuration: "partitioner.options."

        See Also:
        Constant Field Values
      • INFER_SCHEMA_SAMPLE_SIZE_CONFIG

        public static final java.lang.String INFER_SCHEMA_SAMPLE_SIZE_CONFIG
        The size of the sample of documents from the collection to use when inferring the schema

        Configuration: "sampleSize"

        Default: 1000

        See Also:
        Constant Field Values
      • INFER_SCHEMA_MAP_TYPE_ENABLED_CONFIG

        public static final java.lang.String INFER_SCHEMA_MAP_TYPE_ENABLED_CONFIG
        Enable Map Types when inferring the schema.

        If enabled, large compatible struct types will be inferred as a MapType instead.

        Configuration: "sql.inferSchema.mapTypes.enabled"

        Default: true

        See Also:
        Constant Field Values
      • INFER_SCHEMA_MAP_TYPE_MINIMUM_KEY_SIZE_CONFIG

        public static final java.lang.String INFER_SCHEMA_MAP_TYPE_MINIMUM_KEY_SIZE_CONFIG
        The minimum size of a StructType before it is inferred as a MapType instead.

        Configuration: "sql.inferSchema.mapTypes.minimum.key.size"

        Default: 250. Requires INFER_SCHEMA_MAP_TYPE_ENABLED_CONFIG to be enabled.

        See Also:
        Constant Field Values
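        The interaction between the two map-type settings above can be sketched as a simple decision. This is an illustration of the documented behaviour only, with a hypothetical `inferAsMapType` helper; it is not the connector's inference code.

```java
public class MapTypeInferenceDemo {
    // Illustrative only: mirrors the documented rule that a struct with at
    // least `minimumKeySize` distinct keys is inferred as a MapType, and only
    // when map-type inference is enabled. Not the connector's actual logic.
    static boolean inferAsMapType(int distinctKeyCount, boolean mapTypesEnabled, int minimumKeySize) {
        return mapTypesEnabled && distinctKeyCount >= minimumKeySize;
    }

    public static void main(String[] args) {
        System.out.println(inferAsMapType(300, true, 250));  // true: over the threshold
        System.out.println(inferAsMapType(300, false, 250)); // false: inference disabled
        System.out.println(inferAsMapType(100, true, 250));  // false: under the threshold
    }
}
```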
      • AGGREGATION_PIPELINE_CONFIG

        public static final java.lang.String AGGREGATION_PIPELINE_CONFIG
        Provide a custom aggregation pipeline.

        Enables a custom aggregation pipeline to be applied to the collection before sending data to Spark.

        When configured, the value should be either an extended JSON representation of a list of documents:

        
         [{"$match": {"closed": false}}, {"$project": {"status": 1, "name": 1, "description": 1}}]
         
        Or the extended JSON representation of a single document:
        
         {"$match": {"closed": false}}
         

        Note: Custom aggregation pipelines must work with the partitioner strategy. Some aggregation stages such as "$group" are not suitable for any partitioner that produces more than one partition.

        Configuration: "aggregation.pipeline"

        Default: no aggregation pipeline.

        See Also:
        Constant Field Values
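        Since the configuration accepts either a list of stages or a single stage document, a single document is treated as a one-stage pipeline. The sketch below, with a hypothetical `normalise` helper, only illustrates the two accepted shapes; the real connector parses the value as extended JSON with the BSON library.

```java
public class PipelineConfigDemo {
    // Illustrative only: normalises an "aggregation.pipeline" value so that
    // the single-document form is treated as a one-stage pipeline.
    static String normalise(String pipeline) {
        String trimmed = pipeline.trim();
        if (trimmed.startsWith("[")) {
            return trimmed;             // already a list of stage documents
        }
        return "[" + trimmed + "]";     // wrap a single stage document
    }

    public static void main(String[] args) {
        System.out.println(normalise("{\"$match\": {\"closed\": false}}"));
        System.out.println(normalise("[{\"$match\": {\"closed\": false}}]"));
    }
}
```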
      • AGGREGATION_PIPELINE_DEFAULT

        public static final java.lang.String AGGREGATION_PIPELINE_DEFAULT
        See Also:
        Constant Field Values
      • AGGREGATION_ALLOW_DISK_USE_CONFIG

        public static final java.lang.String AGGREGATION_ALLOW_DISK_USE_CONFIG
        Allow disk use when running the aggregation.

        Configuration: "aggregation.allowDiskUse"

        Default: true. Set to false to disable writing to disk.

        See Also:
        Constant Field Values
      • STREAM_PUBLISH_FULL_DOCUMENT_ONLY_CONFIG

        public static final java.lang.String STREAM_PUBLISH_FULL_DOCUMENT_ONLY_CONFIG
        Publish Full Document only when streaming.

        Note: Only publishes the actual changed document rather than the full change stream document. Overrides any configured "change.stream.lookup.full.document" value. Also filters the change stream events to include only events with a "fullDocument" field.

        Configuration: "change.stream.publish.full.document.only"

        Default: false.

        See Also:
        Constant Field Values
      • STREAM_LOOKUP_FULL_DOCUMENT_CONFIG

        public static final java.lang.String STREAM_LOOKUP_FULL_DOCUMENT_CONFIG
        Streaming full document configuration.

        Note: Determines what to return for update operations when using a Change Stream. See Change streams: lookup full document for update operations for further information.

        Set to "updateLookup" to look up the most current majority-committed version of the updated document.

        Configuration: "change.stream.lookup.full.document"

        Default: "default" - the server's default value in the fullDocument field.

        See Also:
        Constant Field Values
      • STREAMING_STARTUP_MODE_CONFIG

        public static final java.lang.String STREAMING_STARTUP_MODE_CONFIG
        The start-up behavior when there is no stored offset available.

        Specifies how the connector should start up when there is no offset available.

        Resuming a change stream requires a resume token, which the connector stores as, and reads from, the offset. If no offset is available, the connector may either ignore all existing data or read an offset from the configuration.

        Possible values are:

        • 'latest' is the default value. The connector creates a new change stream, processes change events from it and stores resume tokens from them, thus ignoring all existing source data.
        • 'timestamp' actuates 'change.stream.startup.mode.timestamp.*' properties. If no such properties are configured, then 'timestamp' is equivalent to 'latest'.
        See Also:
        Constant Field Values
      • STREAMING_STARTUP_MODE_TIMESTAMP_START_AT_OPERATION_TIME_CONFIG

        public static final java.lang.String STREAMING_STARTUP_MODE_TIMESTAMP_START_AT_OPERATION_TIME_CONFIG
        The `startAtOperationTime` configuration.

        Actuated only if 'change.stream.startup.mode = timestamp'. Specifies the starting point for the change stream.

        Must be either an integer number of seconds since the Epoch in decimal format (example: 30), an instant in ISO-8601 format with one-second precision (example: '1970-01-01T00:00:30Z'), or a BSON Timestamp in canonical extended JSON (v2) format (example: '{"$timestamp": {"t": 30, "i": 0}}').

        You may specify '0' to start at the beginning of the oplog.

        Note: Requires MongoDB 4.0 or above.

        See changeStreams.

        See Also:
        Constant Field Values
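        The first two accepted formats above both resolve to epoch seconds. A minimal sketch, with a hypothetical `toEpochSeconds` helper; the BSON Timestamp form is omitted because parsing it needs the BSON library rather than the standard library.

```java
import java.time.Instant;

public class StartAtOperationTimeDemo {
    // Illustrative only: converts the decimal-seconds and ISO-8601 forms of
    // the start-at-operation-time value into epoch seconds.
    static long toEpochSeconds(String value) {
        try {
            return Long.parseLong(value);                 // decimal seconds, e.g. "30"
        } catch (NumberFormatException e) {
            return Instant.parse(value).getEpochSecond(); // ISO-8601, e.g. "1970-01-01T00:00:30Z"
        }
    }

    public static void main(String[] args) {
        System.out.println(toEpochSeconds("30"));                   // 30
        System.out.println(toEpochSeconds("1970-01-01T00:00:30Z")); // 30
    }
}
```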
      • STREAM_MICRO_BATCH_MAX_PARTITION_COUNT_CONFIG

        public static final java.lang.String STREAM_MICRO_BATCH_MAX_PARTITION_COUNT_CONFIG
        Configures the maximum number of partitions per micro batch.

        Divides a micro batch into a maximum number of partitions, based on the seconds since epoch part of a BsonTimestamp. The smallest micro batch partition generated is one second.

        Actuated only if using micro batch streams.

        Default: 1

        Warning: Splitting a micro batch into multiple partitions removes any guarantee that change events are processed in the order they occurred. Care should be taken to ensure that partitioning and processing will not cause data inconsistencies downstream.

        See BsonTimestamp.

        See Also:
        Constant Field Values
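        The splitting described above, at most N partitions with a minimum granularity of one second, can be sketched as plain arithmetic over the seconds component of a BsonTimestamp range. The `partition` helper below is hypothetical and illustrates the documented behaviour only, not the connector's actual partitioning code.

```java
import java.util.ArrayList;
import java.util.List;

public class MicroBatchPartitionDemo {
    // Illustrative only: splits [startSeconds, endSeconds) into at most
    // maxPartitionCount partitions, each at least one second wide.
    static List<long[]> partition(long startSeconds, long endSeconds, int maxPartitionCount) {
        long range = endSeconds - startSeconds;
        long partitions = Math.min(maxPartitionCount, Math.max(1, range));
        long step = Math.max(1, (long) Math.ceil((double) range / partitions));
        List<long[]> result = new ArrayList<>();
        for (long s = startSeconds; s < endSeconds; s += step) {
            result.add(new long[] {s, Math.min(s + step, endSeconds)});
        }
        return result;
    }

    public static void main(String[] args) {
        // A 10 second micro batch split into at most 4 partitions.
        for (long[] p : partition(0, 10, 4)) {
            System.out.println(p[0] + " -> " + p[1]);
        }
    }
}
```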
      • OUTPUT_EXTENDED_JSON_CONFIG

        public static final java.lang.String OUTPUT_EXTENDED_JSON_CONFIG
        Output extended JSON for any String types.

        Configuration: "outputExtendedJson"

        Default: false

        If true, extended JSON will be produced for any fields that have the String data type.

        Since:
        10.1
        See Also:
        Constant Field Values
    • Method Detail

      • withOption

        public ReadConfig withOption​(java.lang.String key,
                                     java.lang.String value)
        Description copied from interface: MongoConfig
        Return a MongoConfig instance with the extra options applied.

        Existing configurations may be overwritten by the new options.

        Parameters:
        key - the key to add
        value - the value to add
        Returns:
        a new MongoConfig
      • withOptions

        public ReadConfig withOptions​(java.util.Map<java.lang.String,​java.lang.String> options)
        Description copied from interface: MongoConfig
        Return a MongoConfig instance with the extra options applied.

        Existing configurations may be overwritten by the new options.

        Parameters:
        options - the context specific options.
        Returns:
        a new MongoConfig
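        The copy-on-write semantics of withOption/withOptions, a new instance is returned and new options overwrite existing keys, can be sketched with a minimal stand-in class. This is an illustration only; `Config` below is hypothetical, not the connector's implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class WithOptionDemo {
    // Minimal sketch of the documented semantics: each withOption call
    // returns a new immutable instance, and new options may overwrite
    // existing configuration.
    static final class Config {
        private final Map<String, String> options;

        Config(Map<String, String> options) {
            this.options = Map.copyOf(options);
        }

        Config withOption(String key, String value) {
            Map<String, String> merged = new HashMap<>(options);
            merged.put(key, value); // existing configuration may be overwritten
            return new Config(merged);
        }

        String get(String key) {
            return options.get(key);
        }
    }

    public static void main(String[] args) {
        Config original = new Config(Map.of("sampleSize", "1000"));
        Config updated = original.withOption("sampleSize", "500");
        System.out.println(original.get("sampleSize")); // 1000 - original is unchanged
        System.out.println(updated.get("sampleSize"));  // 500
    }
}
```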
      • getInferSchemaSampleSize

        public int getInferSchemaSampleSize()
        Returns:
        the configured infer sample size
      • inferSchemaMapType

        public boolean inferSchemaMapType()
        Returns:
        true if MapTypes should be inferred when inferring the schema
      • getInferSchemaMapTypeMinimumKeySize

        public int getInferSchemaMapTypeMinimumKeySize()
        Returns:
        the configured minimum key size for inferring a MapType
      • getPartitioner

        public Partitioner getPartitioner()
        Returns:
        the partitioner instance
      • getPartitionerOptions

        public MongoConfig getPartitionerOptions()
        Returns:
        any partitioner configuration
      • getAggregationPipeline

        public java.util.List<org.bson.BsonDocument> getAggregationPipeline()
        Returns:
        the aggregation pipeline to filter the collection with
      • getAggregationAllowDiskUse

        public boolean getAggregationAllowDiskUse()
        Returns:
        the aggregation allow disk use value
      • streamPublishFullDocumentOnly

        public boolean streamPublishFullDocumentOnly()
        Returns:
        true if the stream should publish the full document only.
      • getStreamFullDocument

        public com.mongodb.client.model.changestream.FullDocument getStreamFullDocument()
        Returns:
        the stream full document configuration or null if not set.
      • getStreamInitialBsonTimestamp

        public org.bson.BsonTimestamp getStreamInitialBsonTimestamp()
        Returns the initial start at operation time for a stream

        Note: This value will be ignored if the timestamp is negative or there is an existing offset present for the stream.

        Returns:
        the start at operation time for a stream
        Since:
        10.2
      • getMicroBatchMaxPartitionCount

        public int getMicroBatchMaxPartitionCount()
        Returns:
        the micro batch max partition count
      • outputExtendedJson

        public boolean outputExtendedJson()
        Returns:
        true if extended JSON should be output
        Since:
        10.1
      • getOriginals

        public java.util.Map<java.lang.String,​java.lang.String> getOriginals()
        Specified by:
        getOriginals in interface MongoConfig
        Returns:
        the original options for this MongoConfig instance
      • getOptions

        public java.util.Map<java.lang.String,​java.lang.String> getOptions()
        Specified by:
        getOptions in interface MongoConfig
        Returns:
        the options for this MongoConfig instance
      • getDatabaseName

        public java.lang.String getDatabaseName()
        Specified by:
        getDatabaseName in interface MongoConfig
        Returns:
        the database name to use for this configuration
      • getCollectionName

        public java.lang.String getCollectionName()
        Specified by:
        getCollectionName in interface MongoConfig
        Returns:
        the collection name to use for this configuration
      • getMongoClient

        public com.mongodb.client.MongoClient getMongoClient()
        Returns a MongoClient

        Once the MongoClient is no longer required, it MUST be closed by calling mongoClient.close().

        Returns:
        the MongoClient from the cache, or a new one created using the MongoClientFactory
      • withClient

        public <T> T withClient​(java.util.function.Function<com.mongodb.client.MongoClient,​T> function)
        Runs a function against a MongoClient
        Type Parameters:
        T - The return type
        Parameters:
        function - the function that is passed the MongoClient
        Returns:
        the result of the function
      • doWithClient

        public void doWithClient​(java.util.function.Consumer<com.mongodb.client.MongoClient> consumer)
        Loans a MongoClient to the user; does not return a result.
        Parameters:
        consumer - the consumer of the MongoClient
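        withClient and doWithClient follow the loan pattern: the resource is lent to the caller's function and released afterwards, so the caller never has to remember mongoClient.close(). A self-contained sketch, using a hypothetical `FakeClient` standing in for a MongoClient:

```java
import java.util.function.Function;

public class LoanPatternDemo {
    // Stand-in for a MongoClient: records whether close() was called.
    static final class FakeClient implements AutoCloseable {
        boolean closed = false;

        String ping() {
            return "ok";
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Illustrative sketch of the loan pattern behind withClient/doWithClient:
    // the client is lent to the function and reliably closed afterwards.
    static <T> T withClient(FakeClient client, Function<FakeClient, T> function) {
        try (FakeClient c = client) {
            return function.apply(c);
        }
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient();
        String result = withClient(client, FakeClient::ping);
        System.out.println(result);        // ok
        System.out.println(client.closed); // true - closed by the loan
    }
}
```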
      • withCollection

        public <T> T withCollection​(java.util.function.Function<com.mongodb.client.MongoCollection<org.bson.BsonDocument>,​T> function)
        Runs a function against a MongoCollection
        Type Parameters:
        T - The return type
        Parameters:
        function - the function that is passed the MongoCollection
        Returns:
        the result of the function
      • doWithCollection

        public void doWithCollection​(java.util.function.Consumer<com.mongodb.client.MongoCollection<org.bson.BsonDocument>> consumer)
        Loans a MongoCollection to the user; does not return a result.
        Parameters:
        consumer - the consumer of the MongoCollection<BsonDocument>
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • equals

        @TestOnly
        public boolean equals​(java.lang.Object o)
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object