Class GoogleHadoopFileSystemConfiguration


  • public class GoogleHadoopFileSystemConfiguration
    extends Object
    This class provides a configuration for the GoogleHadoopFileSystem implementations.
    • Field Detail

      • CONFIG_KEY_PREFIXES

        public static final List<String> CONFIG_KEY_PREFIXES
      • PERMISSIONS_TO_REPORT

        public static final HadoopConfigurationProperty<String> PERMISSIONS_TO_REPORT
        Configuration key for the permissions reported for a file or directory. The value can be either an octal or a symbolic mode accepted by FsPermission(String).

        Note: File and directory permissions are not actually supported, but some permission value must be reported when Hadoop calls getFileStatus(). A MapReduce job fails if the reported permissions are more relaxed than the default value of this property and this is the default file system.
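
        To make the octal form and the "more relaxed" comparison concrete, here is a minimal sketch in plain Java. It does not use Hadoop's FsPermission; the class name, method names, and the example modes "700" and "755" are illustrative assumptions, not values taken from the connector.

```java
// Hypothetical helper, not part of the connector or of Hadoop.
final class ReportedPermissionSketch {

  /** Converts a three-digit octal mode such as "700" to an "rwx------" style string. */
  static String toSymbolic(String octal) {
    if (!octal.matches("[0-7]{3}")) {
      throw new IllegalArgumentException("Expected three octal digits: " + octal);
    }
    String flags = "rwx";
    StringBuilder symbolic = new StringBuilder(9);
    for (int i = 0; i < 3; i++) {
      int digit = octal.charAt(i) - '0';
      for (int bit = 0; bit < 3; bit++) {
        // 4 = read, 2 = write, 1 = execute
        boolean set = (digit & (4 >> bit)) != 0;
        symbolic.append(set ? flags.charAt(bit) : '-');
      }
    }
    return symbolic.toString();
  }

  /** Returns true if the candidate mode grants any bit that the limit mode does not. */
  static boolean isMoreRelaxed(String candidateOctal, String limitOctal) {
    int candidate = Integer.parseInt(candidateOctal, 8);
    int limit = Integer.parseInt(limitOctal, 8);
    return (candidate & ~limit) != 0;
  }

  public static void main(String[] args) {
    System.out.println(toSymbolic("700")); // rwx------
    System.out.println(toSymbolic("755")); // rwxr-xr-x
    System.out.println(isMoreRelaxed("755", "700")); // true
  }
}
```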

      • BLOCK_SIZE

        public static final HadoopConfigurationProperty<Long> BLOCK_SIZE
        Configuration key for default block size of a file.

        Note that this is the size that is reported to Hadoop FS clients. It does not modify the actual block size of an underlying GCS object, because the GCS JSON API does not allow modifying or querying that value. Changing this value lets you control how many mappers are used to process a given file.
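
        As a rough illustration of why the reported block size affects the mapper count, the sketch below estimates one split per reported block. This is a simplification and an assumption: the actual split computation depends on the InputFormat in use and its minimum and maximum split size settings. The class and method names are hypothetical.

```java
// Hypothetical helper, not part of the connector.
final class ReportedBlockSizeSketch {

  /** Estimates the number of splits as ceil(fileSize / reportedBlockSize). */
  static long estimateSplits(long fileSizeBytes, long reportedBlockSizeBytes) {
    if (reportedBlockSizeBytes <= 0) {
      throw new IllegalArgumentException("Block size must be positive");
    }
    if (fileSizeBytes <= 0) {
      return 0;
    }
    return (fileSizeBytes + reportedBlockSizeBytes - 1) / reportedBlockSizeBytes;
  }

  public static void main(String[] args) {
    long oneGiB = 1024L * 1024 * 1024;
    long mib = 1024L * 1024;
    System.out.println(estimateSplits(oneGiB, 64 * mib));  // 16
    System.out.println(estimateSplits(oneGiB, 128 * mib)); // 8
  }
}
```

        Under this simplified model, halving the reported block size doubles the estimated number of mappers for the same file.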

      • DELEGATION_TOKEN_BINDING_CLASS

        public static final HadoopConfigurationProperty<String> DELEGATION_TOKEN_BINDING_CLASS
        Configuration key for Delegation Token binding class. Default value: none
      • GCS_WORKING_DIRECTORY

        public static final HadoopConfigurationProperty<String> GCS_WORKING_DIRECTORY
        Configuration key for initial working directory of a GHFS instance. Default value: '/'
      • GCE_BUCKET_DELETE_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCE_BUCKET_DELETE_ENABLE
        If true, recursive delete on a path that refers to a GCS bucket itself ('/' for any bucket-rooted GoogleHadoopFileSystem) or delete on that path when it's empty will result in fully deleting the GCS bucket. If false, any operation that normally would have deleted the bucket will be ignored instead. Setting to 'false' preserves the typical behavior of "rm -rf /" which translates to deleting everything inside of root, but without clobbering the filesystem authority corresponding to that root path in the process.
      • GCS_REQUESTER_PAYS_PROJECT_ID

        public static final HadoopConfigurationProperty<String> GCS_REQUESTER_PAYS_PROJECT_ID
        Configuration key for GCS Requester Pays Project ID. Default value: none
      • GCS_PERFORMANCE_CACHE_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCS_PERFORMANCE_CACHE_ENABLE
        Configuration key for using a local item cache to supplement GCS API "getFile" results. This provides faster access to recently queried data. Because the data is cached, modifications made outside of this instance may not be immediately reflected. The performance cache can be used in conjunction with other caching options.
      • GCS_PERFORMANCE_CACHE_MAX_ENTRY_AGE

        public static final HadoopConfigurationProperty<Long> GCS_PERFORMANCE_CACHE_MAX_ENTRY_AGE
        Configuration key for maximum time a GoogleCloudStorageItemInfo will remain "valid" in the performance cache before it's invalidated.
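
        The interaction between the performance cache and the maximum entry age can be pictured with a small time-bounded cache. This is only a sketch of the general technique, not the connector's implementation; the class name, the millisecond unit, and the injectable clock are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical cache, not the connector's performance cache.
final class MaxEntryAgeCacheSketch<K, V> {

  private static final class Entry<T> {
    final T value;
    final long loadedAtMillis;

    Entry(T value, long loadedAtMillis) {
      this.value = value;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
  private final long maxEntryAgeMillis;
  private final LongSupplier clock;

  MaxEntryAgeCacheSketch(long maxEntryAgeMillis, LongSupplier clock) {
    this.maxEntryAgeMillis = maxEntryAgeMillis;
    this.clock = clock;
  }

  /** Returns the cached value while it is young enough, otherwise reloads it. */
  V get(K key, Function<K, V> loader) {
    long now = clock.getAsLong();
    Entry<V> cached = entries.get(key);
    if (cached != null && now - cached.loadedAtMillis <= maxEntryAgeMillis) {
      return cached.value; // may be stale if the item changed outside this instance
    }
    V fresh = loader.apply(key);
    entries.put(key, new Entry<>(fresh, now));
    return fresh;
  }

  public static void main(String[] args) {
    MaxEntryAgeCacheSketch<String, String> cache =
        new MaxEntryAgeCacheSketch<>(5_000L, System::currentTimeMillis);
    // The loader stands in for a metadata request to GCS.
    System.out.println(cache.get("gs://example-bucket/file", k -> "metadata for " + k));
  }
}
```

        A shorter maximum age narrows the window in which external modifications go unnoticed, at the cost of more metadata requests.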
      • GCS_STATUS_PARALLEL_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCS_STATUS_PARALLEL_ENABLE
        If true, executes GCS requests in listStatus and getFileStatus methods in parallel to reduce latency.
      • GCS_LAZY_INITIALIZATION_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCS_LAZY_INITIALIZATION_ENABLE
        Configuration key for enabling lazy initialization of GCS FS instance.
      • GCS_REPAIR_IMPLICIT_DIRECTORIES_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCS_REPAIR_IMPLICIT_DIRECTORIES_ENABLE
        Configuration key for enabling automatic repair of implicit directories whenever they are detected during delete and rename calls.
      • GCS_CREATE_ITEMS_CONFLICT_CHECK_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCS_CREATE_ITEMS_CONFLICT_CHECK_ENABLE
        Configuration key for enabling a check to ensure that conflicting directories do not exist when creating files and conflicting files do not exist when creating directories.
      • GCS_MARKER_FILE_PATTERN

        public static final HadoopConfigurationProperty<String> GCS_MARKER_FILE_PATTERN
        Configuration key for marker file pattern. Default value: none
      • GCS_MAX_REQUESTS_PER_BATCH

        public static final HadoopConfigurationProperty<Integer> GCS_MAX_REQUESTS_PER_BATCH
        Configuration key for the maximum number of GCS RPCs in a batch request.
      • GCS_COPY_WITH_REWRITE_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCS_COPY_WITH_REWRITE_ENABLE
        Configuration key for enabling the use of Rewrite requests for copy operations. A Rewrite request has the same effect as a Copy request, but it can handle large objects that might cause a Copy request to time out.
      • GCS_REWRITE_MAX_CHUNK_SIZE

        public static final HadoopConfigurationProperty<Long> GCS_REWRITE_MAX_CHUNK_SIZE
        Configuration key for specifying the maximum number of bytes rewritten in a single rewrite request when fs.gs.copy.with.rewrite.enable is set to 'true'.
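
        To show how a per-request byte limit splits one copy into several Rewrite requests, the sketch below counts the requests needed for an object of a given size. It only simulates the chunking arithmetic; it makes no GCS calls, and the class and method names are hypothetical.

```java
// Hypothetical helper, not part of the connector.
final class RewriteChunkSketch {

  /** Counts the rewrite requests needed when each request moves at most maxChunkBytes. */
  static int countRewriteCalls(long objectSizeBytes, long maxChunkBytes) {
    if (objectSizeBytes < 0 || maxChunkBytes <= 0) {
      throw new IllegalArgumentException("Sizes must be non-negative and chunk size positive");
    }
    long rewritten = 0;
    int calls = 0;
    do {
      // Each simulated request rewrites up to the configured chunk size.
      rewritten += Math.min(maxChunkBytes, objectSizeBytes - rewritten);
      calls++;
    } while (rewritten < objectSizeBytes);
    return calls;
  }

  public static void main(String[] args) {
    System.out.println(countRewriteCalls(10, 4)); // 3
    System.out.println(countRewriteCalls(8, 4));  // 2
  }
}
```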
      • GCS_MAX_LIST_ITEMS_PER_CALL

        public static final HadoopConfigurationProperty<Integer> GCS_MAX_LIST_ITEMS_PER_CALL
        Configuration key for the number of items to return per call to the list* GCS RPCs.
      • GCS_HTTP_MAX_RETRY

        public static final HadoopConfigurationProperty<Integer> GCS_HTTP_MAX_RETRY
        Configuration key for the maximum number of retries for a failed HTTP request to GCS. Note that the connector retries up to the specified number of times, using a default ExponentialBackOff strategy.

        Also note that this number controls only the number of retries in the low-level HTTP request implementation.
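
        The "up to N retries with exponential backoff" behavior can be sketched as follows. This is a simplified stand-in and not the connector's code: it doubles a fixed initial delay and omits the randomization and maximum-interval handling that a production backoff typically has. All names and parameters are assumptions.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the connector.
final class HttpRetrySketch {

  /**
   * Runs the request and retries on IOException at most maxRetries times,
   * doubling the delay before each retry.
   */
  static <T> T callWithRetries(Callable<T> request, int maxRetries, long initialDelayMillis)
      throws Exception {
    long delayMillis = initialDelayMillis;
    for (int attempt = 0; ; attempt++) {
      try {
        return request.call();
      } catch (IOException e) {
        if (attempt >= maxRetries) {
          throw e; // retries exhausted: surface the last failure
        }
        Thread.sleep(delayMillis);
        delayMillis *= 2;
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    String result = callWithRetries(() -> {
      if (++calls[0] < 3) {
        throw new IOException("simulated transient failure");
      }
      return "ok";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

        With maxRetries set to N, the request is attempted at most N + 1 times in this sketch: the initial attempt plus N retries.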

      • GCS_HTTP_CONNECT_TIMEOUT

        public static final HadoopConfigurationProperty<Long> GCS_HTTP_CONNECT_TIMEOUT
        Configuration key for the connect timeout for HTTP requests to GCS.
      • GCS_APPLICATION_NAME_SUFFIX

        public static final HadoopConfigurationProperty<String> GCS_APPLICATION_NAME_SUFFIX
        Configuration key for adding a suffix to the GHFS application name sent to GCS.
      • GCS_MAX_WAIT_TIME_EMPTY_OBJECT_CREATE

        public static final HadoopConfigurationProperty<Long> GCS_MAX_WAIT_TIME_EMPTY_OBJECT_CREATE
        Configuration key for modifying the maximum amount of time to wait for empty object creation.
      • GCS_OUTPUT_STREAM_BUFFER_SIZE

        public static final HadoopConfigurationProperty<Long> GCS_OUTPUT_STREAM_BUFFER_SIZE
        Configuration key for setting write buffer size.
      • GCS_OUTPUT_STREAM_PIPE_BUFFER_SIZE

        public static final HadoopConfigurationProperty<Long> GCS_OUTPUT_STREAM_PIPE_BUFFER_SIZE
        Configuration key for setting pipe buffer size.
      • GCS_OUTPUT_STREAM_UPLOAD_CHUNK_SIZE

        public static final HadoopConfigurationProperty<Long> GCS_OUTPUT_STREAM_UPLOAD_CHUNK_SIZE
        Configuration key for setting GCS upload chunk size.
      • GCS_OUTPUT_STREAM_UPLOAD_CACHE_SIZE

        public static final HadoopConfigurationProperty<Long> GCS_OUTPUT_STREAM_UPLOAD_CACHE_SIZE
        Configuration key for setting GCS upload cache size.
      • GCS_OUTPUT_STREAM_DIRECT_UPLOAD_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCS_OUTPUT_STREAM_DIRECT_UPLOAD_ENABLE
        Configuration key for enabling GCS direct upload.
      • GCS_OUTPUT_STREAM_SYNC_MIN_INTERVAL

        public static final HadoopConfigurationProperty<Long> GCS_OUTPUT_STREAM_SYNC_MIN_INTERVAL
        Configuration key for the minimal time interval between consecutive sync/hsync/hflush calls.
      • GCS_INPUT_STREAM_FAST_FAIL_ON_NOT_FOUND_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCS_INPUT_STREAM_FAST_FAIL_ON_NOT_FOUND_ENABLE
        If true, on opening a file we will proactively perform a metadata GET to check whether the object exists, even though the underlying channel will not open a data stream until read() is actually called. This is necessary to technically match the expected behavior of Hadoop filesystems, but incurs an extra latency overhead on open(). If the calling code can handle late failures on not-found errors, or has independently already ensured that a file exists before calling open(), then you can set this to false for more efficient reads.

        Note that setting this to false is known not to work with YARN CommonNodeLabelsManager and potentially other Hadoop components. For that reason, setting this property to false cluster-wide is not recommended; instead, set it for a specific job or application that is compatible with it.

      • GCS_INPUT_STREAM_SUPPORT_GZIP_ENCODING_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCS_INPUT_STREAM_SUPPORT_GZIP_ENCODING_ENABLE
        If false, reading a file with GZIP content encoding (HTTP header "Content-Encoding: gzip") will result in failure (IOException is thrown).
      • GCS_INPUT_STREAM_INPLACE_SEEK_LIMIT

        public static final HadoopConfigurationProperty<Long> GCS_INPUT_STREAM_INPLACE_SEEK_LIMIT
        If forward seeks are within this many bytes of the current position, seeks are performed by reading and discarding bytes in-place rather than opening a new underlying stream.
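
        The decision described above can be sketched as a small pure function. The names are hypothetical, and treating a distance exactly equal to the limit as "within" the limit is an assumption about the boundary case.

```java
// Hypothetical helper, not part of the connector.
final class InplaceSeekSketch {

  enum SeekAction { NO_OP, SKIP_IN_PLACE, REOPEN_STREAM }

  /** Chooses how to reach targetPos from currentPos given the in-place seek limit. */
  static SeekAction planSeek(long currentPos, long targetPos, long inplaceSeekLimit) {
    long distance = targetPos - currentPos;
    if (distance == 0) {
      return SeekAction.NO_OP;
    }
    if (distance > 0 && distance <= inplaceSeekLimit) {
      // Short forward seek: read and discard bytes on the existing stream.
      return SeekAction.SKIP_IN_PLACE;
    }
    // Backward seek or long forward seek: open a new underlying stream.
    return SeekAction.REOPEN_STREAM;
  }

  public static void main(String[] args) {
    long limit = 8L * 1024 * 1024;
    System.out.println(planSeek(0, 4096, limit));          // SKIP_IN_PLACE
    System.out.println(planSeek(0, 64L * 1024 * 1024, limit)); // REOPEN_STREAM
    System.out.println(planSeek(4096, 0, limit));          // REOPEN_STREAM
  }
}
```

        Reading and discarding avoids the latency of a new request for short hops, while a new stream avoids downloading bytes that will never be used on long ones.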
      • GCS_INPUT_STREAM_MIN_RANGE_REQUEST_SIZE

        public static final HadoopConfigurationProperty<Long> GCS_INPUT_STREAM_MIN_RANGE_REQUEST_SIZE
        Minimum size in bytes of the HTTP Range header set in a GCS request when opening a new stream to read an object.
      • GCS_GRPC_CHECKSUMS_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCS_GRPC_CHECKSUMS_ENABLE
        Configuration key for enabling checksum validation for the gRPC API.
      • GCS_GRPC_SERVER_ADDRESS

        public static final HadoopConfigurationProperty<String> GCS_GRPC_SERVER_ADDRESS
        Configuration key for the Cloud Storage gRPC server address.
      • GCS_GRPC_CHECK_INTERVAL_TIMEOUT

        public static final HadoopConfigurationProperty<Long> GCS_GRPC_CHECK_INTERVAL_TIMEOUT
        Configuration key for the check interval for gRPC request timeouts to GCS.
      • GCS_GRPC_READ_TIMEOUT

        public static final HadoopConfigurationProperty<Long> GCS_GRPC_READ_TIMEOUT
        Configuration key for the connection timeout for gRPC read requests to GCS.
      • GCS_GRPC_READ_MESSAGE_TIMEOUT

        public static final HadoopConfigurationProperty<Long> GCS_GRPC_READ_MESSAGE_TIMEOUT
        Configuration key for the message timeout for gRPC read requests to GCS.
      • GCS_GRPC_READ_ZEROCOPY_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCS_GRPC_READ_ZEROCOPY_ENABLE
        Configuration key for enabling the zero-copy deserializer for the gRPC API.
      • GCS_GRPC_UPLOAD_BUFFERED_REQUESTS

        public static final HadoopConfigurationProperty<Integer> GCS_GRPC_UPLOAD_BUFFERED_REQUESTS
        Configuration key for the number of requests to be buffered for uploads to GCS.
      • GCS_GRPC_WRITE_TIMEOUT

        public static final HadoopConfigurationProperty<Long> GCS_GRPC_WRITE_TIMEOUT
        Configuration key for the connect timeout for gRPC write requests to GCS.
      • GCS_GRPC_WRITE_MESSAGE_TIMEOUT

        public static final HadoopConfigurationProperty<Long> GCS_GRPC_WRITE_MESSAGE_TIMEOUT
        Configuration key for the message timeout for gRPC write requests to GCS.
      • GCS_GRPC_DIRECTPATH_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCS_GRPC_DIRECTPATH_ENABLE
        Configuration key for enabling use of the DirectPath gRPC API for read/write.
      • GCS_GRPC_TRAFFICDIRECTOR_ENABLE

        public static final HadoopConfigurationProperty<Boolean> GCS_GRPC_TRAFFICDIRECTOR_ENABLE
        Configuration key for enabling use of the Traffic Director gRPC API for read/write.
    • Constructor Detail

      • GoogleHadoopFileSystemConfiguration

        public GoogleHadoopFileSystemConfiguration()