public abstract class GoogleHadoopFileSystemBase extends org.apache.hadoop.fs.FileSystem implements FileSystemDescriptor
This class is implemented as a thin abstraction layer on top of GCS. The layer hides the specific characteristics of the underlying store and exposes the FileSystem interface understood by the Hadoop engine.
Users interact with the files in the storage using fully qualified URIs. The file system exposed by this class is identified using the 'gs' scheme. For example, gs://dir1/dir2/file1.txt.
This implementation translates paths between Hadoop Path and GCS URI with the convention that the Hadoop root directly corresponds to the GCS "root", e.g. gs:/. This is convenient for many reasons, such as data portability and close equivalence to gsutil paths, but imposes certain inherited constraints: files are not allowed in root (only 'directories' can be placed in root), and directory names inside root have a more limited set of allowed characters.
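The path convention above can be illustrated with plain java.net.URI. This is a minimal sketch, not the connector's own translation code: it assumes that the first component after gs:// (the URI authority) names the 'directory' in root, and that the rest of the path names the item inside it. The class and method names are hypothetical.

```java
import java.net.URI;

/** Illustrative only: splits a gs:// URI into its root 'directory' and the remaining path. */
final class GcsPathSketch {
  /** Returns the URI authority, i.e. the component directly under the root. */
  static String rootDirectoryOf(URI gcsUri) {
    // getAuthority() is used rather than getHost() because getHost() returns
    // null when the name is not a syntactically valid hostname.
    return gcsUri.getAuthority();
  }

  /** Returns the path below the root 'directory', without its leading slash. */
  static String itemNameOf(URI gcsUri) {
    String path = gcsUri.getPath();
    return path.startsWith("/") ? path.substring(1) : path;
  }

  public static void main(String[] args) {
    URI uri = URI.create("gs://dir1/dir2/file1.txt");
    System.out.println(rootDirectoryOf(uri));
    System.out.println(itemNameOf(uri));
  }
}
```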
One of the main goals of this implementation is to maintain compatibility with the behavior of the HDFS implementation when accessed through the FileSystem interface. The HDFS implementation is not very consistent about when it throws versus when its methods return false. We run GHFS tests and HDFS tests against the same test data and use that as a guide to decide whether to throw or to return false.
Modifier and Type | Class and Description |
---|---|
static class |
GoogleHadoopFileSystemBase.Counter
Defines names of counters we track for each operation.
|
protected static class |
GoogleHadoopFileSystemBase.ListStatusFileNotFoundBehavior
Behavior of listStatus when a path is not found.
|
static class |
GoogleHadoopFileSystemBase.OutputStreamType
Available types for use with
GCS_OUTPUTSTREAM_TYPE_KEY . |
static class |
GoogleHadoopFileSystemBase.ParentTimestampUpdateIncludePredicate
A predicate that processes individual directory paths and evaluates the conditions set in
fs.gs.parent.timestamp.update.enable, fs.gs.parent.timestamp.update.substrings.include and
fs.gs.parent.timestamp.update.substrings.exclude to determine if a path should be ignored
when running directory timestamp updates.
|
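The substring-based filtering described for ParentTimestampUpdateIncludePredicate might look roughly like the sketch below. It is not the class's actual code. The evaluation order (disabled wins, then includes, then excludes, otherwise update) is an assumption, and the sketch returns true when a directory should be updated, which may be the inverse of the real predicate's sense. All names are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

/** Illustrative only: decides whether a directory's timestamp should be updated. */
final class TimestampUpdatePredicateSketch implements Predicate<String> {
  private final boolean enabled;        // stands in for fs.gs.parent.timestamp.update.enable
  private final List<String> includes;  // stands in for ...substrings.include
  private final List<String> excludes;  // stands in for ...substrings.exclude

  TimestampUpdatePredicateSketch(boolean enabled, List<String> includes, List<String> excludes) {
    this.enabled = enabled;
    this.includes = includes;
    this.excludes = excludes;
  }

  @Override
  public boolean test(String directoryPath) {
    if (!enabled) {
      return false; // updates are switched off entirely
    }
    for (String include : includes) {
      if (directoryPath.contains(include)) {
        return true; // assumed: an include match takes precedence over excludes
      }
    }
    for (String exclude : excludes) {
      if (directoryPath.contains(exclude)) {
        return false;
      }
    }
    return true; // assumed: directories matching neither list are updated
  }
}
```

With includes set to a job-history substring and excludes set to "/", only the included directories would have their timestamps updated under these assumptions.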
Modifier and Type | Field and Description |
---|---|
static String |
AUTHENTICATION_PREFIX
Prefix to use for common authentication keys.
|
static int |
BLOCK_SIZE_DEFAULT
Default value of
BLOCK_SIZE_KEY . |
static String |
BLOCK_SIZE_KEY
Configuration key for default block size of a file.
|
static int |
BUFFERSIZE_DEFAULT
Hadoop passes 4096 bytes as the buffer size, which causes poor performance.
|
static String |
BUFFERSIZE_KEY
Configuration key for setting IO buffer size.
|
protected com.google.common.collect.ImmutableMap<GoogleHadoopFileSystemBase.Counter,AtomicLong> |
counters
Map of counter values
|
static org.apache.hadoop.fs.PathFilter |
DEFAULT_FILTER
Default PathFilter that accepts all paths.
|
protected long |
defaultBlockSize
Default block size.
|
static String |
ENABLE_GCE_SERVICE_ACCOUNT_AUTH_KEY
Configuration key for enabling GCE service account authentication.
|
static boolean |
GCE_BUCKET_DELETE_ENABLE_DEFAULT
Default value for
GCE_BUCKET_DELETE_ENABLE_KEY . |
static String |
GCE_BUCKET_DELETE_ENABLE_KEY
If true, recursive delete on a path that refers to a GCS bucket itself ('/' for any
bucket-rooted GoogleHadoopFileSystem) or delete on that path when it's empty will result in
fully deleting the GCS bucket.
|
static String |
GCS_APPLICATION_NAME_SUFFIX_DEFAULT
Default suffix to add to the application name.
|
static String |
GCS_APPLICATION_NAME_SUFFIX_KEY
Configuration key for adding a suffix to the GHFS application name sent to GCS.
|
static String |
GCS_CLIENT_ID_KEY
Configuration key for GCS client ID.
|
static String |
GCS_CLIENT_SECRET_KEY
Configuration key for GCS client secret.
|
static boolean |
GCS_CREATE_SYSTEM_BUCKET_DEFAULT
Default value of
GCS_CREATE_SYSTEM_BUCKET_KEY . |
static String |
GCS_CREATE_SYSTEM_BUCKET_KEY
Configuration key for flag to indicate whether system bucket should be created if it does not
exist.
|
static boolean |
GCS_ENABLE_FLAT_GLOB_DEFAULT
Default value for
GCS_ENABLE_FLAT_GLOB_KEY . |
static String |
GCS_ENABLE_FLAT_GLOB_KEY
Configuration key for enabling the use of a large flat listing to pre-populate possible glob
matches in a single API call before running the core globbing logic in-memory rather than
sequentially and recursively performing API calls.
|
static boolean |
GCS_ENABLE_INFER_IMPLICIT_DIRECTORIES_DEFAULT
Default value for
GCS_ENABLE_INFER_IMPLICIT_DIRECTORIES_KEY . |
static String |
GCS_ENABLE_INFER_IMPLICIT_DIRECTORIES_KEY
Configuration key for enabling automatic inference of implicit directories.
|
static boolean |
GCS_ENABLE_MARKER_FILE_CREATION_DEFAULT
Default value for
GCS_ENABLE_MARKER_FILE_CREATION_KEY . |
static String |
GCS_ENABLE_MARKER_FILE_CREATION_KEY
Configuration key for enabling the use of marker files during file creation.
|
static boolean |
GCS_ENABLE_METADATA_CACHE_DEFAULT
Default value for
GCS_ENABLE_METADATA_CACHE_KEY . |
static String |
GCS_ENABLE_METADATA_CACHE_KEY
Configuration key for using a local metadata cache to supplement GCS API "list" results; this
allows same-client create() to immediately be visible to a subsequent list() call.
|
static boolean |
GCS_ENABLE_PERFORMANCE_CACHE_DEFAULT
Default value for
GCS_ENABLE_PERFORMANCE_CACHE_KEY . |
static String |
GCS_ENABLE_PERFORMANCE_CACHE_KEY
Configuration key for using a local item cache to supplement GCS API "getFile" results.
|
static boolean |
GCS_ENABLE_REPAIR_IMPLICIT_DIRECTORIES_DEFAULT
Default value for
GCS_ENABLE_REPAIR_IMPLICIT_DIRECTORIES_KEY . |
static String |
GCS_ENABLE_REPAIR_IMPLICIT_DIRECTORIES_KEY
Configuration key for enabling automatic repair of implicit directories whenever detected
inside listStatus and globStatus calls, or other methods which may indirectly call listStatus
and/or globStatus.
|
static String |
GCS_FILE_SIZE_LIMIT_250GB
Configuration key for setting 250GB upper limit on file size to gain higher write throughput.
|
static boolean |
GCS_FILE_SIZE_LIMIT_250GB_DEFAULT
Default value of
GCS_FILE_SIZE_LIMIT_250GB . |
static String |
GCS_HTTP_TRANSPORT_DEFAULT
Default to the default specified in HttpTransportFactory.
|
static String |
GCS_HTTP_TRANSPORT_KEY
Configuration key for the name of HttpTransport class to use for connecting to GCS.
|
static boolean |
GCS_INPUTSTREAM_FAST_FAIL_ON_NOT_FOUND_ENABLE_DEFAULT
Default value for
GCS_INPUTSTREAM_FAST_FAIL_ON_NOT_FOUND_ENABLE_KEY . |
static String |
GCS_INPUTSTREAM_FAST_FAIL_ON_NOT_FOUND_ENABLE_KEY
If true, on opening a file we will proactively perform a metadata GET to check whether the
object exists, even though the underlying channel will not open a data stream until read() is
actually called so that streams can seek to nonzero file positions without incurring an extra
stream creation.
|
static long |
GCS_INPUTSTREAM_INPLACE_SEEK_LIMIT_DEFAULT
Default value for
GCS_INPUTSTREAM_INPLACE_SEEK_LIMIT_KEY . |
static String |
GCS_INPUTSTREAM_INPLACE_SEEK_LIMIT_KEY
If forward seeks are within this many bytes of the current position, seeks are performed by
reading and discarding bytes in-place rather than opening a new underlying stream.
|
static boolean |
GCS_INPUTSTREAM_INTERNALBUFFER_ENABLE_DEFAULT
Default value for
GCS_INPUTSTREAM_INTERNALBUFFER_ENABLE_KEY . |
static String |
GCS_INPUTSTREAM_INTERNALBUFFER_ENABLE_KEY
If true, the returned FSDataInputStream from the open(Path) method will hold an internal
ByteBuffer of size fs.gs.io.buffersize which it pre-fills on each read, and can efficiently
seek within the internal buffer.
|
static boolean |
GCS_INPUTSTREAM_SUPPORT_CONTENT_ENCODING_ENABLE_DEFAULT
Default value for
GCS_INPUTSTREAM_SUPPORT_CONTENT_ENCODING_ENABLE_KEY . |
static String |
GCS_INPUTSTREAM_SUPPORT_CONTENT_ENCODING_ENABLE_KEY
If true, input streams will proactively check the "content-encoding" header of underlying
objects during reads for special handling of cases where content-encoding causes the reported
object sizes to not match the actual number of read bytes due to the content being decoded
in-transit; such encoded objects also aren't suitable for splitting or resuming on failure, so
the underlying channel will restart from byte 0 and discard the requisite number of bytes to
seek to a desired position or resume in such cases.
|
static String |
GCS_MARKER_FILE_PATTERN_KEY
Configuration key for marker file pattern.
|
static String |
GCS_MAX_LIST_ITEMS_PER_CALL
Configuration key for number of items to return per call to the list* GCS RPCs.
|
static long |
GCS_MAX_LIST_ITEMS_PER_CALL_DEFAULT
Default value for
GCS_MAX_LIST_ITEMS_PER_CALL . |
static int |
GCS_MAX_WAIT_MILLIS_EMPTY_OBJECT_CREATE_DEFAULT
Default to 3 seconds.
|
static String |
GCS_MAX_WAIT_MILLIS_EMPTY_OBJECT_CREATE_KEY
Configuration key for modifying the maximum amount of time to wait for empty object creation.
|
static String |
GCS_METADATA_CACHE_DIRECTORY_DEFAULT
Default value for
GCS_METADATA_CACHE_DIRECTORY_KEY . |
static String |
GCS_METADATA_CACHE_DIRECTORY_KEY
Only used if fs.gs.metadata.cache.type is FILESYSTEM_BACKED, specifies the local path to use as
the base path for storing mirrored GCS metadata.
|
static long |
GCS_METADATA_CACHE_MAX_ENTRY_AGE_DEFAULT
Default value for
GCS_METADATA_CACHE_MAX_ENTRY_AGE_KEY . |
static String |
GCS_METADATA_CACHE_MAX_ENTRY_AGE_KEY
Maximum number of milliseconds a cache entry will remain in the list-consistency cache, even as
an id-only entry (no risk of stale GoogleCloudStorageItemInfo).
|
static long |
GCS_METADATA_CACHE_MAX_INFO_AGE_DEFAULT
Default value for
GCS_METADATA_CACHE_MAX_INFO_AGE_KEY . |
static String |
GCS_METADATA_CACHE_MAX_INFO_AGE_KEY
Maximum number of milliseconds a GoogleCloudStorageItemInfo will remain "valid" in the
list-consistency cache, after which the next attempt to fetch the itemInfo will require
fetching fresh info from a GoogleCloudStorage instance.
|
static String |
GCS_METADATA_CACHE_TYPE_DEFAULT
Default value for
GCS_METADATA_CACHE_TYPE_KEY . |
static String |
GCS_METADATA_CACHE_TYPE_KEY
Configuration key for specifying which implementation of DirectoryListCache to use for
supplementing GCS API "list" results.
|
static String |
GCS_OUTPUTSTREAM_TYPE_DEFAULT
Default value for
GCS_OUTPUTSTREAM_TYPE_KEY . |
static String |
GCS_OUTPUTSTREAM_TYPE_KEY
Configuration key for which type of output stream to use; different options may have different
degrees of support for advanced features like hsync() and different performance
characteristics.
|
static boolean |
GCS_PARENT_TIMESTAMP_UPDATE_ENABLE_DEFAULT
Default value for
GCS_PARENT_TIMESTAMP_UPDATE_ENABLE_KEY . |
static String |
GCS_PARENT_TIMESTAMP_UPDATE_ENABLE_KEY
Configuration key for whether or not we should update timestamps for parent directories when we
create new files in them.
|
static String |
GCS_PARENT_TIMESTAMP_UPDATE_EXCLUDES_DEFAULT
Default value for
GCS_PARENT_TIMESTAMP_UPDATE_EXCLUDES_KEY . |
static String |
GCS_PARENT_TIMESTAMP_UPDATE_EXCLUDES_KEY
Configuration key containing a comma-separated list of sub-strings that when matched will cause
a particular directory to not have its modification timestamp updated.
|
static String |
GCS_PARENT_TIMESTAMP_UPDATE_INCLUDES_DEFAULT
Default value for
GCS_PARENT_TIMESTAMP_UPDATE_INCLUDES_KEY . |
static String |
GCS_PARENT_TIMESTAMP_UPDATE_INCLUDES_KEY
Configuration key containing a comma-separated list of sub-strings that when matched will cause
a particular directory to have its modification timestamp updated.
|
static boolean |
GCS_PERFORMANCE_CACHE_LIST_CACHING_ENABLE_DEFAULT
Default value for
GCS_PERFORMANCE_CACHE_LIST_CACHING_ENABLE_KEY . |
static String |
GCS_PERFORMANCE_CACHE_LIST_CACHING_ENABLE_KEY
Configuration key for whether or not to enable list caching for the performance cache.
|
static long |
GCS_PERFORMANCE_CACHE_MAX_ENTRY_AGE_MILLIS_DEFAULT
Default value for
GCS_PERFORMANCE_CACHE_MAX_ENTRY_AGE_MILLIS_KEY . |
static String |
GCS_PERFORMANCE_CACHE_MAX_ENTRY_AGE_MILLIS_KEY
Configuration key for maximum number of milliseconds a GoogleCloudStorageItemInfo will remain
"valid" in the performance cache before it's invalidated.
|
static String |
GCS_PROJECT_ID_KEY
Configuration key for GCS project ID.
|
static String |
GCS_PROXY_ADDRESS_DEFAULT
Default to no proxy.
|
static String |
GCS_PROXY_ADDRESS_KEY
Configuration key for setting a proxy for the connector to use to connect to GCS.
|
static String |
GCS_REQUESTER_PAYS_BUCKETS_KEY
Configuration key for GCS Requester Pays Buckets.
|
static String |
GCS_REQUESTER_PAYS_MODE_KEY
Configuration key for GCS Requester Pays mode.
|
static String |
GCS_REQUESTER_PAYS_PROJECT_ID_KEY
Configuration key for GCS Requester Pays Project ID.
|
static String |
GCS_SYSTEM_BUCKET_KEY
Configuration key for system bucket name.
|
static String |
GCS_WORKING_DIRECTORY_KEY
Configuration key for initial working directory of a GHFS instance.
|
protected GoogleCloudStorageFileSystem |
gcsfs
Underlying GCS file system object.
|
static String |
GHFS_ID
Identifies this version of the GoogleHadoopFileSystemBase library.
|
protected URI |
initUri
The URI the File System is passed in initialize.
|
protected GoogleHadoopFileSystemBase.ListStatusFileNotFoundBehavior |
listStatusFileNotFoundBehavior |
static org.slf4j.Logger |
LOG
Logger.
|
static String |
MR_JOB_HISTORY_DONE_DIR_KEY
Configuration key of the MR done directory.
|
static String |
MR_JOB_HISTORY_INTERMEDIATE_DONE_DIR_KEY
Configuration key for the MR intermediate done dir.
|
static String |
PATH_CODEC_DEFAULT
Use the default path codec.
|
static String |
PATH_CODEC_KEY
Configuration key for changing the path codec from legacy to 'uri path encoding'.
|
static String |
PATH_CODEC_USE_LEGACY_ENCODING
Use LEGACY_PATH_CODEC.
|
static String |
PATH_CODEC_USE_URI_ENCODING
Use new URI_ENCODED_PATH_CODEC.
|
static String |
PERMISSIONS_TO_REPORT_DEFAULT
Default value for the permissions that we report a file or directory to have.
|
static String |
PERMISSIONS_TO_REPORT_KEY
Key for the permissions that we report a file or directory to have.
|
static String |
PROPERTIES_FILE
A resource file containing GCS related build properties.
|
static short |
REPLICATION_FACTOR_DEFAULT
Default value of replication factor.
|
static String |
SERVICE_ACCOUNT_AUTH_EMAIL_KEY
Configuration key specifying the email address of the service-account with which to
authenticate.
|
static String |
SERVICE_ACCOUNT_AUTH_KEYFILE_KEY
Configuration key specifying local file containing a service-account private .p12 keyfile.
|
protected String |
systemBucket
Deprecated.
|
static String |
UNKNOWN_VERSION
The version returned when one cannot be found in properties.
|
static String |
VERSION
Current version.
|
static String |
VERSION_PROPERTY
The key in the PROPERTIES_FILE that contains the version built.
|
static int |
WRITE_BUFFERSIZE_DEFAULT
Default value of
WRITE_BUFFERSIZE_KEY . |
static String |
WRITE_BUFFERSIZE_KEY
Configuration key for setting write buffer size.
|
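The trade-off behind GCS_INPUTSTREAM_INPLACE_SEEK_LIMIT_KEY (discard bytes on the current stream for short forward seeks, open a new stream otherwise) can be sketched with an in-memory stream. This is a minimal illustration under assumptions, not the connector's channel code: the byte array stands in for a remote object, and the class, field, and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;

/** Illustrative only: chooses between an in-place skip and a stream reopen on seek. */
final class InplaceSeekSketch {
  private final byte[] data;            // stands in for the remote object
  private final long inplaceSeekLimit;  // hypothetical value of the seek-limit setting
  private ByteArrayInputStream stream;
  private long position;
  int reopenCount;                      // number of times a new stream was opened

  InplaceSeekSketch(byte[] data, long inplaceSeekLimit) {
    this.data = data;
    this.inplaceSeekLimit = inplaceSeekLimit;
    this.stream = new ByteArrayInputStream(data);
  }

  void seek(long target) {
    if (target < 0 || target > data.length) {
      throw new IllegalArgumentException("Seek target out of range: " + target);
    }
    long distance = target - position;
    if (distance >= 0 && distance <= inplaceSeekLimit) {
      // Short forward seek: read and discard bytes on the current stream.
      byte[] scratch = new byte[4096];
      long remaining = distance;
      while (remaining > 0) {
        int n = stream.read(scratch, 0, (int) Math.min(scratch.length, remaining));
        if (n < 0) {
          throw new IllegalStateException("Unexpected end of stream while seeking");
        }
        remaining -= n;
      }
    } else {
      // Backward or long forward seek: open a new stream at the target offset.
      stream = new ByteArrayInputStream(data, (int) target, data.length - (int) target);
      reopenCount++;
    }
    position = target;
  }

  /** Reads one byte, or returns -1 at end of data. */
  int read() {
    int b = stream.read();
    if (b >= 0) {
      position++;
    }
    return b;
  }
}
```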
Constructor and Description |
---|
GoogleHadoopFileSystemBase()
Constructs an instance of GoogleHadoopFileSystemBase; the internal
GoogleCloudStorageFileSystem will be set up with config settings when initialize() is called.
|
GoogleHadoopFileSystemBase(GoogleCloudStorageFileSystem gcsfs)
Constructs an instance of GoogleHadoopFileSystemBase using the provided
GoogleCloudStorageFileSystem; initialize() will not re-initialize it.
|
Modifier and Type | Method and Description |
---|---|
org.apache.hadoop.fs.FSDataOutputStream |
append(org.apache.hadoop.fs.Path hadoopPath,
int bufferSize,
org.apache.hadoop.util.Progressable progress)
Appends to an existing file (optional operation).
|
protected void |
checkPath(org.apache.hadoop.fs.Path path) |
void |
close() |
void |
completeLocalOutput(org.apache.hadoop.fs.Path fsOutputFile,
org.apache.hadoop.fs.Path tmpLocalFile) |
void |
configureBuckets(String systemBucketName,
boolean createSystemBucket)
Validates and possibly creates the system bucket.
|
void |
copyFromLocalFile(boolean delSrc,
boolean overwrite,
org.apache.hadoop.fs.Path[] srcs,
org.apache.hadoop.fs.Path dst) |
void |
copyFromLocalFile(boolean delSrc,
boolean overwrite,
org.apache.hadoop.fs.Path src,
org.apache.hadoop.fs.Path dst) |
void |
copyToLocalFile(boolean delSrc,
org.apache.hadoop.fs.Path src,
org.apache.hadoop.fs.Path dst) |
org.apache.hadoop.fs.FSDataOutputStream |
create(org.apache.hadoop.fs.Path hadoopPath,
org.apache.hadoop.fs.permission.FsPermission permission,
boolean overwrite,
int bufferSize,
short replication,
long blockSize,
org.apache.hadoop.util.Progressable progress)
Opens the given file for writing.
|
protected com.google.common.collect.ImmutableMap<GoogleHadoopFileSystemBase.Counter,AtomicLong> |
createCounterMap() |
boolean |
delete(org.apache.hadoop.fs.Path f)
Deprecated.
Use
delete(Path, boolean) instead |
boolean |
delete(org.apache.hadoop.fs.Path hadoopPath,
boolean recursive)
Deletes the given file or directory.
|
boolean |
deleteOnExit(org.apache.hadoop.fs.Path f) |
String |
getCanonicalServiceName() |
org.apache.hadoop.fs.ContentSummary |
getContentSummary(org.apache.hadoop.fs.Path f) |
long |
getDefaultBlockSize() |
protected int |
getDefaultPort()
The default port is listed as -1 as an indication that ports are not used.
|
short |
getDefaultReplication()
Gets the default replication factor.
|
abstract org.apache.hadoop.fs.Path |
getDefaultWorkingDirectory()
Gets the default value of working directory.
|
org.apache.hadoop.security.token.Token<?> |
getDelegationToken(String renewer) |
org.apache.hadoop.fs.FileChecksum |
getFileChecksum(org.apache.hadoop.fs.Path f) |
org.apache.hadoop.fs.FileStatus |
getFileStatus(org.apache.hadoop.fs.Path hadoopPath)
Gets status of the given path item.
|
abstract org.apache.hadoop.fs.Path |
getFileSystemRoot()
Returns the Hadoop path representing the root of the FileSystem associated with this
FileSystemDescriptor.
|
GoogleCloudStorageFileSystem |
getGcsFs()
Gets GCS FS instance.
|
abstract URI |
getGcsPath(org.apache.hadoop.fs.Path hadoopPath)
Gets GCS path corresponding to the given Hadoop path, which can be relative or absolute,
and can have either gs://
|
abstract org.apache.hadoop.fs.Path |
getHadoopPath(URI gcsPath)
Gets Hadoop path corresponding to the given GCS path.
|
String |
getHadoopScheme()
Deprecated.
|
org.apache.hadoop.fs.Path |
getHomeDirectory()
Returns home directory of the current user.
|
protected abstract String |
getHomeDirectorySubpath()
Returns an unqualified path without any leading slash, relative to the filesystem root,
which serves as the home directory of the current user; see
getHomeDirectory for
a description of what the home directory means. |
abstract String |
getScheme()
Returns the URI scheme for the Hadoop FileSystem associated with this FileSystemDescriptor.
|
URI |
getUri()
Returns a URI of the root of this FileSystem.
|
long |
getUsed() |
org.apache.hadoop.fs.Path |
getWorkingDirectory()
Gets the current working directory.
|
org.apache.hadoop.fs.FileStatus[] |
globStatus(org.apache.hadoop.fs.Path pathPattern)
Returns an array of FileStatus objects whose path names match pathPattern.
|
org.apache.hadoop.fs.FileStatus[] |
globStatus(org.apache.hadoop.fs.Path pathPattern,
org.apache.hadoop.fs.PathFilter filter)
Returns an array of FileStatus objects whose path names match pathPattern
and are accepted by the user-supplied path filter.
|
void |
initialize(URI path,
org.apache.hadoop.conf.Configuration config)
See
initialize(URI, Configuration, boolean) for details; calls with third arg
defaulting to 'true' for initializing the superclass. |
void |
initialize(URI path,
org.apache.hadoop.conf.Configuration config,
boolean initSuperclass)
Initializes this file system instance.
|
org.apache.hadoop.fs.FileStatus[] |
listStatus(org.apache.hadoop.fs.Path hadoopPath)
Lists file status.
|
org.apache.hadoop.fs.Path |
makeQualified(org.apache.hadoop.fs.Path path)
Overridden to make root its own parent.
|
boolean |
mkdirs(org.apache.hadoop.fs.Path hadoopPath,
org.apache.hadoop.fs.permission.FsPermission permission)
Makes the given path and all non-existent parent directories.
|
org.apache.hadoop.fs.FSDataInputStream |
open(org.apache.hadoop.fs.Path hadoopPath,
int bufferSize)
Opens the given file for reading.
|
protected void |
processDeleteOnExit() |
boolean |
rename(org.apache.hadoop.fs.Path src,
org.apache.hadoop.fs.Path dst)
Renames src to dst.
|
protected void |
setListStatusFileNotFoundBehavior(GoogleHadoopFileSystemBase.ListStatusFileNotFoundBehavior behavior) |
void |
setOwner(org.apache.hadoop.fs.Path p,
String username,
String groupname) |
void |
setPermission(org.apache.hadoop.fs.Path p,
org.apache.hadoop.fs.permission.FsPermission permission) |
void |
setTimes(org.apache.hadoop.fs.Path p,
long mtime,
long atime) |
void |
setVerifyChecksum(boolean verifyChecksum) |
void |
setWorkingDirectory(org.apache.hadoop.fs.Path hadoopPath)
Sets the current working directory to the given path.
|
org.apache.hadoop.fs.Path |
startLocalOutput(org.apache.hadoop.fs.Path fsOutputFile,
org.apache.hadoop.fs.Path tmpLocalFile) |
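The flat-glob idea described for GCS_ENABLE_FLAT_GLOB_KEY (fetch one large flat listing, then run the glob logic in memory) can be sketched as follows. This is an illustration only, not the matching code used by globStatus: the listing is a plain list of object names, and the small glob-to-regex converter handles just `*` and `?`, neither of which crosses a `/`. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: matches a glob against a flat listing held in memory. */
final class FlatGlobSketch {
  /** Converts a simple glob ('*' and '?' only) into a regex that never crosses '/'. */
  static Pattern globToRegex(String glob) {
    StringBuilder regex = new StringBuilder();
    for (char c : glob.toCharArray()) {
      if (c == '*') {
        regex.append("[^/]*");
      } else if (c == '?') {
        regex.append("[^/]");
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(regex.toString());
  }

  /** Filters the flat listing in memory instead of listing level by level. */
  static List<String> match(List<String> flatListing, String glob) {
    Pattern pattern = globToRegex(glob);
    List<String> matches = new ArrayList<>();
    for (String name : flatListing) {
      if (pattern.matcher(name).matches()) {
        matches.add(name);
      }
    }
    return matches;
  }
}
```

The design point is that one listing call replaces a sequence of recursive per-directory calls, at the cost of holding the whole listing in memory.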
access, addDelegationTokens, append, append, areSymlinksEnabled, cancelDeleteOnExit, canonicalizeUri, clearStatistics, closeAll, closeAllForUGI, concat, copyFromLocalFile, copyFromLocalFile, copyToLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, create, create, createNewFile, createNonRecursive, createNonRecursive, createNonRecursive, createSnapshot, createSnapshot, createSymlink, deleteSnapshot, enableSymlinks, exists, fixRelativePart, get, get, get, getAclStatus, getAllStatistics, getAllStoragePolicies, getBlockSize, getCanonicalUri, getChildFileSystems, getDefaultBlockSize, getDefaultReplication, getDefaultUri, getFileBlockLocations, getFileBlockLocations, getFileChecksum, getFileLinkStatus, getFileSystemClass, getFSofPath, getGlobalStorageStatistics, getInitialWorkingDirectory, getLength, getLinkTarget, getLocal, getName, getNamed, getQuotaUsage, getReplication, getServerDefaults, getServerDefaults, getStatistics, getStatistics, getStatus, getStatus, getStoragePolicy, getStorageStatistics, getTrashRoot, getTrashRoots, getUsed, getXAttr, getXAttrs, getXAttrs, isDirectory, isFile, listCorruptFileBlocks, listFiles, listLocatedStatus, listLocatedStatus, listStatus, listStatus, listStatus, listStatusIterator, listXAttrs, mkdirs, mkdirs, modifyAclEntries, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, newInstance, newInstance, newInstance, newInstanceLocal, open, primitiveCreate, primitiveMkdir, primitiveMkdir, printStatistics, removeAcl, removeAclEntries, removeDefaultAcl, removeXAttr, rename, renameSnapshot, resolveLink, resolvePath, setAcl, setDefaultUri, setDefaultUri, setReplication, setStoragePolicy, setWriteChecksum, setXAttr, setXAttr, supportsSymlinks, truncate, unsetStoragePolicy
public static final org.slf4j.Logger LOG
public static final short REPLICATION_FACTOR_DEFAULT
public static final String PERMISSIONS_TO_REPORT_KEY
FsPermission#FromString(String)
public static final String PERMISSIONS_TO_REPORT_DEFAULT
public static final String BUFFERSIZE_KEY
public static final int BUFFERSIZE_DEFAULT
Hadoop passes 4096 bytes as the buffer size, which causes poor performance. Default value of BUFFERSIZE_KEY.
public static final String WRITE_BUFFERSIZE_KEY
public static final int WRITE_BUFFERSIZE_DEFAULT
Default value of WRITE_BUFFERSIZE_KEY.
public static final String BLOCK_SIZE_KEY
public static final int BLOCK_SIZE_DEFAULT
Default value of BLOCK_SIZE_KEY.
public static final String AUTHENTICATION_PREFIX
public static final String ENABLE_GCE_SERVICE_ACCOUNT_AUTH_KEY
See HadoopCredentialConfiguration for current key names.
public static final String SERVICE_ACCOUNT_AUTH_EMAIL_KEY
Applies when ENABLE_GCE_SERVICE_ACCOUNT_AUTH_KEY is true AND we're using fs.gs.service.account.auth.keyfile to authenticate with a private keyfile. NB: Once GCE supports setting multiple service account email addresses for metadata auth, this key will also be used in the metadata auth flow. This key is deprecated. See HadoopCredentialConfiguration for current key names.
public static final String SERVICE_ACCOUNT_AUTH_KEYFILE_KEY
Applies when ENABLE_GCE_SERVICE_ACCOUNT_AUTH_KEY is true; if provided, the keyfile will be used for service-account authentication. Otherwise, it is assumed that we are on a GCE VM with metadata-authentication for service-accounts enabled, and the metadata server will be used instead. Default value: none. This key is deprecated. See HadoopCredentialConfiguration for current key names.
public static final String GCS_PROJECT_ID_KEY
public static final String GCS_REQUESTER_PAYS_MODE_KEY
public static final String GCS_REQUESTER_PAYS_PROJECT_ID_KEY
public static final String GCS_REQUESTER_PAYS_BUCKETS_KEY
public static final String GCS_CLIENT_ID_KEY
Applies when ENABLE_GCE_SERVICE_ACCOUNT_AUTH_KEY == false. Default value: none. This key is deprecated. See HadoopCredentialConfiguration for current key names.
public static final String GCS_CLIENT_SECRET_KEY
Applies when ENABLE_GCE_SERVICE_ACCOUNT_AUTH_KEY == false. Default value: none. This key is deprecated. See HadoopCredentialConfiguration for current key names.
public static final String GCS_SYSTEM_BUCKET_KEY
public static final String GCS_CREATE_SYSTEM_BUCKET_KEY
See GCS_SYSTEM_BUCKET_KEY.
public static final boolean GCS_CREATE_SYSTEM_BUCKET_DEFAULT
Default value of GCS_CREATE_SYSTEM_BUCKET_KEY.
public static final String GCS_WORKING_DIRECTORY_KEY
public static final String GCS_FILE_SIZE_LIMIT_250GB
public static final boolean GCS_FILE_SIZE_LIMIT_250GB_DEFAULT
Default value of GCS_FILE_SIZE_LIMIT_250GB.
public static final String GCS_MARKER_FILE_PATTERN_KEY
public static final String GCS_ENABLE_METADATA_CACHE_KEY
public static final boolean GCS_ENABLE_METADATA_CACHE_DEFAULT
Default value for GCS_ENABLE_METADATA_CACHE_KEY.
public static final String GCS_ENABLE_PERFORMANCE_CACHE_KEY
public static final boolean GCS_ENABLE_PERFORMANCE_CACHE_DEFAULT
Default value for GCS_ENABLE_PERFORMANCE_CACHE_KEY.
public static final String GCS_PERFORMANCE_CACHE_MAX_ENTRY_AGE_MILLIS_KEY
public static final long GCS_PERFORMANCE_CACHE_MAX_ENTRY_AGE_MILLIS_DEFAULT
Default value for GCS_PERFORMANCE_CACHE_MAX_ENTRY_AGE_MILLIS_KEY.
public static final String GCS_PERFORMANCE_CACHE_LIST_CACHING_ENABLE_KEY
public static final boolean GCS_PERFORMANCE_CACHE_LIST_CACHING_ENABLE_DEFAULT
Default value for GCS_PERFORMANCE_CACHE_LIST_CACHING_ENABLE_KEY.
public static final String GCS_PARENT_TIMESTAMP_UPDATE_ENABLE_KEY
public static final boolean GCS_PARENT_TIMESTAMP_UPDATE_ENABLE_DEFAULT
Default value for GCS_PARENT_TIMESTAMP_UPDATE_ENABLE_KEY.
public static final String GCS_METADATA_CACHE_TYPE_KEY
IN_MEMORY: Enforces immediate consistency within same Java process.
FILESYSTEM_BACKED: Enforces consistency across all cooperating processes pointed at the same local mirror directory, which may be an NFS directory for distributed coordination.
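How a local cache can supplement GCS "list" results, so that a create() from the same client is immediately visible to a later list(), can be sketched in memory. This is a minimal illustration of the IN_MEMORY idea under assumptions, not the DirectoryListCache implementation: names are plain strings, time is passed in explicitly, and entries simply stop being merged once they exceed a maximum age. The class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: merges locally recorded creates into a remote listing. */
final class ListSupplementSketch {
  private final Map<String, Long> createdAtMillis = new ConcurrentHashMap<>();
  private final long maxEntryAgeMillis; // hypothetical stand-in for the max entry age setting

  ListSupplementSketch(long maxEntryAgeMillis) {
    this.maxEntryAgeMillis = maxEntryAgeMillis;
  }

  /** Remembers that this client created the given object at the given time. */
  void recordCreate(String objectName, long nowMillis) {
    createdAtMillis.put(objectName, nowMillis);
  }

  /** Returns the remote listing plus any fresh local entries under the prefix. */
  SortedSet<String> supplement(Collection<String> remoteListing, String prefix, long nowMillis) {
    SortedSet<String> merged = new TreeSet<>(remoteListing);
    for (Map.Entry<String, Long> entry : createdAtMillis.entrySet()) {
      boolean fresh = nowMillis - entry.getValue() <= maxEntryAgeMillis;
      if (fresh && entry.getKey().startsWith(prefix)) {
        merged.add(entry.getKey());
      }
    }
    return merged;
  }
}
```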
public static final String GCS_METADATA_CACHE_TYPE_DEFAULT
Default value for GCS_METADATA_CACHE_TYPE_KEY.
public static final String GCS_METADATA_CACHE_DIRECTORY_KEY
public static final String GCS_METADATA_CACHE_DIRECTORY_DEFAULT
Default value for GCS_METADATA_CACHE_DIRECTORY_KEY.
public static final String GCS_METADATA_CACHE_MAX_ENTRY_AGE_KEY
public static final long GCS_METADATA_CACHE_MAX_ENTRY_AGE_DEFAULT
Default value for GCS_METADATA_CACHE_MAX_ENTRY_AGE_KEY.
public static final String GCS_METADATA_CACHE_MAX_INFO_AGE_KEY
public static final long GCS_METADATA_CACHE_MAX_INFO_AGE_DEFAULT
Default value for GCS_METADATA_CACHE_MAX_INFO_AGE_KEY.
public static final String GCS_PARENT_TIMESTAMP_UPDATE_EXCLUDES_KEY
public static final String GCS_PARENT_TIMESTAMP_UPDATE_EXCLUDES_DEFAULT
Default value for GCS_PARENT_TIMESTAMP_UPDATE_EXCLUDES_KEY.
public static final String MR_JOB_HISTORY_INTERMEDIATE_DONE_DIR_KEY
public static final String MR_JOB_HISTORY_DONE_DIR_KEY
public static final String GCS_PARENT_TIMESTAMP_UPDATE_INCLUDES_KEY
public static final String GCS_PARENT_TIMESTAMP_UPDATE_INCLUDES_DEFAULT
Default value for GCS_PARENT_TIMESTAMP_UPDATE_INCLUDES_KEY.
public static final String GCS_ENABLE_REPAIR_IMPLICIT_DIRECTORIES_KEY
public static final boolean GCS_ENABLE_REPAIR_IMPLICIT_DIRECTORIES_DEFAULT
Default value for GCS_ENABLE_REPAIR_IMPLICIT_DIRECTORIES_KEY.
public static final String PATH_CODEC_KEY
public static final String PATH_CODEC_USE_URI_ENCODING
public static final String PATH_CODEC_USE_LEGACY_ENCODING
public static final String PATH_CODEC_DEFAULT
public static final String GCS_ENABLE_INFER_IMPLICIT_DIRECTORIES_KEY
public static final boolean GCS_ENABLE_INFER_IMPLICIT_DIRECTORIES_DEFAULT
Default value for GCS_ENABLE_INFER_IMPLICIT_DIRECTORIES_KEY.
public static final String GCS_ENABLE_FLAT_GLOB_KEY
public static final boolean GCS_ENABLE_FLAT_GLOB_DEFAULT
Default value for GCS_ENABLE_FLAT_GLOB_KEY.
public static final String GCS_ENABLE_MARKER_FILE_CREATION_KEY
public static final boolean GCS_ENABLE_MARKER_FILE_CREATION_DEFAULT
Default value for GCS_ENABLE_MARKER_FILE_CREATION_KEY.
public static final String GCS_MAX_LIST_ITEMS_PER_CALL
public static final long GCS_MAX_LIST_ITEMS_PER_CALL_DEFAULT
Default value for GCS_MAX_LIST_ITEMS_PER_CALL.
public static final String GCS_PROXY_ADDRESS_KEY
public static final String GCS_PROXY_ADDRESS_DEFAULT
public static final String GCS_HTTP_TRANSPORT_KEY
public static final String GCS_HTTP_TRANSPORT_DEFAULT
public static final String GCS_APPLICATION_NAME_SUFFIX_KEY
public static final String GCS_APPLICATION_NAME_SUFFIX_DEFAULT
public static final String GCS_MAX_WAIT_MILLIS_EMPTY_OBJECT_CREATE_KEY
public static final int GCS_MAX_WAIT_MILLIS_EMPTY_OBJECT_CREATE_DEFAULT
public static final String GCS_OUTPUTSTREAM_TYPE_KEY
BASIC: Stream is closest analogue to direct wrapper around low-level HTTP stream into GCS.
SYNCABLE_COMPOSITE: Stream behaves similarly to BASIC when used with basic create/write/close patterns, but supports hsync() by creating discrete temporary GCS objects which are composed onto the destination object. Has a hard upper limit of number of components which can be composed onto the destination object.
public static final String GCS_OUTPUTSTREAM_TYPE_DEFAULT
Default value for GCS_OUTPUTSTREAM_TYPE_KEY.
public static final String GCS_INPUTSTREAM_INTERNALBUFFER_ENABLE_KEY
BUFFERSIZE_KEY is passed through for the lower-level channel to interpret as it sees fit.
public static final boolean GCS_INPUTSTREAM_INTERNALBUFFER_ENABLE_DEFAULT
Default value for GCS_INPUTSTREAM_INTERNALBUFFER_ENABLE_KEY.
public static final String GCS_INPUTSTREAM_SUPPORT_CONTENT_ENCODING_ENABLE_KEY
public static final boolean GCS_INPUTSTREAM_SUPPORT_CONTENT_ENCODING_ENABLE_DEFAULT
Default value for GCS_INPUTSTREAM_SUPPORT_CONTENT_ENCODING_ENABLE_KEY.
public static final String GCS_INPUTSTREAM_FAST_FAIL_ON_NOT_FOUND_ENABLE_KEY
public static final boolean GCS_INPUTSTREAM_FAST_FAIL_ON_NOT_FOUND_ENABLE_DEFAULT
Default value for GCS_INPUTSTREAM_FAST_FAIL_ON_NOT_FOUND_ENABLE_KEY.
public static final String GCS_INPUTSTREAM_INPLACE_SEEK_LIMIT_KEY
public static final long GCS_INPUTSTREAM_INPLACE_SEEK_LIMIT_DEFAULT
Default value for GCS_INPUTSTREAM_INPLACE_SEEK_LIMIT_KEY.
public static final String GCE_BUCKET_DELETE_ENABLE_KEY
public static final boolean GCE_BUCKET_DELETE_ENABLE_DEFAULT
Default value for GCE_BUCKET_DELETE_ENABLE_KEY.
public static final org.apache.hadoop.fs.PathFilter DEFAULT_FILTER
public static final String PROPERTIES_FILE
public static final String VERSION_PROPERTY
public static final String UNKNOWN_VERSION
public static final String VERSION
public static final String GHFS_ID
protected URI initUri
@Deprecated protected String systemBucket
See GCS_SYSTEM_BUCKET_KEY. Used as a fallback for a root bucket, when required.
protected GoogleCloudStorageFileSystem gcsfs
protected long defaultBlockSize
protected final com.google.common.collect.ImmutableMap<GoogleHadoopFileSystemBase.Counter,AtomicLong> counters
protected GoogleHadoopFileSystemBase.ListStatusFileNotFoundBehavior listStatusFileNotFoundBehavior
public GoogleHadoopFileSystemBase()
public GoogleHadoopFileSystemBase(GoogleCloudStorageFileSystem gcsfs)
protected com.google.common.collect.ImmutableMap<GoogleHadoopFileSystemBase.Counter,AtomicLong> createCounterMap()
protected void setListStatusFileNotFoundBehavior(GoogleHadoopFileSystemBase.ListStatusFileNotFoundBehavior behavior)
protected abstract String getHomeDirectorySubpath()
Returns an unqualified path without any leading slash, relative to the filesystem root, which serves as the home directory of the current user; see getHomeDirectory for a description of what the home directory means.
public abstract org.apache.hadoop.fs.Path getHadoopPath(URI gcsPath)
Parameters: gcsPath - Fully-qualified GCS path, of the form gs://
public abstract URI getGcsPath(org.apache.hadoop.fs.Path hadoopPath)
Parameters: hadoopPath - Hadoop path.
public abstract org.apache.hadoop.fs.Path getDefaultWorkingDirectory()
public abstract org.apache.hadoop.fs.Path getFileSystemRoot()
FileSystemDescriptor
getFileSystemRoot
in interface FileSystemDescriptor
public abstract String getScheme()
FileSystemDescriptor
getScheme
in interface FileSystemDescriptor
getScheme
in class org.apache.hadoop.fs.FileSystem
@Deprecated public String getHadoopScheme()
FileSystemDescriptor
getHadoopScheme
in interface FileSystemDescriptor
public org.apache.hadoop.fs.Path makeQualified(org.apache.hadoop.fs.Path path)
Overridden to make root its own parent. This is POSIX compliant, but more importantly guards against poor directory accounting in the PathData class of Hadoop 2's FsShell.
makeQualified
in class org.apache.hadoop.fs.FileSystem
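The "root is its own parent" rule can be illustrated with a small path normalizer. This is a minimal sketch of the POSIX convention only, not the GHFS implementation; the class and method names are hypothetical, and it works on plain strings rather than Hadoop Path objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class RootParentSketch {
    /** Resolves "." and ".." in an absolute path; ".." at the root stays at the root. */
    static String normalize(String absolutePath) {
        Deque<String> stack = new ArrayDeque<>();
        for (String segment : absolutePath.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue; // skip empty segments and "current directory"
            }
            if (segment.equals("..")) {
                // The parent of root is root: popping an empty stack is a no-op.
                stack.pollLast();
            } else {
                stack.addLast(segment);
            }
        }
        return "/" + String.join("/", stack);
    }

    public static void main(String[] args) {
        System.out.println(normalize("/.."));              // prints "/"
        System.out.println(normalize("/dir1/../../dir2")); // prints "/dir2"
    }
}
```

Clamping at the root means a path can never resolve to a location above it, which is the accounting property the override is described as protecting.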
protected void checkPath(org.apache.hadoop.fs.Path path)
checkPath
in class org.apache.hadoop.fs.FileSystem
public void initialize(URI path, org.apache.hadoop.conf.Configuration config) throws IOException
See initialize(URI, Configuration, boolean) for details; this overload calls it with the third argument set to 'true', which initializes the superclass.
initialize
in class org.apache.hadoop.fs.FileSystem
path - URI of a file/directory within this file system.
config - Hadoop configuration.
IOException
public void initialize(URI path, org.apache.hadoop.conf.Configuration config, boolean initSuperclass) throws IOException
path - URI of a file/directory within this file system.
config - Hadoop configuration.
initSuperclass - if false, doesn't call super.initialize(path, config); avoids registering a global Statistics object for this instance.
IOException
public URI getUri()
getUri
in class org.apache.hadoop.fs.FileSystem
protected int getDefaultPort()
getDefaultPort
in class org.apache.hadoop.fs.FileSystem
public org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.Path hadoopPath, int bufferSize) throws IOException
Note: This function overrides the given bufferSize value with a higher number unless further overridden using configuration parameter fs.gs.io.buffersize.
open
in class org.apache.hadoop.fs.FileSystem
hadoopPath - File to open.
bufferSize - Size of buffer to use for IO.
FileNotFoundException - if the given path does not exist.
IOException - if an error occurs.
public org.apache.hadoop.fs.FSDataOutputStream create(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress) throws IOException
create
in class org.apache.hadoop.fs.FileSystem
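The overwrite flag of create follows the usual "replace or fail" contract. As a rough local-filesystem analogue (using java.nio.file, not GHFS or GCS; the class and helper names are hypothetical), the two branches might look like this:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class OverwriteFlagSketch {
    /** Opens a local file for writing, honoring an overwrite flag like the one on create(). */
    static OutputStream create(Path file, boolean overwrite) throws IOException {
        if (overwrite) {
            // Replace any existing content.
            return Files.newOutputStream(file, StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
        }
        // CREATE_NEW fails with FileAlreadyExistsException (an IOException) if the file exists.
        return Files.newOutputStream(file, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("overwrite-sketch", ".txt"); // the file now exists
        try {
            try (OutputStream out = create(file, true)) {
                out.write("replaced".getBytes(StandardCharsets.UTF_8));
            }
            try (OutputStream out = create(file, false)) {
                System.out.println("unexpected: existing file was reopened");
            } catch (IOException e) {
                System.out.println("overwrite=false rejected the existing file");
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```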
hadoopPath - The file to open.
permission - Permissions to set on the new file. Ignored.
overwrite - If a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
bufferSize - The size of the buffer to use.
replication - Required block replication for the file. Ignored.
blockSize - The block-size to be used for the new file. Ignored.
progress - Progress is reported through this. Ignored.
IOException - if an error occurs.
See also: setPermission(Path, FsPermission)
public org.apache.hadoop.fs.FSDataOutputStream append(org.apache.hadoop.fs.Path hadoopPath, int bufferSize, org.apache.hadoop.util.Progressable progress) throws IOException
append
in class org.apache.hadoop.fs.FileSystem
hadoopPath - The existing file to be appended.
bufferSize - The size of the buffer to be used.
progress - For reporting progress if it is not null.
IOException - if an error occurs.
public boolean rename(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) throws IOException
rename
in class org.apache.hadoop.fs.FileSystem
src - Source path.
dst - Destination path.
FileNotFoundException - if src does not exist.
IOException - if an error occurs.
@Deprecated public boolean delete(org.apache.hadoop.fs.Path f) throws IOException
Use delete(Path, boolean) instead.
delete
in class org.apache.hadoop.fs.FileSystem
IOException
public boolean delete(org.apache.hadoop.fs.Path hadoopPath, boolean recursive) throws IOException
delete
in class org.apache.hadoop.fs.FileSystem
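The effect of the recursive flag can be modeled with a toy in-memory namespace. This is only a sketch under stated assumptions: the class is hypothetical, directories are marked by a trailing "/", a missing path returns false, and only a non-empty directory triggers the exception when recursive is false (as HDFS does). The actual GHFS behavior may differ in these details.

```java
import java.io.IOException;
import java.util.TreeSet;

class RecursiveDeleteSketch {
    // Toy namespace: directory entries end with "/", file entries do not.
    private final TreeSet<String> paths = new TreeSet<>();

    void add(String path) { paths.add(path); }

    boolean exists(String path) { return paths.contains(path); }

    /** Mimics the documented contract of delete(path, recursive). */
    boolean delete(String path, boolean recursive) throws IOException {
        if (!paths.contains(path)) {
            return false; // assumption: nothing to delete is reported as false
        }
        if (path.endsWith("/")) {
            // Children are all other entries that start with the directory prefix.
            boolean hasChildren = paths.stream()
                    .anyMatch(p -> !p.equals(path) && p.startsWith(path));
            if (hasChildren && !recursive) {
                throw new IOException("Directory is not empty: " + path);
            }
            paths.removeIf(p -> p.startsWith(path)); // removes the directory and its contents
        } else {
            paths.remove(path); // the recursive flag is ignored for files
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        RecursiveDeleteSketch fs = new RecursiveDeleteSketch();
        fs.add("/dir/");
        fs.add("/dir/file.txt");
        try {
            fs.delete("/dir/", false);
        } catch (IOException e) {
            System.out.println("non-recursive delete refused: " + e.getMessage());
        }
        System.out.println(fs.delete("/dir/", true));
        System.out.println(fs.exists("/dir/file.txt"));
    }
}
```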
hadoopPath - The path to delete.
recursive - If the path is a directory and recursive is true, the directory is deleted; otherwise an exception is thrown. For a file, the recursive parameter is ignored.
IOException - if an error occurs.
public org.apache.hadoop.fs.FileStatus[] listStatus(org.apache.hadoop.fs.Path hadoopPath) throws IOException
listStatus
in class org.apache.hadoop.fs.FileSystem
hadoopPath - Given path.
IOException - if an error occurs.
public void setWorkingDirectory(org.apache.hadoop.fs.Path hadoopPath)
setWorkingDirectory
in class org.apache.hadoop.fs.FileSystem
hadoopPath - New working directory.
public org.apache.hadoop.fs.Path getWorkingDirectory()
getWorkingDirectory
in class org.apache.hadoop.fs.FileSystem
public boolean mkdirs(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission) throws IOException
mkdirs
in class org.apache.hadoop.fs.FileSystem
hadoopPath - Given path.
permission - Permissions to set on the given directory.
IOException - if an error occurs.
public short getDefaultReplication()
getDefaultReplication
in class org.apache.hadoop.fs.FileSystem
public org.apache.hadoop.fs.FileStatus getFileStatus(org.apache.hadoop.fs.Path hadoopPath) throws IOException
getFileStatus
in class org.apache.hadoop.fs.FileSystem
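Because getFileStatus signals a missing path with FileNotFoundException rather than a return value, callers that want a boolean answer typically translate only that exception and let other I/O errors propagate. A minimal sketch of the pattern follows; the StatusLookup interface is a hypothetical stand-in for the file system call, and the gs:// paths are illustrative only.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

class NotFoundHandlingSketch {
    /** Hypothetical stand-in for a call such as getFileStatus(path). */
    interface StatusLookup {
        Object lookup(String path) throws IOException;
    }

    /** Returns false only for "not found"; every other I/O failure still propagates. */
    static boolean exists(StatusLookup fs, String path) throws IOException {
        try {
            return fs.lookup(path) != null;
        } catch (FileNotFoundException e) {
            // FileNotFoundException is a subclass of IOException, so it is handled separately here.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        StatusLookup missing = path -> { throw new FileNotFoundException(path); };
        StatusLookup present = path -> "status of " + path;
        System.out.println(exists(missing, "gs://bucket/none")); // prints "false"
        System.out.println(exists(present, "gs://bucket/file")); // prints "true"
    }
}
```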
hadoopPath - The path we want information about.
FileNotFoundException - when the path does not exist.
IOException - on other errors.
public org.apache.hadoop.fs.FileStatus[] globStatus(org.apache.hadoop.fs.Path pathPattern) throws IOException
globStatus
in class org.apache.hadoop.fs.FileSystem
pathPattern - A regular expression specifying the path pattern.
IOException - if an error occurs.
public org.apache.hadoop.fs.FileStatus[] globStatus(org.apache.hadoop.fs.Path pathPattern, org.apache.hadoop.fs.PathFilter filter) throws IOException
globStatus
in class org.apache.hadoop.fs.FileSystem
pathPattern - A regular expression specifying the path pattern.
filter - A user-supplied path filter.
IOException - if an error occurs.
public org.apache.hadoop.fs.Path getHomeDirectory()
getHomeDirectory
in class org.apache.hadoop.fs.FileSystem
public String getCanonicalServiceName()
Returns null, because GHFS does not use security tokens.
getCanonicalServiceName
in class org.apache.hadoop.fs.FileSystem
public GoogleCloudStorageFileSystem getGcsFs()
public void configureBuckets(String systemBucketName, boolean createSystemBucket) throws IOException
systemBucketName - Name of system bucket.
createSystemBucket - Whether or not to create systemBucketName if it does not exist.
IOException - if systemBucketName is invalid or cannot be found and createSystemBucket is false.
public boolean deleteOnExit(org.apache.hadoop.fs.Path f) throws IOException
deleteOnExit
in class org.apache.hadoop.fs.FileSystem
IOException
protected void processDeleteOnExit()
processDeleteOnExit
in class org.apache.hadoop.fs.FileSystem
public org.apache.hadoop.fs.ContentSummary getContentSummary(org.apache.hadoop.fs.Path f) throws IOException
getContentSummary
in class org.apache.hadoop.fs.FileSystem
IOException
public org.apache.hadoop.security.token.Token<?> getDelegationToken(String renewer) throws IOException
getDelegationToken
in class org.apache.hadoop.fs.FileSystem
IOException
public void copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path[] srcs, org.apache.hadoop.fs.Path dst) throws IOException
copyFromLocalFile
in class org.apache.hadoop.fs.FileSystem
IOException
public void copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) throws IOException
copyFromLocalFile
in class org.apache.hadoop.fs.FileSystem
IOException
public void copyToLocalFile(boolean delSrc, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) throws IOException
copyToLocalFile
in class org.apache.hadoop.fs.FileSystem
IOException
public org.apache.hadoop.fs.Path startLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile) throws IOException
startLocalOutput
in class org.apache.hadoop.fs.FileSystem
IOException
public void completeLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile) throws IOException
completeLocalOutput
in class org.apache.hadoop.fs.FileSystem
IOException
public void close() throws IOException
close
in interface Closeable
close
in interface AutoCloseable
close
in class org.apache.hadoop.fs.FileSystem
IOException
public long getUsed() throws IOException
getUsed
in class org.apache.hadoop.fs.FileSystem
IOException
public long getDefaultBlockSize()
getDefaultBlockSize
in class org.apache.hadoop.fs.FileSystem
public org.apache.hadoop.fs.FileChecksum getFileChecksum(org.apache.hadoop.fs.Path f) throws IOException
getFileChecksum
in class org.apache.hadoop.fs.FileSystem
IOException
public void setVerifyChecksum(boolean verifyChecksum)
setVerifyChecksum
in class org.apache.hadoop.fs.FileSystem
public void setPermission(org.apache.hadoop.fs.Path p, org.apache.hadoop.fs.permission.FsPermission permission) throws IOException
setPermission
in class org.apache.hadoop.fs.FileSystem
IOException
public void setOwner(org.apache.hadoop.fs.Path p, String username, String groupname) throws IOException
setOwner
in class org.apache.hadoop.fs.FileSystem
IOException
public void setTimes(org.apache.hadoop.fs.Path p, long mtime, long atime) throws IOException
setTimes
in class org.apache.hadoop.fs.FileSystem
IOException
Copyright © 2018. All rights reserved.