Package com.google.cloud.hadoop.fs.gcs
Class GoogleHadoopFileSystem
- java.lang.Object
  - org.apache.hadoop.conf.Configured
    - org.apache.hadoop.fs.FileSystem
      - com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
- All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.hadoop.conf.Configurable, org.apache.hadoop.fs.PathCapabilities, org.apache.hadoop.fs.statistics.IOStatisticsSource, org.apache.hadoop.security.token.DelegationTokenIssuer
public class GoogleHadoopFileSystem extends org.apache.hadoop.fs.FileSystem implements org.apache.hadoop.fs.statistics.IOStatisticsSource
GoogleHadoopFileSystem is rooted in a single bucket at initialization time. As a result, Hadoop paths no longer correspond directly to general GCS paths, and Hadoop operations that go through this FileSystem never touch any GCS bucket other than the one on which it is rooted.
This implementation sacrifices a small amount of cross-bucket interoperability in favor of more straightforward FileSystem semantics and compatibility with existing Hadoop applications. In particular, it is not subject to bucket-naming constraints, and files are allowed to be placed in root.
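The bucket-rooting rule described above can be illustrated with a small standalone sketch. It uses only `java.net.URI` and does not call the connector; the class `BucketRootCheck`, the method `isWithinRoot`, and the bucket names are hypothetical, and the connector's actual path check may differ. Treating scheme-less paths as belonging to the root bucket is an assumption of this sketch.

```java
import java.net.URI;

/** Hypothetical model of a file system rooted in one bucket; not the connector's code. */
class BucketRootCheck {
    private final URI root;

    BucketRootCheck(URI root) {
        this.root = root;
    }

    /** Returns true when the path is scheme-less or names the root's scheme and bucket. */
    boolean isWithinRoot(URI path) {
        if (path.getScheme() == null && path.getAuthority() == null) {
            // Assumption: bucket-less paths resolve against the root bucket.
            return true;
        }
        return root.getScheme().equalsIgnoreCase(path.getScheme())
            && root.getAuthority().equals(path.getAuthority());
    }

    public static void main(String[] args) {
        BucketRootCheck fs = new BucketRootCheck(URI.create("gs://bucket-a/"));
        // Same bucket as the root.
        System.out.println(fs.isWithinRoot(URI.create("gs://bucket-a/data/part-0")));
        // Different bucket, so outside this file system.
        System.out.println(fs.isWithinRoot(URI.create("gs://bucket-b/data/part-0")));
        // No scheme or bucket, resolved against the root.
        System.out.println(fs.isWithinRoot(URI.create("/data/part-0")));
    }
}
```

In this model, reaching a second bucket would require a second instance rooted in that bucket, which is the cross-bucket trade-off the class description mentions.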
-
-
Nested Class Summary
Modifier and Type, Class, Description
static class
GoogleHadoopFileSystem.GcsFileChecksumType
Available GCS checksum types for use with GoogleHadoopFileSystemConfiguration.GCS_FILE_CHECKSUM_TYPE.
static class
GoogleHadoopFileSystem.GlobAlgorithm
Available GCS glob algorithms for use with GoogleHadoopFileSystemConfiguration.GCS_GLOB_ALGORITHM.
-
Constructor Summary
Constructor, Description
GoogleHadoopFileSystem()
Constructs an instance of GoogleHadoopFileSystem; the internal GoogleCloudStorageFileSystem will be set up with config settings when initialize() is called.
-
Method Summary
Modifier and Type, Method, Description
org.apache.hadoop.fs.FSDataOutputStream
append(org.apache.hadoop.fs.Path hadoopPath, int bufferSize, org.apache.hadoop.util.Progressable progress)
Appends to an existing file (optional operation).
protected void
checkPath(org.apache.hadoop.fs.Path path)
void
close()
void
completeLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile)
void
concat(org.apache.hadoop.fs.Path tgt, org.apache.hadoop.fs.Path[] srcs)
Concat existing files into one file.
void
copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path[] srcs, org.apache.hadoop.fs.Path dst)
void
copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst)
void
copyToLocalFile(boolean delSrc, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst)
org.apache.hadoop.fs.FSDataOutputStream
create(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress)
org.apache.hadoop.fs.FSDataOutputStream
createNonRecursive(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission, EnumSet<org.apache.hadoop.fs.CreateFlag> flags, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress)
boolean
delete(org.apache.hadoop.fs.Path hadoopPath, boolean recursive)
boolean
deleteOnExit(org.apache.hadoop.fs.Path f)
boolean
exists(org.apache.hadoop.fs.Path f)
String
getCanonicalServiceName()
org.apache.hadoop.fs.ContentSummary
getContentSummary(org.apache.hadoop.fs.Path f)
long
getDefaultBlockSize()
protected int
getDefaultPort()
The default port is listed as -1 as an indication that ports are not used.
short
getDefaultReplication()
Gets the default replication factor.
org.apache.hadoop.security.token.Token<?>
getDelegationToken(String renewer)
org.apache.hadoop.fs.FileChecksum
getFileChecksum(org.apache.hadoop.fs.Path hadoopPath)
org.apache.hadoop.fs.FileStatus
getFileStatus(org.apache.hadoop.fs.Path hadoopPath)
GoogleCloudStorageFileSystem
getGcsFs()
Gets GCS FS instance.
org.apache.hadoop.fs.Path
getHomeDirectory()
Returns home directory of the current user.
GhfsInstrumentation
getInstrumentation()
org.apache.hadoop.fs.statistics.IOStatistics
getIOStatistics()
Get the instrumentation's IOStatistics.
String
getScheme()
GhfsStorageStatistics
getStorageStatistics()
Get the storage statistics of this filesystem.
URI
getUri()
Returns a URI of the root of this FileSystem.
long
getUsed()
org.apache.hadoop.fs.Path
getWorkingDirectory()
Gets the current working directory.
byte[]
getXAttr(org.apache.hadoop.fs.Path path, String name)
Map<String,byte[]>
getXAttrs(org.apache.hadoop.fs.Path path)
Map<String,byte[]>
getXAttrs(org.apache.hadoop.fs.Path path, List<String> names)
org.apache.hadoop.fs.FileStatus[]
globStatus(org.apache.hadoop.fs.Path pathPattern)
Returns an array of FileStatus objects whose path names match pathPattern.
org.apache.hadoop.fs.FileStatus[]
globStatus(org.apache.hadoop.fs.Path pathPattern, org.apache.hadoop.fs.PathFilter filter)
boolean
hasPathCapability(org.apache.hadoop.fs.Path path, String capability)
void
initialize(URI path, org.apache.hadoop.conf.Configuration config)
org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus>
listLocatedStatus(org.apache.hadoop.fs.Path f)
org.apache.hadoop.fs.FileStatus[]
listStatus(org.apache.hadoop.fs.Path hadoopPath)
List<String>
listXAttrs(org.apache.hadoop.fs.Path path)
org.apache.hadoop.fs.Path
makeQualified(org.apache.hadoop.fs.Path path)
Overridden to make root its own parent.
boolean
mkdirs(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission)
org.apache.hadoop.fs.FSDataInputStream
open(org.apache.hadoop.fs.Path hadoopPath, int bufferSize)
CompletableFuture<org.apache.hadoop.fs.FSDataInputStream>
openFileWithOptions(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.impl.OpenFileParameters parameters)
Initiate the open operation.
protected void
processDeleteOnExit()
void
removeXAttr(org.apache.hadoop.fs.Path path, String name)
boolean
rename(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst)
void
setOwner(org.apache.hadoop.fs.Path p, String username, String groupname)
void
setPermission(org.apache.hadoop.fs.Path p, org.apache.hadoop.fs.permission.FsPermission permission)
void
setTimes(org.apache.hadoop.fs.Path p, long mtime, long atime)
void
setVerifyChecksum(boolean verifyChecksum)
void
setWorkingDirectory(org.apache.hadoop.fs.Path hadoopPath)
void
setXAttr(org.apache.hadoop.fs.Path path, String name, byte[] value, EnumSet<org.apache.hadoop.fs.XAttrSetFlag> flags)
org.apache.hadoop.fs.Path
startLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile)
-
Methods inherited from class org.apache.hadoop.fs.FileSystem
access, append, append, append, append, appendFile, areSymlinksEnabled, cancelDeleteOnExit, canonicalizeUri, clearStatistics, closeAll, closeAllForUGI, copyFromLocalFile, copyFromLocalFile, copyToLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, create, create, createDataInputStreamBuilder, createDataInputStreamBuilder, createDataOutputStreamBuilder, createFile, createMultipartUploader, createNewFile, createNonRecursive, createNonRecursive, createPathHandle, createSnapshot, createSnapshot, createSymlink, delete, deleteSnapshot, enableSymlinks, fixRelativePart, get, get, get, getAclStatus, getAdditionalTokenIssuers, getAllStatistics, getAllStoragePolicies, getBlockSize, getCanonicalUri, getChildFileSystems, getDefaultBlockSize, getDefaultReplication, getDefaultUri, getFileBlockLocations, getFileBlockLocations, getFileChecksum, getFileLinkStatus, getFileSystemClass, getFSofPath, getGlobalStorageStatistics, getInitialWorkingDirectory, getLength, getLinkTarget, getLocal, getName, getNamed, getPathHandle, getQuotaUsage, getReplication, getServerDefaults, getServerDefaults, getStatistics, getStatistics, getStatus, getStatus, getStoragePolicy, getTrashRoot, getTrashRoots, getUsed, isDirectory, isFile, listCorruptFileBlocks, listFiles, listLocatedStatus, listStatus, listStatus, listStatus, listStatusBatch, listStatusIterator, mkdirs, mkdirs, modifyAclEntries, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, msync, newInstance, newInstance, newInstance, newInstanceLocal, open, open, open, openFile, openFile, openFileWithOptions, primitiveCreate, primitiveMkdir, primitiveMkdir, printStatistics, removeAcl, removeAclEntries, removeDefaultAcl, rename, renameSnapshot, resolveLink, resolvePath, satisfyStoragePolicy, setAcl, setDefaultUri, setDefaultUri, setQuota, setQuotaByStorageType, setReplication, setStoragePolicy, setWriteChecksum, setXAttr, supportsSymlinks, truncate, unsetStoragePolicy
-
-
-
-
Field Detail
-
SCHEME
public static final String SCHEME
URI scheme for GoogleHadoopFileSystem
- See Also:
Constant Field Values
-
-
Method Detail
-
initialize
public void initialize(URI path, org.apache.hadoop.conf.Configuration config) throws IOException
- Overrides:
initialize
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
checkPath
protected void checkPath(org.apache.hadoop.fs.Path path)
- Overrides:
checkPath
in class org.apache.hadoop.fs.FileSystem
-
getScheme
public String getScheme()
- Overrides:
getScheme
in class org.apache.hadoop.fs.FileSystem
-
open
public org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.Path hadoopPath, int bufferSize) throws IOException
- Specified by:
open
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
create
public org.apache.hadoop.fs.FSDataOutputStream create(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress) throws IOException
- Specified by:
create
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
createNonRecursive
public org.apache.hadoop.fs.FSDataOutputStream createNonRecursive(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission, EnumSet<org.apache.hadoop.fs.CreateFlag> flags, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress) throws IOException
- Overrides:
createNonRecursive
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
rename
public boolean rename(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) throws IOException
- Specified by:
rename
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
delete
public boolean delete(org.apache.hadoop.fs.Path hadoopPath, boolean recursive) throws IOException
- Specified by:
delete
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
listStatus
public org.apache.hadoop.fs.FileStatus[] listStatus(org.apache.hadoop.fs.Path hadoopPath) throws IOException
- Specified by:
listStatus
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
mkdirs
public boolean mkdirs(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission) throws IOException
- Specified by:
mkdirs
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
getFileStatus
public org.apache.hadoop.fs.FileStatus getFileStatus(org.apache.hadoop.fs.Path hadoopPath) throws IOException
- Specified by:
getFileStatus
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
globStatus
public org.apache.hadoop.fs.FileStatus[] globStatus(org.apache.hadoop.fs.Path pathPattern, org.apache.hadoop.fs.PathFilter filter) throws IOException
- Overrides:
globStatus
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
getDelegationToken
public org.apache.hadoop.security.token.Token<?> getDelegationToken(String renewer) throws IOException
- Specified by:
getDelegationToken
in interface org.apache.hadoop.security.token.DelegationTokenIssuer
- Overrides:
getDelegationToken
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
copyFromLocalFile
public void copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path[] srcs, org.apache.hadoop.fs.Path dst) throws IOException
- Overrides:
copyFromLocalFile
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
copyFromLocalFile
public void copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) throws IOException
- Overrides:
copyFromLocalFile
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
getFileChecksum
public org.apache.hadoop.fs.FileChecksum getFileChecksum(org.apache.hadoop.fs.Path hadoopPath) throws IOException
- Overrides:
getFileChecksum
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
exists
public boolean exists(org.apache.hadoop.fs.Path f) throws IOException
- Overrides:
exists
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
listLocatedStatus
public org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus> listLocatedStatus(org.apache.hadoop.fs.Path f) throws IOException
- Overrides:
listLocatedStatus
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
getXAttr
public byte[] getXAttr(org.apache.hadoop.fs.Path path, String name) throws IOException
- Overrides:
getXAttr
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
getXAttrs
public Map<String,byte[]> getXAttrs(org.apache.hadoop.fs.Path path) throws IOException
- Overrides:
getXAttrs
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
getXAttrs
public Map<String,byte[]> getXAttrs(org.apache.hadoop.fs.Path path, List<String> names) throws IOException
- Overrides:
getXAttrs
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
listXAttrs
public List<String> listXAttrs(org.apache.hadoop.fs.Path path) throws IOException
- Overrides:
listXAttrs
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
getStorageStatistics
public GhfsStorageStatistics getStorageStatistics()
Get the storage statistics of this filesystem.
- Overrides:
getStorageStatistics
in class org.apache.hadoop.fs.FileSystem
- Returns:
the storage statistics
-
getIOStatistics
public org.apache.hadoop.fs.statistics.IOStatistics getIOStatistics()
Get the instrumentation's IOStatistics.
- Specified by:
getIOStatistics
in interface org.apache.hadoop.fs.statistics.IOStatisticsSource
-
getInstrumentation
public GhfsInstrumentation getInstrumentation()
-
makeQualified
public org.apache.hadoop.fs.Path makeQualified(org.apache.hadoop.fs.Path path)
Overridden to make root its own parent. This is POSIX compliant, but more importantly guards against poor directory accounting in the PathData class of Hadoop 2's FsShell.
- Overrides:
makeQualified
in class org.apache.hadoop.fs.FileSystem
-
getUri
public URI getUri()
Returns a URI of the root of this FileSystem.
- Specified by:
getUri
in class org.apache.hadoop.fs.FileSystem
-
getDefaultPort
protected int getDefaultPort()
The default port is listed as -1 as an indication that ports are not used.
- Overrides:
getDefaultPort
in class org.apache.hadoop.fs.FileSystem
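The -1 convention matches `java.net.URI.getPort()`, which also returns -1 when a URI carries no port. The following is a minimal sketch of that behavior only; the class `NoPortExample`, the helper `portOf`, and the host and bucket names are made up for illustration and are not part of this class.

```java
import java.net.URI;

/** Hypothetical helper showing how java.net.URI reports a missing port. */
class NoPortExample {
    /** Returns the port component of the URI, or -1 when none is present. */
    static int portOf(String uri) {
        return URI.create(uri).getPort();
    }

    public static void main(String[] args) {
        // A bucket-style URI has no port component.
        System.out.println(portOf("gs://example-bucket/dir/file"));
        // A URI with an explicit port, for contrast.
        System.out.println(portOf("hdfs://namenode:8020/dir/file"));
    }
}
```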
-
hasPathCapability
public boolean hasPathCapability(org.apache.hadoop.fs.Path path, String capability)
- Specified by:
hasPathCapability
in interface org.apache.hadoop.fs.PathCapabilities
- Overrides:
hasPathCapability
in class org.apache.hadoop.fs.FileSystem
-
openFileWithOptions
public CompletableFuture<org.apache.hadoop.fs.FSDataInputStream> openFileWithOptions(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.impl.OpenFileParameters parameters) throws IOException
Initiate the open operation. This is invoked from both the FileSystem and FileContext APIs.
- Overrides:
openFileWithOptions
in class org.apache.hadoop.fs.FileSystem
- Parameters:
hadoopPath - path to the file
parameters - open file parameters from the builder.
- Returns:
a future which will evaluate to the opened file.
- Throws:
IOException - failure to resolve the link.
IllegalArgumentException - unknown mandatory key
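Because the result is a `CompletableFuture`, a failure during the open may surface only when the future is evaluated, wrapped in a `CompletionException`. The sketch below shows one way a caller might wait on such a future and recover an `IOException` cause. It uses a plain `InputStream` as a stand-in for `FSDataInputStream` so that it runs without Hadoop; `FutureOpenSketch`, `openAsync`, and `awaitAndRead` are hypothetical names, not part of this class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

/** Hypothetical stand-in for consuming a future that evaluates to an opened stream. */
class FutureOpenSketch {
    /** Simulates an asynchronous open by completing with an in-memory stream. */
    static CompletableFuture<InputStream> openAsync(byte[] contents) {
        return CompletableFuture.supplyAsync(() -> new ByteArrayInputStream(contents));
    }

    /** Waits for the future, reads the stream fully, and rethrows an IOException cause. */
    static String awaitAndRead(CompletableFuture<InputStream> future) throws IOException {
        try (InputStream in = future.join()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (CompletionException e) {
            // join() wraps the original failure; unwrap it when it is an IOException.
            if (e.getCause() instanceof IOException) {
                throw (IOException) e.getCause();
            }
            throw e;
        }
    }
}
```

The try-with-resources form closes the stream even when reading fails, which matters for any stream that holds a remote connection open.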
-
append
public org.apache.hadoop.fs.FSDataOutputStream append(org.apache.hadoop.fs.Path hadoopPath, int bufferSize, org.apache.hadoop.util.Progressable progress) throws IOException
Appends to an existing file (optional operation). Not supported.
- Specified by:
append
in class org.apache.hadoop.fs.FileSystem
- Parameters:
hadoopPath - The existing file to be appended.
bufferSize - The size of the buffer to be used.
progress - For reporting progress if it is not null.
- Returns:
A writable stream.
- Throws:
IOException - if an error occurs.
-
concat
public void concat(org.apache.hadoop.fs.Path tgt, org.apache.hadoop.fs.Path[] srcs) throws IOException
Concat existing files into one file.
- Overrides:
concat
in class org.apache.hadoop.fs.FileSystem
- Parameters:
tgt - the path to the target destination.
srcs - the paths to the sources to use for the concatenation.
- Throws:
IOException - IO failure
-
getWorkingDirectory
public org.apache.hadoop.fs.Path getWorkingDirectory()
Gets the current working directory.
- Specified by:
getWorkingDirectory
in class org.apache.hadoop.fs.FileSystem
- Returns:
The current working directory.
-
getDefaultReplication
public short getDefaultReplication()
Gets the default replication factor.
- Overrides:
getDefaultReplication
in class org.apache.hadoop.fs.FileSystem
-
globStatus
public org.apache.hadoop.fs.FileStatus[] globStatus(org.apache.hadoop.fs.Path pathPattern) throws IOException
Returns an array of FileStatus objects whose path names match pathPattern.
Returns null if pathPattern has no glob and the path does not exist. Returns an empty array if pathPattern has a glob and no path matches it.
- Overrides:
globStatus
in class org.apache.hadoop.fs.FileSystem
- Parameters:
pathPattern - A regular expression specifying the path pattern.
- Returns:
An array of FileStatus objects.
- Throws:
IOException - if an error occurs.
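Since the result can be either null or an empty array depending on whether the pattern contains a glob, callers that only want "the matches, if any" may find it convenient to normalize both cases. The following is a minimal sketch of such a wrapper. It is generic so that it runs without Hadoop; with the real API the argument would be the `FileStatus[]` returned by `globStatus`. `GlobResultSketch` and `matchesOrEmpty` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that treats a null glob result like an empty one. */
class GlobResultSketch {
    /**
     * Returns the matches as a list. A null array (non-glob path that does not
     * exist) and an empty array (glob with no matches) both yield an empty list.
     */
    static <T> List<T> matchesOrEmpty(T[] globResult) {
        return globResult == null ? Collections.emptyList() : Arrays.asList(globResult);
    }
}
```

Callers that need to tell "path does not exist" apart from "pattern matched nothing" should check for null directly instead of using a wrapper like this.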
-
getHomeDirectory
public org.apache.hadoop.fs.Path getHomeDirectory()
Returns home directory of the current user.
Note: This directory is only used for Hadoop purposes. It is not the same as a user's OS home directory.
- Overrides:
getHomeDirectory
in class org.apache.hadoop.fs.FileSystem
-
getCanonicalServiceName
public String getCanonicalServiceName()
Returns the service if delegation tokens are configured, otherwise, null.
- Specified by:
getCanonicalServiceName
in interface org.apache.hadoop.security.token.DelegationTokenIssuer
- Overrides:
getCanonicalServiceName
in class org.apache.hadoop.fs.FileSystem
-
getGcsFs
public GoogleCloudStorageFileSystem getGcsFs()
Gets GCS FS instance.
-
deleteOnExit
public boolean deleteOnExit(org.apache.hadoop.fs.Path f) throws IOException
- Overrides:
deleteOnExit
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
processDeleteOnExit
protected void processDeleteOnExit()
- Overrides:
processDeleteOnExit
in class org.apache.hadoop.fs.FileSystem
-
getContentSummary
public org.apache.hadoop.fs.ContentSummary getContentSummary(org.apache.hadoop.fs.Path f) throws IOException
- Overrides:
getContentSummary
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
copyToLocalFile
public void copyToLocalFile(boolean delSrc, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) throws IOException
- Overrides:
copyToLocalFile
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
startLocalOutput
public org.apache.hadoop.fs.Path startLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile) throws IOException
- Overrides:
startLocalOutput
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
completeLocalOutput
public void completeLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile) throws IOException
- Overrides:
completeLocalOutput
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
close
public void close() throws IOException
- Specified by:
close
in interface AutoCloseable
- Specified by:
close
in interface Closeable
- Overrides:
close
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
getUsed
public long getUsed() throws IOException
- Overrides:
getUsed
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
getDefaultBlockSize
public long getDefaultBlockSize()
- Overrides:
getDefaultBlockSize
in class org.apache.hadoop.fs.FileSystem
-
setWorkingDirectory
public void setWorkingDirectory(org.apache.hadoop.fs.Path hadoopPath)
- Specified by:
setWorkingDirectory
in class org.apache.hadoop.fs.FileSystem
-
setVerifyChecksum
public void setVerifyChecksum(boolean verifyChecksum)
- Overrides:
setVerifyChecksum
in class org.apache.hadoop.fs.FileSystem
-
setPermission
public void setPermission(org.apache.hadoop.fs.Path p, org.apache.hadoop.fs.permission.FsPermission permission) throws IOException
- Overrides:
setPermission
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
setOwner
public void setOwner(org.apache.hadoop.fs.Path p, String username, String groupname) throws IOException
- Overrides:
setOwner
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
setTimes
public void setTimes(org.apache.hadoop.fs.Path p, long mtime, long atime) throws IOException
- Overrides:
setTimes
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
setXAttr
public void setXAttr(org.apache.hadoop.fs.Path path, String name, byte[] value, EnumSet<org.apache.hadoop.fs.XAttrSetFlag> flags) throws IOException
- Overrides:
setXAttr
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
removeXAttr
public void removeXAttr(org.apache.hadoop.fs.Path path, String name) throws IOException
- Overrides:
removeXAttr
in class org.apache.hadoop.fs.FileSystem
- Throws:
IOException
-
-