org.apache.hadoop.fs
Class HarFileSystem

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.hadoop.fs.FileSystem
          extended by org.apache.hadoop.fs.HarFileSystem
All Implemented Interfaces:
Closeable, Configurable

public class HarFileSystem
extends FileSystem

This is an implementation of the Hadoop Archive (HAR) filesystem. An archive consists of index files, named _index*, and content files, named part-*. The index files record where each archived file is stored. There are two of them: _masterindex and _index. The master index adds a level of indirection into the index file to make lookups faster: the index file is sorted by the hash codes of the paths it contains, and the master index holds pointers to positions in the index for ranges of hash codes.
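The lookup described above can be sketched as a binary search over master-index entries. Each entry covers a range of path hashes and points at a byte range of the _index file. This is a minimal sketch, not the HarFileSystem implementation: the in-memory MasterIndexEntry and MasterIndexLookup classes are hypothetical, parsing of the real _masterindex file format is omitted, and the actual code may search the master index differently.

```java
import java.util.List;

// Hypothetical in-memory form of one master-index entry: a range of path
// hashes [startHash, endHash] and the byte range [begin, end) of the _index
// file that holds the entries for those hashes.
final class MasterIndexEntry {
    final int startHash;
    final int endHash;
    final long begin;
    final long end;

    MasterIndexEntry(int startHash, int endHash, long begin, long end) {
        this.startHash = startHash;
        this.endHash = endHash;
        this.begin = begin;
        this.end = end;
    }
}

final class MasterIndexLookup {
    // Binary search over entries sorted by startHash with non-overlapping
    // hash ranges. Returns the entry whose range contains 'hash', or null.
    static MasterIndexEntry find(List<MasterIndexEntry> entries, int hash) {
        int lo = 0;
        int hi = entries.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            MasterIndexEntry e = entries.get(mid);
            if (hash < e.startHash) {
                hi = mid - 1;        // covering range, if any, is to the left
            } else if (hash > e.endHash) {
                lo = mid + 1;        // covering range, if any, is to the right
            } else {
                return e;            // startHash <= hash <= endHash
            }
        }
        return null;                 // hash falls outside every range
    }
}
```

Once the covering entry is found, only the bytes from begin to end of _index need to be read and scanned for the path, instead of the whole index file.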


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.fs.FileSystem
FileSystem.Statistics
 
Field Summary
static int METADATA_CACHE_ENTRIES_DEFAULT
           
static String METADATA_CACHE_ENTRIES_KEY
           
static int VERSION
           
 
Fields inherited from class org.apache.hadoop.fs.FileSystem
DEFAULT_FS, FS_DEFAULT_NAME_KEY, SHUTDOWN_HOOK_PRIORITY, statistics
 
Constructor Summary
HarFileSystem()
          Public constructor for HarFileSystem.
HarFileSystem(FileSystem fs)
          Constructor to create a HarFileSystem with an underlying filesystem.
 
Method Summary
 FSDataOutputStream append(Path f)
          Append to an existing file (optional operation).
 FSDataOutputStream append(Path f, int bufferSize, Progressable progress)
          Append to an existing file (optional operation).
protected  URI canonicalizeUri(URI uri)
          Canonicalize the given URI.
protected  void checkPath(Path path)
          Check that a Path belongs to this FileSystem.
 void close()
          No more filesystem operations are needed.
 void completeLocalOutput(Path fsOutputFile, Path tmpLocalFile)
          Not implemented.
 void copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] srcs, Path dst)
          The src files are on the local disk.
 void copyFromLocalFile(boolean delSrc, boolean overwrite, Path src, Path dst)
          Not implemented.
 void copyToLocalFile(boolean delSrc, Path src, Path dst)
          Copies a file in the HAR filesystem to a local file.
 FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress)
          Create an FSDataOutputStream at the indicated Path with write-progress reporting.
 FSDataOutputStream createNonRecursive(Path f, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress)
          Opens an FSDataOutputStream at the indicated Path with write-progress reporting.
 boolean delete(Path f, boolean recursive)
          Not implemented.
protected  URI getCanonicalUri()
          Used for delegation token related functionality.
 FileSystem[] getChildFileSystems()
          Used for delegation token related functionality.
 Configuration getConf()
          Return the configuration used by this object.
 long getDefaultBlockSize()
          Return the number of bytes that large input files should optimally be split into to minimize I/O time.
 long getDefaultBlockSize(Path f)
          Return the number of bytes that large input files should optimally be split into to minimize I/O time.
 short getDefaultReplication()
          Get the default replication.
 short getDefaultReplication(Path f)
          Get the default replication for a path.
 BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)
          Get block locations from the underlying fs and fix their offsets and lengths.
 FileChecksum getFileChecksum(Path f, long length)
          Get the checksum of a file, from the beginning of the file up to the specified length.
 FileStatus getFileStatus(Path f)
          Returns the FileStatus of files in the HAR archive.
static int getHarHash(Path p)
          Returns the hash of the path p inside the filesystem.
 int getHarVersion()
           
 Path getHomeDirectory()
          Returns the top-level archive path.
 Path getInitialWorkingDirectory()
          Note: with the new FileContext class, getWorkingDirectory() will be removed.
 String getScheme()
          Return the protocol scheme for the FileSystem.
 FsServerDefaults getServerDefaults()
          Return a set of server default configuration values.
 FsServerDefaults getServerDefaults(Path f)
          Return a set of server default configuration values.
 FsStatus getStatus(Path p)
          Returns a status object describing the use and capacity of the file system.
 URI getUri()
          Returns the URI of this filesystem.
 long getUsed()
          Return the total size of all files in the filesystem.
 Path getWorkingDirectory()
          Returns the top-level archive.
 void initialize(URI name, Configuration conf)
          Initialize a Har filesystem per har archive.
 FileStatus[] listStatus(Path f)
          Returns the children of a directory, after looking them up in the index files.
 Path makeQualified(Path path)
          Make sure that a path specifies a FileSystem.
 boolean mkdirs(Path f, FsPermission permission)
          Not implemented.
 FSDataInputStream open(Path f, int bufferSize)
          Returns a har input stream which fakes end of file.
 boolean rename(Path src, Path dst)
          Renames Path src to Path dst.
 Path resolvePath(Path p)
          Return the fully qualified path of path p, resolving the path through any symlinks or mount points.
 void setOwner(Path p, String username, String groupname)
          Not implemented.
 void setPermission(Path p, FsPermission permission)
          Not implemented.
 boolean setReplication(Path src, short replication)
          Not implemented.
 void setTimes(Path p, long mtime, long atime)
          Set the modification and access times of a file.
 void setWorkingDirectory(Path newDir)
          Set the current working directory for the given file system.
 Path startLocalOutput(Path fsOutputFile, Path tmpLocalFile)
          Not implemented.
 
Methods inherited from class org.apache.hadoop.fs.FileSystem
append, areSymlinksEnabled, cancelDeleteOnExit, clearStatistics, closeAll, closeAllForUGI, concat, copyFromLocalFile, copyFromLocalFile, copyToLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, create, create, createNewFile, createNonRecursive, createNonRecursive, createSnapshot, createSnapshot, createSymlink, delete, deleteOnExit, deleteSnapshot, enableSymlinks, exists, fixRelativePart, get, get, get, getAclStatus, getAllStatistics, getBlockSize, getContentSummary, getDefaultPort, getDefaultUri, getFileBlockLocations, getFileChecksum, getFileLinkStatus, getFileSystemClass, getFSofPath, getLength, getLinkTarget, getLocal, getName, getNamed, getReplication, getStatistics, getStatistics, getStatus, getXAttr, getXAttrs, getXAttrs, globStatus, globStatus, isDirectory, isFile, listCorruptFileBlocks, listFiles, listLocatedStatus, listLocatedStatus, listStatus, listStatus, listStatus, listXAttrs, mkdirs, mkdirs, modifyAclEntries, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, newInstance, newInstance, newInstance, newInstanceLocal, open, primitiveCreate, primitiveMkdir, primitiveMkdir, printStatistics, processDeleteOnExit, removeAcl, removeAclEntries, removeDefaultAcl, removeXAttr, rename, renameSnapshot, resolveLink, setAcl, setDefaultUri, setDefaultUri, setVerifyChecksum, setWriteChecksum, setXAttr, setXAttr, supportsSymlinks
 
Methods inherited from class org.apache.hadoop.conf.Configured
setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

METADATA_CACHE_ENTRIES_KEY

public static final String METADATA_CACHE_ENTRIES_KEY
See Also:
Constant Field Values

METADATA_CACHE_ENTRIES_DEFAULT

public static final int METADATA_CACHE_ENTRIES_DEFAULT
See Also:
Constant Field Values

VERSION

public static final int VERSION
See Also:
Constant Field Values
Constructor Detail

HarFileSystem

public HarFileSystem()
Public constructor for HarFileSystem.


HarFileSystem

public HarFileSystem(FileSystem fs)
Constructor to create a HarFileSystem with an underlying filesystem.

Parameters:
fs - underlying file system
Method Detail

getScheme

public String getScheme()
Return the protocol scheme for the FileSystem.

Overrides:
getScheme in class FileSystem
Returns:
har

initialize

public void initialize(URI name,
                       Configuration conf)
                throws IOException
Initialize a HAR filesystem; there is one instance per HAR archive. The archive home directory is the top-level directory in the filesystem that contains the HAR archive. Be careful with this method: you do not want to keep creating new FileSystem instances on every call to path.getFileSystem(). The URI of a HAR filesystem is either har://underlyingfsscheme-host:port/archivepath or har:///archivepath. The second form does not name an underlying filesystem, so the default one is assumed.

Overrides:
initialize in class FileSystem
Parameters:
name - a uri whose authority section names the host, port, etc. for this FileSystem
conf - the configuration
Throws:
IOException
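As a sketch of how the two URI forms relate to the underlying filesystem, the method below decodes the authority of a har URI by splitting it at the first '-' into the underlying scheme and host:port, and returns null for the har:///archivepath form so that a caller can fall back to the default filesystem. HarUriSketch and underlyingUri are hypothetical names. Splitting at the first '-' is an assumption that the underlying scheme contains no '-'; the sketch also keeps the path unchanged and does not try to locate the archive directory within it.

```java
import java.net.URI;

final class HarUriSketch {
    // Decodes har://scheme-host[:port]/path into scheme://host[:port]/path.
    // Returns null for har:///path (no authority): "use the default FS".
    static URI underlyingUri(URI har) {
        if (!"har".equalsIgnoreCase(har.getScheme())) {
            throw new IllegalArgumentException("not a har URI: " + har);
        }
        String authority = har.getAuthority();
        if (authority == null) {
            return null;
        }
        // Assumption: the underlying scheme is everything before the first '-'.
        int dash = authority.indexOf('-');
        if (dash <= 0) {
            throw new IllegalArgumentException("expected scheme-host[:port] in " + har);
        }
        String scheme = authority.substring(0, dash);
        String hostPort = authority.substring(dash + 1);
        // URI.create throws IllegalArgumentException on malformed input.
        return URI.create(scheme + "://" + hostPort + har.getRawPath());
    }
}
```

For example, har://hdfs-nn.example.com:8020/user/alice/logs.har maps to hdfs://nn.example.com:8020/user/alice/logs.har (the host name is illustrative).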

getConf

public Configuration getConf()
Description copied from interface: Configurable
Return the configuration used by this object.

Specified by:
getConf in interface Configurable
Overrides:
getConf in class Configured

getHarVersion

public int getHarVersion()
                  throws IOException
Throws:
IOException

getWorkingDirectory

public Path getWorkingDirectory()
Returns the top-level archive.

Specified by:
getWorkingDirectory in class FileSystem
Returns:
the directory pathname

getInitialWorkingDirectory

public Path getInitialWorkingDirectory()
Description copied from class: FileSystem
Note: with the new FileContext class, getWorkingDirectory() will be removed. The working directory is implemented in FileContext. Some file systems, like LocalFileSystem, have an initial workingDir that we use as the starting workingDir. For other file systems, like HDFS, there is no built-in notion of an initial workingDir.

Overrides:
getInitialWorkingDirectory in class FileSystem
Returns:
if there is a built-in notion of workingDir, it is returned; otherwise null is returned.

getStatus

public FsStatus getStatus(Path p)
                   throws IOException
Description copied from class: FileSystem
Returns a status object describing the use and capacity of the file system. If the file system has multiple partitions, the use and capacity of the partition pointed to by the specified path is reflected.

Overrides:
getStatus in class FileSystem
Parameters:
p - Path for which status should be obtained. null means the default partition.
Returns:
a FsStatus object
Throws:
IOException - see specific implementation

getCanonicalUri

protected URI getCanonicalUri()
Used for delegation token related functionality. Must delegate to underlying file system.

Overrides:
getCanonicalUri in class FileSystem
See Also:
FileSystem.canonicalizeUri(URI)

canonicalizeUri

protected URI canonicalizeUri(URI uri)
Description copied from class: FileSystem
Canonicalize the given URI. This is filesystem-dependent, but may for example consist of canonicalizing the hostname using DNS and adding the default port if not specified. The default implementation simply fills in the default port if not specified and if the filesystem has a default port.

Overrides:
canonicalizeUri in class FileSystem
Returns:
URI
See Also:
NetUtils.getCanonicalUri(URI, int)

getUri

public URI getUri()
Returns the URI of this filesystem. The URI is of the form har://underlyingfsscheme-host:port/pathintheunderlyingfs.

Specified by:
getUri in class FileSystem

checkPath

protected void checkPath(Path path)
Description copied from class: FileSystem
Check that a Path belongs to this FileSystem.

Overrides:
checkPath in class FileSystem
Parameters:
path - to check

resolvePath

public Path resolvePath(Path p)
                 throws IOException
Description copied from class: FileSystem
Return the fully qualified path of path p, resolving the path through any symlinks or mount points.

Overrides:
resolvePath in class FileSystem
Parameters:
p - path to be resolved
Returns:
fully qualified path
Throws:
FileNotFoundException
IOException

makeQualified

public Path makeQualified(Path path)
Description copied from class: FileSystem
Make sure that a path specifies a FileSystem.

Overrides:
makeQualified in class FileSystem
Parameters:
path - to use

getFileBlockLocations

public BlockLocation[] getFileBlockLocations(FileStatus file,
                                             long start,
                                             long len)
                                      throws IOException
Get block locations from the underlying fs and fix their offsets and lengths.

Overrides:
getFileBlockLocations in class FileSystem
Parameters:
file - the status of the file whose block locations are requested
start - the start of the desired range in the contained file
len - the length of the desired range
Returns:
block locations for this segment of file
Throws:
IOException
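The step that fixes offsets and lengths can be illustrated as follows, assuming the archived file occupies bytes [fileStart, fileStart + fileLength) of its part file. Block locations from the underlying filesystem are in part-file coordinates; each one is clipped to the requested range and shifted back by fileStart so that offsets are relative to the archived file. Block and BlockFixer are hypothetical stand-ins for BlockLocation (host lists omitted), and this sketch may differ from the real implementation in edge cases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a block location: an offset and length in some
// file's coordinate space (host lists omitted).
final class Block {
    final long offset;
    final long length;

    Block(long offset, long length) {
        this.offset = offset;
        this.length = length;
    }
}

final class BlockFixer {
    // partBlocks: blocks of the part file, in part-file coordinates.
    // fileStart:  offset of the archived file inside the part file.
    // start, len: requested range, in the archived file's own coordinates.
    static List<Block> fix(List<Block> partBlocks, long fileStart, long start, long len) {
        long lo = fileStart + start;   // requested range in part-file coordinates
        long hi = lo + len;
        List<Block> out = new ArrayList<>();
        for (Block b : partBlocks) {
            long bLo = Math.max(b.offset, lo);
            long bHi = Math.min(b.offset + b.length, hi);
            if (bLo >= bHi) {
                continue;              // block does not overlap the requested range
            }
            // Clip to the range and shift so offsets are relative to the archived file.
            out.add(new Block(bLo - fileStart, bHi - bLo));
        }
        return out;
    }
}
```

For example, with 256-byte part-file blocks and a 400-byte file stored at offset 300, the two overlapping blocks become (offset 0, length 212) and (offset 212, length 188), which together cover the whole file.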

getHarHash

public static int getHarHash(Path p)
Returns the hash of the path p inside the filesystem.

Parameters:
p - the path in the harfilesystem
Returns:
the hash code of the path.
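A minimal sketch of a path hash with the properties the index relies on: it is deterministic for a given path string and never negative, so index entries can be sorted and grouped into hash ranges. It assumes the hash is String.hashCode() of the path's string form with the sign bit cleared; treat this as an assumption, not a statement of the exact getHarHash formula. HarHashSketch is a hypothetical name, and it takes a String rather than a Path to stay self-contained.

```java
final class HarHashSketch {
    // Assumed scheme: String.hashCode() of the path's string form with the
    // sign bit masked off, so every hash lies in [0, Integer.MAX_VALUE].
    static int harHash(String pathInArchive) {
        return pathInArchive.hashCode() & 0x7fffffff;
    }
}
```

Masking rather than calling Math.abs avoids the corner case where Math.abs(Integer.MIN_VALUE) is still negative.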

getFileStatus

public FileStatus getFileStatus(Path f)
                         throws IOException
Returns the FileStatus of files in the HAR archive. The permissions returned are those of the archive index files; permissions are not persisted when a Hadoop archive is created.

Specified by:
getFileStatus in class FileSystem
Parameters:
f - the path in har filesystem
Returns:
the FileStatus.
Throws:
IOException
FileNotFoundException - when the path does not exist; IOException see specific implementation

getFileChecksum

public FileChecksum getFileChecksum(Path f,
                                    long length)
Description copied from class: FileSystem
Get the checksum of a file, from the beginning of the file up to the specified length.

Overrides:
getFileChecksum in class FileSystem
Parameters:
f - The file path
length - The length of the file range for checksum calculation
Returns:
null since no checksum algorithm is implemented.

open

public FSDataInputStream open(Path f,
                              int bufferSize)
                       throws IOException
Returns a HAR input stream that fakes end of file at the end of the archived file. It reads the index files to get the part file name and the start offset and size of the file within that part file.

Specified by:
open in class FileSystem
Parameters:
f - the file name to open
bufferSize - the size of the buffer to be used.
Throws:
IOException
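The "fakes end of file" behavior can be sketched as a bounded stream over the part file: skip to the archived file's start offset, then report EOF after exactly its length in bytes. This assumes the part-file stream and the (start, length) pair have already been obtained from the index. BoundedPartStream is a hypothetical name, and the sketch omits seek() and positioned reads, which a stream wrapped by FSDataInputStream would also need.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Exposes bytes [start, start + length) of an underlying part-file stream
// and reports end-of-file after 'length' bytes.
final class BoundedPartStream extends FilterInputStream {
    private long remaining;

    BoundedPartStream(InputStream part, long start, long length) throws IOException {
        super(part);
        long toSkip = start;
        while (toSkip > 0) {
            long n = part.skip(toSkip);
            if (n <= 0) {
                // skip() may return 0 without reaching EOF; fall back to read().
                if (part.read() < 0) {
                    throw new IOException("part file shorter than start offset");
                }
                n = 1;
            }
            toSkip -= n;
        }
        this.remaining = length;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1;                       // fake EOF at the file boundary
        }
        int b = in.read();
        if (b >= 0) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(Math.min(n, remaining));
        if (skipped > 0) {
            remaining -= skipped;
        }
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(in.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false;                        // mark/reset bookkeeping omitted
    }
}
```

Callers that read until -1 therefore see only the archived file's bytes, even though the part file continues with other archived files.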

getChildFileSystems

public FileSystem[] getChildFileSystems()
Used for delegation token related functionality. Must delegate to underlying file system.

Returns:
FileSystems used by this FileSystem

create

public FSDataOutputStream create(Path f,
                                 FsPermission permission,
                                 boolean overwrite,
                                 int bufferSize,
                                 short replication,
                                 long blockSize,
                                 Progressable progress)
                          throws IOException
Description copied from class: FileSystem
Create an FSDataOutputStream at the indicated Path with write-progress reporting.

Specified by:
create in class FileSystem
Parameters:
f - the file name to open
overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
bufferSize - the size of the buffer to be used.
replication - required block replication for the file.
Throws:
IOException
See Also:
FileSystem.setPermission(Path, FsPermission)

createNonRecursive

public FSDataOutputStream createNonRecursive(Path f,
                                             boolean overwrite,
                                             int bufferSize,
                                             short replication,
                                             long blockSize,
                                             Progressable progress)
                                      throws IOException
Description copied from class: FileSystem
Opens an FSDataOutputStream at the indicated Path with write-progress reporting. Same as create(), except fails if parent directory doesn't already exist.

Overrides:
createNonRecursive in class FileSystem
Parameters:
f - the file name to open
overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
bufferSize - the size of the buffer to be used.
replication - required block replication for the file.
Throws:
IOException
See Also:
FileSystem.setPermission(Path, FsPermission)

append

public FSDataOutputStream append(Path f,
                                 int bufferSize,
                                 Progressable progress)
                          throws IOException
Description copied from class: FileSystem
Append to an existing file (optional operation).

Specified by:
append in class FileSystem
Parameters:
f - the existing file to be appended.
bufferSize - the size of the buffer to be used.
progress - for reporting progress if it is not null.
Throws:
IOException

close

public void close()
           throws IOException
Description copied from class: FileSystem
No more filesystem operations are needed. Will release any held locks.

Specified by:
close in interface Closeable
Overrides:
close in class FileSystem
Throws:
IOException

setReplication

public boolean setReplication(Path src,
                              short replication)
                       throws IOException
Not implemented.

Overrides:
setReplication in class FileSystem
Parameters:
src - file name
replication - new replication
Returns:
true if successful; false if file does not exist or is a directory
Throws:
IOException

rename

public boolean rename(Path src,
                      Path dst)
               throws IOException
Description copied from class: FileSystem
Renames Path src to Path dst. Can take place on local fs or remote DFS.

Specified by:
rename in class FileSystem
Parameters:
src - path to be renamed
dst - new path after rename
Returns:
true if rename is successful
Throws:
IOException - on failure

append

public FSDataOutputStream append(Path f)
                          throws IOException
Description copied from class: FileSystem
Append to an existing file (optional operation). Same as append(f, getConf().getInt("io.file.buffer.size", 4096), null)

Overrides:
append in class FileSystem
Parameters:
f - the existing file to be appended.
Throws:
IOException

delete

public boolean delete(Path f,
                      boolean recursive)
               throws IOException
Not implemented.

Specified by:
delete in class FileSystem
Parameters:
f - the path to delete.
recursive - if the path is a directory and recursive is true, the directory is deleted; otherwise an exception is thrown. For a file, recursive can be either true or false.
Returns:
true if delete is successful else false.
Throws:
IOException

listStatus

public FileStatus[] listStatus(Path f)
                        throws IOException
Returns the children of a directory, after looking them up in the index files.

Specified by:
listStatus in class FileSystem
Parameters:
f - given path
Returns:
the statuses of the files/directories in the given path
Throws:
FileNotFoundException - when the path does not exist; IOException see specific implementation
IOException
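A simplified sketch of the lookup: resolve the path in the index, then expand a directory entry into the full paths of its children. The IndexRecord type and the in-memory Map are hypothetical stand-ins for parsed _index entries (a real lookup would go through the master index and path hashes), and the sketch returns paths rather than FileStatus objects. Listing a file returns just that file, and a missing path raises FileNotFoundException, as in the Throws clause above.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified index record: whether the entry is a directory
// and, for directories, the names of its direct children.
final class IndexRecord {
    final boolean isDir;
    final List<String> children;

    IndexRecord(boolean isDir, List<String> children) {
        this.isDir = isDir;
        this.children = children;
    }
}

final class ListStatusSketch {
    // Returns the full archive paths of the children of 'path'; for a file,
    // returns just that path; throws if the path is absent from the index.
    static List<String> list(Map<String, IndexRecord> index, String path)
            throws FileNotFoundException {
        IndexRecord rec = index.get(path);
        if (rec == null) {
            throw new FileNotFoundException("File " + path + " not found in archive");
        }
        List<String> out = new ArrayList<>();
        if (!rec.isDir) {
            out.add(path);
            return out;
        }
        String prefix = path.endsWith("/") ? path : path + "/";
        for (String child : rec.children) {
            out.add(prefix + child);
        }
        return out;
    }
}
```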

getHomeDirectory

public Path getHomeDirectory()
Returns the top-level archive path.

Overrides:
getHomeDirectory in class FileSystem

setWorkingDirectory

public void setWorkingDirectory(Path newDir)
Description copied from class: FileSystem
Set the current working directory for the given file system. All relative paths will be resolved relative to it.

Specified by:
setWorkingDirectory in class FileSystem

mkdirs

public boolean mkdirs(Path f,
                      FsPermission permission)
               throws IOException
Not implemented.

Specified by:
mkdirs in class FileSystem
Parameters:
f - path to create
permission - to apply to f
Throws:
IOException

copyFromLocalFile

public void copyFromLocalFile(boolean delSrc,
                              boolean overwrite,
                              Path src,
                              Path dst)
                       throws IOException
Not implemented.

Overrides:
copyFromLocalFile in class FileSystem
Parameters:
delSrc - whether to delete the src
overwrite - whether to overwrite an existing file
src - path
dst - path
Throws:
IOException

copyFromLocalFile

public void copyFromLocalFile(boolean delSrc,
                              boolean overwrite,
                              Path[] srcs,
                              Path dst)
                       throws IOException
Description copied from class: FileSystem
The src files are on the local disk. Add them to the FS at the given dst name. delSrc indicates whether the sources should be removed.

Overrides:
copyFromLocalFile in class FileSystem
Parameters:
delSrc - whether to delete the src
overwrite - whether to overwrite an existing file
srcs - array of paths which are source
dst - path
Throws:
IOException

copyToLocalFile

public void copyToLocalFile(boolean delSrc,
                            Path src,
                            Path dst)
                     throws IOException
Copies a file in the HAR filesystem to a local file.

Overrides:
copyToLocalFile in class FileSystem
Parameters:
delSrc - whether to delete the src
src - path
dst - path
Throws:
IOException

startLocalOutput

public Path startLocalOutput(Path fsOutputFile,
                             Path tmpLocalFile)
                      throws IOException
Not implemented.

Overrides:
startLocalOutput in class FileSystem
Parameters:
fsOutputFile - path of output file
tmpLocalFile - path of local tmp file
Throws:
IOException

completeLocalOutput

public void completeLocalOutput(Path fsOutputFile,
                                Path tmpLocalFile)
                         throws IOException
Not implemented.

Overrides:
completeLocalOutput in class FileSystem
Parameters:
fsOutputFile - path of output file
tmpLocalFile - path to local tmp file
Throws:
IOException

setOwner

public void setOwner(Path p,
                     String username,
                     String groupname)
              throws IOException
Not implemented.

Overrides:
setOwner in class FileSystem
Parameters:
p - The path
username - If it is null, the original username remains unchanged.
groupname - If it is null, the original groupname remains unchanged.
Throws:
IOException

setTimes

public void setTimes(Path p,
                     long mtime,
                     long atime)
              throws IOException
Description copied from class: FileSystem
Set the modification and access times of a file.

Overrides:
setTimes in class FileSystem
Parameters:
p - The path
mtime - Set the modification time of this file. The number of milliseconds since Jan 1, 1970. A value of -1 means that this call should not set modification time.
atime - Set the access time of this file. The number of milliseconds since Jan 1, 1970. A value of -1 means that this call should not set access time.
Throws:
IOException

setPermission

public void setPermission(Path p,
                          FsPermission permission)
                   throws IOException
Not implemented.

Overrides:
setPermission in class FileSystem
Throws:
IOException

getServerDefaults

public FsServerDefaults getServerDefaults()
                                   throws IOException
Description copied from class: FileSystem
Return a set of server default configuration values.

Overrides:
getServerDefaults in class FileSystem
Returns:
server default configuration values
Throws:
IOException

getServerDefaults

public FsServerDefaults getServerDefaults(Path f)
                                   throws IOException
Description copied from class: FileSystem
Return a set of server default configuration values.

Overrides:
getServerDefaults in class FileSystem
Parameters:
f - the path, used to identify an FS, since an FS could delegate the call to another FS
Returns:
server default configuration values
Throws:
IOException

getUsed

public long getUsed()
             throws IOException
Description copied from class: FileSystem
Return the total size of all files in the filesystem.

Overrides:
getUsed in class FileSystem
Throws:
IOException

getDefaultBlockSize

public long getDefaultBlockSize()
Description copied from class: FileSystem
Return the number of bytes that large input files should optimally be split into to minimize I/O time.

Overrides:
getDefaultBlockSize in class FileSystem

getDefaultBlockSize

public long getDefaultBlockSize(Path f)
Description copied from class: FileSystem
Return the number of bytes that large input files should optimally be split into to minimize I/O time. The given path will be used to locate the actual filesystem. The full path does not have to exist.

Overrides:
getDefaultBlockSize in class FileSystem
Parameters:
f - path of file
Returns:
the default block size for the path's filesystem

getDefaultReplication

public short getDefaultReplication()
Description copied from class: FileSystem
Get the default replication.

Overrides:
getDefaultReplication in class FileSystem

getDefaultReplication

public short getDefaultReplication(Path f)
Description copied from class: FileSystem
Get the default replication for a path. The given path will be used to locate the actual filesystem. The full path does not have to exist.

Overrides:
getDefaultReplication in class FileSystem
Parameters:
f - path of the file
Returns:
default replication for the path's filesystem


Copyright © 2014 Apache Software Foundation. All Rights Reserved.