org.apache.hadoop.fs
Class HarFileSystem

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.hadoop.fs.FileSystem
          extended by org.apache.hadoop.fs.HarFileSystem
All Implemented Interfaces:
Closeable, Configurable

public class HarFileSystem
extends FileSystem

This is an implementation of the Hadoop Archive (HAR) filesystem. An archive has index files (_masterindex and _index) and contents of the form part-*. The index files store the index entries of the real files. The master index is a level of indirection into the index file that makes lookups faster: the index file is sorted by the hash code of the paths it contains, and the master index contains pointers to the positions in the index for ranges of hash codes.
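
The two-level lookup described above can be illustrated with a small in-memory sketch. It is not the on-disk format or the actual HarFileSystem code: the class HarIndexSketch, its nested types, the stride parameter, and the pathHash function (a non-negative String hash code) are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class HarIndexSketch {
    /** One index entry: an archived path and where its bytes live in a part file. */
    public static final class IndexEntry {
        public final int hash;
        public final String path;
        public final String partFile;
        public final long start;
        public final long length;

        public IndexEntry(String path, String partFile, long start, long length) {
            this.hash = pathHash(path);
            this.path = path;
            this.partFile = partFile;
            this.start = start;
            this.length = length;
        }
    }

    /** One master-index entry: a hash range and the index positions that cover it. */
    public static final class MasterEntry {
        public final int startHash;
        public final int endHash;
        public final int first; // first index position, inclusive
        public final int last;  // last index position, inclusive

        MasterEntry(int startHash, int endHash, int first, int last) {
            this.startHash = startHash;
            this.endHash = endHash;
            this.first = first;
            this.last = last;
        }
    }

    public final List<IndexEntry> index = new ArrayList<>();
    public final List<MasterEntry> master = new ArrayList<>();

    // Assumption: a non-negative String hash stands in for the archive's path hash.
    public static int pathHash(String path) {
        return path.hashCode() & 0x7fffffff;
    }

    /** Sorts the entries by hash and records one master entry per `stride` index entries. */
    public HarIndexSketch(List<IndexEntry> entries, int stride) {
        index.addAll(entries);
        index.sort((a, b) -> Integer.compare(a.hash, b.hash));
        for (int first = 0; first < index.size(); first += stride) {
            int last = Math.min(first + stride, index.size()) - 1;
            master.add(new MasterEntry(index.get(first).hash, index.get(last).hash, first, last));
        }
    }

    /** Finds a path by consulting the master index first, then only the matching index slice. */
    public IndexEntry lookup(String path) {
        int h = pathHash(path);
        for (MasterEntry m : master) {
            if (h < m.startHash || h > m.endHash) {
                continue; // this slice of the index cannot contain the path
            }
            for (int i = m.first; i <= m.last; i++) {
                IndexEntry e = index.get(i);
                if (e.hash == h && e.path.equals(path)) {
                    return e;
                }
            }
        }
        return null; // not in the archive
    }
}
```

Because equal hash codes can straddle two master ranges, the sketch keeps scanning the remaining master entries instead of stopping at the first range that matches.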


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.fs.FileSystem
FileSystem.Statistics
 
Field Summary
static int METADATA_CACHE_ENTRIES_DEFAULT
           
static String METADATA_CACHE_ENTRIES_KEY
           
static int VERSION
           
 
Fields inherited from class org.apache.hadoop.fs.FileSystem
DEFAULT_FS, FS_DEFAULT_NAME_KEY, SHUTDOWN_HOOK_PRIORITY, statistics
 
Constructor Summary
HarFileSystem()
          Public constructor for HarFileSystem.
HarFileSystem(FileSystem fs)
          Constructor to create a HarFileSystem with an underlying filesystem.
 
Method Summary
 FSDataOutputStream append(Path f)
          Append to an existing file (optional operation).
 FSDataOutputStream append(Path f, int bufferSize, Progressable progress)
          Append to an existing file (optional operation).
 void close()
          No more filesystem operations are needed.
 void completeLocalOutput(Path fsOutputFile, Path tmpLocalFile)
          Not implemented.
 void copyFromLocalFile(boolean delSrc, Path src, Path dst)
          Not implemented.
 void copyToLocalFile(boolean delSrc, Path src, Path dst)
          Copies the file in the HAR filesystem to a local file.
 FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress)
          Create an FSDataOutputStream at the indicated Path with write-progress reporting.
 boolean delete(Path f, boolean recursive)
          Not implemented.
protected  URI getCanonicalUri()
          Return a canonicalized form of this FileSystem's URI.
 Configuration getConf()
          Return the configuration used by this object.
 BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)
          Get block locations from the underlying fs and fix their offsets and lengths.
 FileChecksum getFileChecksum(Path f)
          Get the checksum of a file.
 FileStatus getFileStatus(Path f)
          Returns the file status of a file in the HAR archive.
static int getHarHash(Path p)
          Returns the hash of the path p inside the filesystem.
 int getHarVersion()
           
 Path getHomeDirectory()
          Returns the top-level archive path.
 String getScheme()
          Return the protocol scheme for the FileSystem.
 URI getUri()
          Returns the URI of this filesystem.
 Path getWorkingDirectory()
          Returns the top-level archive.
 void initialize(URI name, Configuration conf)
          Initialize a HAR filesystem per HAR archive.
 FileStatus[] listStatus(Path f)
          Returns the children of a directory after looking up the index files.
 Path makeQualified(Path path)
          Make sure that a path specifies a FileSystem.
 boolean mkdirs(Path f, FsPermission permission)
          Not implemented.
 FSDataInputStream open(Path f, int bufferSize)
          Returns a HAR input stream that fakes end of file at the end of the archived file.
 boolean rename(Path src, Path dst)
          Renames Path src to Path dst.
 void setOwner(Path p, String username, String groupname)
          Not implemented.
 void setPermission(Path p, FsPermission permission)
          Not implemented.
 boolean setReplication(Path src, short replication)
          Not implemented.
 void setWorkingDirectory(Path newDir)
          Set the current working directory for the given file system.
 Path startLocalOutput(Path fsOutputFile, Path tmpLocalFile)
          Not implemented.
 
Methods inherited from class org.apache.hadoop.fs.FileSystem
append, cancelDeleteOnExit, canonicalizeUri, checkPath, clearStatistics, closeAll, closeAllForUGI, concat, copyFromLocalFile, copyFromLocalFile, copyFromLocalFile, copyToLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, create, create, createNewFile, createNonRecursive, createNonRecursive, createNonRecursive, createSnapshot, createSnapshot, createSymlink, delete, deleteOnExit, deleteSnapshot, enableSymlinks, exists, fixRelativePart, get, get, get, getAllStatistics, getBlockSize, getContentSummary, getDefaultBlockSize, getDefaultBlockSize, getDefaultPort, getDefaultReplication, getDefaultReplication, getDefaultUri, getFileBlockLocations, getFileLinkStatus, getFileSystemClass, getFSofPath, getInitialWorkingDirectory, getLength, getLinkTarget, getLocal, getName, getNamed, getReplication, getServerDefaults, getServerDefaults, getStatistics, getStatistics, getStatus, getStatus, getUsed, globStatus, globStatus, isDirectory, isFile, isSymlinksEnabled, listCorruptFileBlocks, listFiles, listLocatedStatus, listLocatedStatus, listStatus, listStatus, listStatus, mkdirs, mkdirs, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, newInstance, newInstance, newInstance, newInstanceLocal, open, primitiveCreate, primitiveMkdir, primitiveMkdir, printStatistics, processDeleteOnExit, rename, renameSnapshot, resolveLink, resolvePath, setDefaultUri, setDefaultUri, setTimes, setVerifyChecksum, setWriteChecksum, supportsSymlinks
 
Methods inherited from class org.apache.hadoop.conf.Configured
setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

METADATA_CACHE_ENTRIES_KEY

public static final String METADATA_CACHE_ENTRIES_KEY
See Also:
Constant Field Values

METADATA_CACHE_ENTRIES_DEFAULT

public static final int METADATA_CACHE_ENTRIES_DEFAULT
See Also:
Constant Field Values

VERSION

public static final int VERSION
See Also:
Constant Field Values

Constructor Detail

HarFileSystem

public HarFileSystem()
Public constructor for HarFileSystem.


HarFileSystem

public HarFileSystem(FileSystem fs)
Constructor to create a HarFileSystem with an underlying filesystem.

Parameters:
fs - underlying file system

Method Detail

getScheme

public String getScheme()
Return the protocol scheme for the FileSystem.

Overrides:
getScheme in class FileSystem
Returns:
har

initialize

public void initialize(URI name,
                       Configuration conf)
                throws IOException
Initialize a HAR filesystem per HAR archive. The archive home directory is the top-level directory in the filesystem that contains the HAR archive. Be careful with this method: you do not want to keep creating new FileSystem instances on every call to path.getFileSystem(). The URI of a HAR filesystem is har://underlyingfsscheme-host:port/archivepath or har:///archivepath. The second form assumes the default underlying filesystem, since none is specified.

Overrides:
initialize in class FileSystem
Parameters:
name - a uri whose authority section names the host, port, etc. for this FileSystem
conf - the configuration
Throws:
IOException
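
The URI form above packs the underlying filesystem's scheme and its host:port into a single authority, separated by a dash. The following is a minimal sketch of how such a URI could be split back apart using only java.net.URI. It is not the HarFileSystem implementation: the class HarUriSketch, the method underlyingUri, the defaultFs parameter, and the sample host names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HarUriSketch {
    /**
     * Maps a har URI to the location of the archive on the underlying filesystem.
     * defaultFs supplies the scheme and authority for the har:///archivepath form.
     */
    public static URI underlyingUri(URI har, URI defaultFs) {
        if (!"har".equals(har.getScheme())) {
            throw new IllegalArgumentException("not a har URI: " + har);
        }
        try {
            String authority = har.getAuthority();
            if (authority == null) {
                // har:///archivepath: fall back to the default filesystem
                return new URI(defaultFs.getScheme(), defaultFs.getAuthority(),
                        har.getPath(), null, null);
            }
            // har://underlyingfsscheme-host:port/archivepath: split at the first dash
            int dash = authority.indexOf('-');
            if (dash < 0) {
                throw new IllegalArgumentException("authority lacks 'scheme-host': " + har);
            }
            String scheme = authority.substring(0, dash);
            String hostPort = authority.substring(dash + 1);
            return new URI(scheme, hostPort.isEmpty() ? null : hostPort,
                    har.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot decode har URI: " + har, e);
        }
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://default:9000/");
        System.out.println(underlyingUri(
                URI.create("har://hdfs-namenode:8020/user/alice/logs.har"), defaultFs));
        System.out.println(underlyingUri(
                URI.create("har:///user/alice/logs.har"), defaultFs));
    }
}
```

Splitting at the first dash keeps host names that themselves contain dashes intact.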

getConf

public Configuration getConf()
Description copied from interface: Configurable
Return the configuration used by this object.

Specified by:
getConf in interface Configurable
Overrides:
getConf in class Configured

getHarVersion

public int getHarVersion()
                  throws IOException
Throws:
IOException

getWorkingDirectory

public Path getWorkingDirectory()
Returns the top-level archive.

Specified by:
getWorkingDirectory in class FileSystem
Returns:
the directory pathname

getCanonicalUri

protected URI getCanonicalUri()
Description copied from class: FileSystem
Return a canonicalized form of this FileSystem's URI. The default implementation simply calls FileSystem.canonicalizeUri(URI) on the filesystem's own URI, so subclasses typically only need to implement that method.

Overrides:
getCanonicalUri in class FileSystem
See Also:
FileSystem.canonicalizeUri(URI)

getUri

public URI getUri()
Returns the URI of this filesystem. The URI is of the form har://underlyingfsscheme-host:port/pathintheunderlyingfs

Specified by:
getUri in class FileSystem

makeQualified

public Path makeQualified(Path path)
Description copied from class: FileSystem
Make sure that a path specifies a FileSystem.

Overrides:
makeQualified in class FileSystem
Parameters:
path - to use

getFileBlockLocations

public BlockLocation[] getFileBlockLocations(FileStatus file,
                                             long start,
                                             long len)
                                      throws IOException
Get block locations from the underlying fs and fix their offsets and lengths.

Overrides:
getFileBlockLocations in class FileSystem
Parameters:
file - the input file status to get block locations
start - the start of the desired range in the contained file
len - the length of the desired range
Returns:
block locations for this segment of file
Throws:
IOException
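
An archived file occupies a region of a larger part file, so block locations reported for the part file have to be shifted and clipped before they describe the archived file. The arithmetic can be sketched as follows. This is an illustration under assumptions, not the HarFileSystem code: the class HarBlockRangeSketch, the method fixBlock, and the parameter fileOffsetInPart (where the archived file starts inside the part file) are hypothetical, and the block is assumed to overlap the requested range.

```java
public class HarBlockRangeSketch {
    /**
     * Clips one part-file block to the requested range of an archived file.
     *
     * @param blockOffset      offset of the block within the part file
     * @param blockLength      length of the block
     * @param start            start of the desired range within the archived file
     * @param len              length of the desired range
     * @param fileOffsetInPart offset at which the archived file begins in the part file
     * @return {offset, length}, both relative to the start of the archived file
     */
    public static long[] fixBlock(long blockOffset, long blockLength,
                                  long start, long len, long fileOffsetInPart) {
        long end = start + len; // one past the last byte of the desired range
        // Block boundaries relative to the archived file; blockStart may be
        // negative when the archived file begins partway through this block.
        long blockStart = blockOffset - fileOffsetInPart;
        long blockEnd = blockStart + blockLength;
        long fixedStart = Math.max(start, blockStart);
        long fixedEnd = Math.min(end, blockEnd);
        return new long[] { fixedStart, fixedEnd - fixedStart };
    }

    public static void main(String[] args) {
        // Archived file starts at byte 1000 of the part file; blocks are 1024 bytes.
        long[] first = fixBlock(0, 1024, 0, 100, 1000);
        long[] second = fixBlock(1024, 1024, 0, 100, 1000);
        System.out.println(first[0] + "+" + first[1]);
        System.out.println(second[0] + "+" + second[1]);
    }
}
```

In the usage shown in main, the first 100 bytes of the archived file span two part-file blocks, and the two clipped lengths add up to the requested 100 bytes.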

getHarHash

public static int getHarHash(Path p)
Returns the hash of the path p inside the filesystem.

Parameters:
p - the path in the HAR filesystem
Returns:
the hash code of the path.

getFileStatus

public FileStatus getFileStatus(Path f)
                         throws IOException
Returns the file status of a file in the HAR archive. The permissions returned are those of the archive index files; permissions are not persisted when a Hadoop archive is created.

Specified by:
getFileStatus in class FileSystem
Parameters:
f - the path in the HAR filesystem
Returns:
the file status of the path
Throws:
FileNotFoundException - when the path does not exist
IOException - see specific implementation

getFileChecksum

public FileChecksum getFileChecksum(Path f)
Description copied from class: FileSystem
Get the checksum of a file.

Overrides:
getFileChecksum in class FileSystem
Parameters:
f - The file path
Returns:
null since no checksum algorithm is implemented.

open

public FSDataInputStream open(Path f,
                              int bufferSize)
                       throws IOException
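Returns a HAR input stream that fakes end of file: it reports end of stream where the archived file ends, even though the part file continues. It reads the index files to get the part file name and the start and size of the file within it.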

Specified by:
open in class FileSystem
Parameters:
f - the file name to open
bufferSize - the size of the buffer to be used.
Throws:
IOException
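
The idea of a stream that fakes end of file can be sketched with plain java.io: wrap the part-file stream, skip to the archived file's start, and report end of stream once its length has been consumed. This is an illustration, not the stream class HarFileSystem actually returns. The class BoundedRegionStreamSketch and the helper readRegion are hypothetical, and the sketch omits seeking and positioned reads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BoundedRegionStreamSketch extends InputStream {
    private final InputStream in;
    private long remaining; // bytes of the archived file not yet read

    /** Exposes bytes [start, start + length) of partFile as if they were a whole file. */
    public BoundedRegionStreamSketch(InputStream partFile, long start, long length)
            throws IOException {
        this.in = partFile;
        long toSkip = start;
        while (toSkip > 0) {
            long skipped = in.skip(toSkip);
            if (skipped <= 0) {
                throw new IOException("part file is shorter than the start offset");
            }
            toSkip -= skipped;
        }
        this.remaining = length;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // fake end of file: the part file may continue past this point
        }
        int b = in.read();
        if (b >= 0) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Convenience helper for the sketch: reads one region of an in-memory part file. */
    public static String readRegion(byte[] partFile, long start, long length) {
        try (InputStream region = new BoundedRegionStreamSketch(
                new ByteArrayInputStream(partFile), start, length)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4];
            int n;
            while ((n = region.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a part file holding the bytes AAAAhelloBBBB, readRegion(bytes, 4, 5) reads only the five-byte region that starts at offset 4 and never exposes the neighbouring files' bytes.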

create

public FSDataOutputStream create(Path f,
                                 FsPermission permission,
                                 boolean overwrite,
                                 int bufferSize,
                                 short replication,
                                 long blockSize,
                                 Progressable progress)
                          throws IOException
Description copied from class: FileSystem
Create an FSDataOutputStream at the indicated Path with write-progress reporting.

Specified by:
create in class FileSystem
Parameters:
f - the file name to open
overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
bufferSize - the size of the buffer to be used.
replication - required block replication for the file.
Throws:
IOException
See Also:
FileSystem.setPermission(Path, FsPermission)

append

public FSDataOutputStream append(Path f,
                                 int bufferSize,
                                 Progressable progress)
                          throws IOException
Description copied from class: FileSystem
Append to an existing file (optional operation).

Specified by:
append in class FileSystem
Parameters:
f - the existing file to be appended.
bufferSize - the size of the buffer to be used.
progress - for reporting progress if it is not null.
Throws:
IOException

close

public void close()
           throws IOException
Description copied from class: FileSystem
No more filesystem operations are needed. Will release any held locks.

Specified by:
close in interface Closeable
Overrides:
close in class FileSystem
Throws:
IOException

setReplication

public boolean setReplication(Path src,
                              short replication)
                       throws IOException
Not implemented.

Overrides:
setReplication in class FileSystem
Parameters:
src - file name
replication - new replication
Returns:
true if successful; false if file does not exist or is a directory
Throws:
IOException

rename

public boolean rename(Path src,
                      Path dst)
               throws IOException
Description copied from class: FileSystem
Renames Path src to Path dst. Can take place on local fs or remote DFS.

Specified by:
rename in class FileSystem
Parameters:
src - path to be renamed
dst - new path after rename
Returns:
true if rename is successful
Throws:
IOException - on failure

append

public FSDataOutputStream append(Path f)
                          throws IOException
Description copied from class: FileSystem
Append to an existing file (optional operation). Same as append(f, getConf().getInt("io.file.buffer.size", 4096), null)

Overrides:
append in class FileSystem
Parameters:
f - the existing file to be appended.
Throws:
IOException

delete

public boolean delete(Path f,
                      boolean recursive)
               throws IOException
Not implemented.

Specified by:
delete in class FileSystem
Parameters:
f - the path to delete.
recursive - if path is a directory and set to true, the directory is deleted else throws an exception. In case of a file the recursive can be set to either true or false.
Returns:
true if delete is successful else false.
Throws:
IOException

listStatus

public FileStatus[] listStatus(Path f)
                        throws IOException
Returns the children of a directory after looking up the index files.

Specified by:
listStatus in class FileSystem
Parameters:
f - given path
Returns:
the statuses of the files/directories in the given path
Throws:
FileNotFoundException - when the path does not exist
IOException - see specific implementation

getHomeDirectory

public Path getHomeDirectory()
Returns the top-level archive path.

Overrides:
getHomeDirectory in class FileSystem

setWorkingDirectory

public void setWorkingDirectory(Path newDir)
Description copied from class: FileSystem
Set the current working directory for the given file system. All relative paths will be resolved relative to it.

Specified by:
setWorkingDirectory in class FileSystem

mkdirs

public boolean mkdirs(Path f,
                      FsPermission permission)
               throws IOException
Not implemented.

Specified by:
mkdirs in class FileSystem
Parameters:
f - path to create
permission - to apply to f
Throws:
IOException

copyFromLocalFile

public void copyFromLocalFile(boolean delSrc,
                              Path src,
                              Path dst)
                       throws IOException
Not implemented.

Overrides:
copyFromLocalFile in class FileSystem
Parameters:
delSrc - whether to delete the src
src - path
dst - path
Throws:
IOException

copyToLocalFile

public void copyToLocalFile(boolean delSrc,
                            Path src,
                            Path dst)
                     throws IOException
Copies the file in the HAR filesystem to a local file.

Overrides:
copyToLocalFile in class FileSystem
Parameters:
delSrc - whether to delete the src
src - path
dst - path
Throws:
IOException

startLocalOutput

public Path startLocalOutput(Path fsOutputFile,
                             Path tmpLocalFile)
                      throws IOException
Not implemented.

Overrides:
startLocalOutput in class FileSystem
Parameters:
fsOutputFile - path of output file
tmpLocalFile - path of local tmp file
Throws:
IOException

completeLocalOutput

public void completeLocalOutput(Path fsOutputFile,
                                Path tmpLocalFile)
                         throws IOException
Not implemented.

Overrides:
completeLocalOutput in class FileSystem
Parameters:
fsOutputFile - path of output file
tmpLocalFile - path to local tmp file
Throws:
IOException

setOwner

public void setOwner(Path p,
                     String username,
                     String groupname)
              throws IOException
Not implemented.

Overrides:
setOwner in class FileSystem
Parameters:
p - The path
username - If it is null, the original username remains unchanged.
groupname - If it is null, the original groupname remains unchanged.
Throws:
IOException

setPermission

public void setPermission(Path p,
                          FsPermission permission)
                   throws IOException
Not implemented.

Overrides:
setPermission in class FileSystem
Throws:
IOException


Copyright © 2013 Apache Software Foundation. All Rights Reserved.