org.apache.accumulo.core.client.mapred
Class AccumuloOutputFormat

java.lang.Object
  extended by org.apache.accumulo.core.client.mapred.AccumuloOutputFormat
All Implemented Interfaces:
org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Mutation>

public class AccumuloOutputFormat
extends Object
implements org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Mutation>

This class allows MapReduce jobs to use Accumulo as the sink for data. This OutputFormat accepts keys and values of type Text (for a table name) and Mutation from the Map and Reduce functions. The user must specify the following via static configurator methods:

Other static methods are optional.


Nested Class Summary
protected static class AccumuloOutputFormat.AccumuloRecordWriter
          A base class to be used to create RecordWriter instances that write to Accumulo.
 
Field Summary
protected static org.apache.log4j.Logger log
           
 
Constructor Summary
AccumuloOutputFormat()
           
 
Method Summary
protected static Boolean canCreateTables(org.apache.hadoop.mapred.JobConf job)
          Determines whether tables are permitted to be created as needed.
 void checkOutputSpecs(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job)
           
protected static BatchWriterConfig getBatchWriterOptions(org.apache.hadoop.mapred.JobConf job)
          Gets the BatchWriterConfig settings.
protected static String getDefaultTableName(org.apache.hadoop.mapred.JobConf job)
          Gets the default table name from the configuration.
protected static Instance getInstance(org.apache.hadoop.mapred.JobConf job)
          Initializes an Accumulo Instance based on the configuration.
protected static org.apache.log4j.Level getLogLevel(org.apache.hadoop.mapred.JobConf job)
          Gets the log level from this configuration.
protected static String getPrincipal(org.apache.hadoop.mapred.JobConf job)
          Gets the principal from the configuration.
 org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.Text,Mutation> getRecordWriter(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress)
           
protected static Boolean getSimulationMode(org.apache.hadoop.mapred.JobConf job)
          Determines whether this feature is enabled.
protected static byte[] getToken(org.apache.hadoop.mapred.JobConf job)
          Gets the password from the configuration.
protected static String getTokenClass(org.apache.hadoop.mapred.JobConf job)
          Gets the serialized token class from the configuration.
protected static Boolean isConnectorInfoSet(org.apache.hadoop.mapred.JobConf job)
          Determines if the connector has been configured.
static void setBatchWriterOptions(org.apache.hadoop.mapred.JobConf job, BatchWriterConfig bwConfig)
          Sets the configuration for for the job's BatchWriter instances.
static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, AuthenticationToken token)
          Sets the connector information needed to communicate with Accumulo in this job.
static void setCreateTables(org.apache.hadoop.mapred.JobConf job, boolean enableFeature)
          Sets the directive to create new tables, as necessary.
static void setDefaultTableName(org.apache.hadoop.mapred.JobConf job, String tableName)
          Sets the default table name to use if one emits a null in place of a table name for a given mutation.
static void setLogLevel(org.apache.hadoop.mapred.JobConf job, org.apache.log4j.Level level)
          Sets the log level for this job.
static void setMockInstance(org.apache.hadoop.mapred.JobConf job, String instanceName)
          Configures a MockInstance for this job.
static void setSimulationMode(org.apache.hadoop.mapred.JobConf job, boolean enableFeature)
          Sets the directive to use simulation mode for this job.
static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, String instanceName, String zooKeepers)
          Configures a ZooKeeperInstance for this job.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

protected static final org.apache.log4j.Logger log
Constructor Detail

AccumuloOutputFormat

public AccumuloOutputFormat()
Method Detail

setConnectorInfo

public static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job,
                                    String principal,
                                    AuthenticationToken token)
                             throws AccumuloSecurityException
Sets the connector information needed to communicate with Accumulo in this job.

WARNING: The serialized token is stored in the configuration and shared with all MapReduce tasks. It is BASE64 encoded to provide a charset safe conversion to a string, and is not intended to be secure.

Parameters:
job - the Hadoop job instance to be configured
principal - a valid Accumulo user name (user must have Table.CREATE permission if setCreateTables(JobConf, boolean) is set to true)
token - the user's password
Throws:
AccumuloSecurityException
Since:
1.5.0

isConnectorInfoSet

protected static Boolean isConnectorInfoSet(org.apache.hadoop.mapred.JobConf job)
Determines if the connector has been configured.

Parameters:
job - the Hadoop context for the configured job
Returns:
true if the connector has been configured, false otherwise
Since:
1.5.0
See Also:
setConnectorInfo(JobConf, String, AuthenticationToken)

getPrincipal

protected static String getPrincipal(org.apache.hadoop.mapred.JobConf job)
Gets the principal from the configuration.

Parameters:
job - the Hadoop context for the configured job
Returns:
the user name
Since:
1.5.0
See Also:
setConnectorInfo(JobConf, String, AuthenticationToken)

getTokenClass

protected static String getTokenClass(org.apache.hadoop.mapred.JobConf job)
Gets the serialized token class from the configuration.

Parameters:
job - the Hadoop context for the configured job
Returns:
the user name
Since:
1.5.0
See Also:
setConnectorInfo(JobConf, String, AuthenticationToken)

getToken

protected static byte[] getToken(org.apache.hadoop.mapred.JobConf job)
Gets the password from the configuration. WARNING: The password is stored in the Configuration and shared with all MapReduce tasks; It is BASE64 encoded to provide a charset safe conversion to a string, and is not intended to be secure.

Parameters:
job - the Hadoop context for the configured job
Returns:
the decoded user password
Since:
1.5.0
See Also:
setConnectorInfo(JobConf, String, AuthenticationToken)

setZooKeeperInstance

public static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job,
                                        String instanceName,
                                        String zooKeepers)
Configures a ZooKeeperInstance for this job.

Parameters:
job - the Hadoop job instance to be configured
instanceName - the Accumulo instance name
zooKeepers - a comma-separated list of zookeeper servers
Since:
1.5.0

setMockInstance

public static void setMockInstance(org.apache.hadoop.mapred.JobConf job,
                                   String instanceName)
Configures a MockInstance for this job.

Parameters:
job - the Hadoop job instance to be configured
instanceName - the Accumulo instance name
Since:
1.5.0

getInstance

protected static Instance getInstance(org.apache.hadoop.mapred.JobConf job)
Initializes an Accumulo Instance based on the configuration.

Parameters:
job - the Hadoop context for the configured job
Returns:
an Accumulo instance
Since:
1.5.0
See Also:
setZooKeeperInstance(JobConf, String, String), setMockInstance(JobConf, String)

setLogLevel

public static void setLogLevel(org.apache.hadoop.mapred.JobConf job,
                               org.apache.log4j.Level level)
Sets the log level for this job.

Parameters:
job - the Hadoop job instance to be configured
level - the logging level
Since:
1.5.0

getLogLevel

protected static org.apache.log4j.Level getLogLevel(org.apache.hadoop.mapred.JobConf job)
Gets the log level from this configuration.

Parameters:
job - the Hadoop context for the configured job
Returns:
the log level
Since:
1.5.0
See Also:
setLogLevel(JobConf, Level)

setDefaultTableName

public static void setDefaultTableName(org.apache.hadoop.mapred.JobConf job,
                                       String tableName)
Sets the default table name to use if one emits a null in place of a table name for a given mutation. Table names can only be alpha-numeric and underscores.

Parameters:
job - the Hadoop job instance to be configured
tableName - the table to use when the tablename is null in the write call
Since:
1.5.0

getDefaultTableName

protected static String getDefaultTableName(org.apache.hadoop.mapred.JobConf job)
Gets the default table name from the configuration.

Parameters:
job - the Hadoop context for the configured job
Returns:
the default table name
Since:
1.5.0
See Also:
setDefaultTableName(JobConf, String)

setBatchWriterOptions

public static void setBatchWriterOptions(org.apache.hadoop.mapred.JobConf job,
                                         BatchWriterConfig bwConfig)
Sets the configuration for for the job's BatchWriter instances. If not set, a new BatchWriterConfig, with sensible built-in defaults is used. Setting the configuration multiple times overwrites any previous configuration.

Parameters:
job - the Hadoop job instance to be configured
bwConfig - the configuration for the BatchWriter
Since:
1.5.0

getBatchWriterOptions

protected static BatchWriterConfig getBatchWriterOptions(org.apache.hadoop.mapred.JobConf job)
Gets the BatchWriterConfig settings.

Parameters:
job - the Hadoop context for the configured job
Returns:
the configuration object
Since:
1.5.0
See Also:
setBatchWriterOptions(JobConf, BatchWriterConfig)

setCreateTables

public static void setCreateTables(org.apache.hadoop.mapred.JobConf job,
                                   boolean enableFeature)
Sets the directive to create new tables, as necessary. Table names can only be alpha-numeric and underscores.

By default, this feature is disabled.

Parameters:
job - the Hadoop job instance to be configured
enableFeature - the feature is enabled if true, disabled otherwise
Since:
1.5.0

canCreateTables

protected static Boolean canCreateTables(org.apache.hadoop.mapred.JobConf job)
Determines whether tables are permitted to be created as needed.

Parameters:
job - the Hadoop context for the configured job
Returns:
true if the feature is disabled, false otherwise
Since:
1.5.0
See Also:
setCreateTables(JobConf, boolean)

setSimulationMode

public static void setSimulationMode(org.apache.hadoop.mapred.JobConf job,
                                     boolean enableFeature)
Sets the directive to use simulation mode for this job. In simulation mode, no output is produced. This is useful for testing.

By default, this feature is disabled.

Parameters:
job - the Hadoop job instance to be configured
enableFeature - the feature is enabled if true, disabled otherwise
Since:
1.5.0

getSimulationMode

protected static Boolean getSimulationMode(org.apache.hadoop.mapred.JobConf job)
Determines whether this feature is enabled.

Parameters:
job - the Hadoop context for the configured job
Returns:
true if the feature is enabled, false otherwise
Since:
1.5.0
See Also:
setSimulationMode(JobConf, boolean)

checkOutputSpecs

public void checkOutputSpecs(org.apache.hadoop.fs.FileSystem ignored,
                             org.apache.hadoop.mapred.JobConf job)
                      throws IOException
Specified by:
checkOutputSpecs in interface org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Mutation>
Throws:
IOException

getRecordWriter

public org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.Text,Mutation> getRecordWriter(org.apache.hadoop.fs.FileSystem ignored,
                                                                                                 org.apache.hadoop.mapred.JobConf job,
                                                                                                 String name,
                                                                                                 org.apache.hadoop.util.Progressable progress)
                                                                                          throws IOException
Specified by:
getRecordWriter in interface org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Mutation>
Throws:
IOException


Copyright © 2013 Apache Accumulo Project. All Rights Reserved.