AccumuloOutputFormat (Core 1.6.2 API)

org.apache.accumulo.core.client.mapreduce
Class AccumuloOutputFormat

java.lang.Object
  org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.Text,Mutation>
      org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat

public class AccumuloOutputFormat
extends org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.Text,Mutation>

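This class allows MapReduce jobs to use Accumulo as the sink for data. This OutputFormat accepts keys of type Text (a table name) and values of type Mutation from the Map and Reduce functions. The user must specify the following via static configurator methods:

- `setConnectorInfo(Job, String, AuthenticationToken)` or `setConnectorInfo(Job, String, String)`
- `setZooKeeperInstance(Job, ClientConfiguration)` or `setMockInstance(Job, String)`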

Other static methods are optional.

Nested Class Summary
`protected static class`	`AccumuloOutputFormat.AccumuloRecordWriter` A base class to be used to create `RecordWriter` instances that write to Accumulo.

Field Summary
`protected static org.apache.log4j.Logger`	`log`

Constructor Summary
`AccumuloOutputFormat()`

Method Summary
`protected static Boolean`	`canCreateTables(org.apache.hadoop.mapreduce.JobContext context)` Determines whether tables are permitted to be created as needed.
`void`	`checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext job)`
`protected static AuthenticationToken`	`getAuthenticationToken(org.apache.hadoop.mapreduce.JobContext context)` Gets the authentication token from either the specified token file or directly from the configuration, whichever was used when the job was configured.
`protected static BatchWriterConfig`	`getBatchWriterOptions(org.apache.hadoop.mapreduce.JobContext context)` Gets the `BatchWriterConfig` settings.
`protected static String`	`getDefaultTableName(org.apache.hadoop.mapreduce.JobContext context)` Gets the default table name from the configuration.
`protected static Instance`	`getInstance(org.apache.hadoop.mapreduce.JobContext context)` Initializes an Accumulo `Instance` based on the configuration.
`protected static org.apache.log4j.Level`	`getLogLevel(org.apache.hadoop.mapreduce.JobContext context)` Gets the log level from this configuration.
`org.apache.hadoop.mapreduce.OutputCommitter`	`getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext context)`
`protected static String`	`getPrincipal(org.apache.hadoop.mapreduce.JobContext context)` Gets the user name from the configuration.
`org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.Text,Mutation>`	`getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext attempt)`
`protected static Boolean`	`getSimulationMode(org.apache.hadoop.mapreduce.JobContext context)` Determines whether simulation mode is enabled.
`protected static byte[]`	`getToken(org.apache.hadoop.mapreduce.JobContext context)` Deprecated. since 1.6.0; Use `getAuthenticationToken(JobContext)` instead.
`protected static String`	`getTokenClass(org.apache.hadoop.mapreduce.JobContext context)` Deprecated. since 1.6.0; Use `getAuthenticationToken(JobContext)` instead.
`protected static Boolean`	`isConnectorInfoSet(org.apache.hadoop.mapreduce.JobContext context)` Determines if the connector has been configured.
`static void`	`setBatchWriterOptions(org.apache.hadoop.mapreduce.Job job, BatchWriterConfig bwConfig)` Sets the configuration for the job's `BatchWriter` instances.
`static void`	`setConnectorInfo(org.apache.hadoop.mapreduce.Job job, String principal, AuthenticationToken token)` Sets the connector information needed to communicate with Accumulo in this job.
`static void`	`setConnectorInfo(org.apache.hadoop.mapreduce.Job job, String principal, String tokenFile)` Sets the connector information needed to communicate with Accumulo in this job.
`static void`	`setCreateTables(org.apache.hadoop.mapreduce.Job job, boolean enableFeature)` Sets the directive to create new tables, as necessary.
`static void`	`setDefaultTableName(org.apache.hadoop.mapreduce.Job job, String tableName)` Sets the default table name to use if one emits a null in place of a table name for a given mutation.
`static void`	`setLogLevel(org.apache.hadoop.mapreduce.Job job, org.apache.log4j.Level level)` Sets the log level for this job.
`static void`	`setMockInstance(org.apache.hadoop.mapreduce.Job job, String instanceName)` Configures a `MockInstance` for this job.
`static void`	`setSimulationMode(org.apache.hadoop.mapreduce.Job job, boolean enableFeature)` Sets the directive to use simulation mode for this job.
`static void`	`setZooKeeperInstance(org.apache.hadoop.mapreduce.Job job, ClientConfiguration clientConfig)` Configures a `ZooKeeperInstance` for this job.
`static void`	`setZooKeeperInstance(org.apache.hadoop.mapreduce.Job job, String instanceName, String zooKeepers)` Deprecated. since 1.6.0; Use `setZooKeeperInstance(Job, ClientConfiguration)` instead.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

log

protected static final org.apache.log4j.Logger log

Constructor Detail

AccumuloOutputFormat

public AccumuloOutputFormat()

Method Detail

setConnectorInfo

public static void setConnectorInfo(org.apache.hadoop.mapreduce.Job job,
                                    String principal,
                                    AuthenticationToken token)
                             throws AccumuloSecurityException

Sets the connector information needed to communicate with Accumulo in this job.

WARNING: The serialized token is stored in the configuration and shared with all MapReduce tasks. It is BASE64 encoded to provide a charset safe conversion to a string, and is not intended to be secure.

Parameters:: job - the Hadoop job instance to be configured; principal - a valid Accumulo user name (user must have Table.CREATE permission if setCreateTables(Job, boolean) is set to true); token - the user's password
Throws:: AccumuloSecurityException
Since:: 1.5.0
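
The warning above can be made concrete with a minimal sketch that uses only the JDK. It shows that a Base64-encoded value is recovered by anyone who can read the string, with no key involved. The class and method names are hypothetical, and the sketch encodes a plain password string rather than reproducing how Accumulo actually serializes an AuthenticationToken.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration class; not part of the Accumulo API.
final class Base64IsNotEncryption {

    // Encodes raw bytes into a charset-safe string, as one might before
    // placing them in a job configuration.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Reverses the encoding. No secret is needed to do this.
    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        String stored = encode("s3cret".getBytes(StandardCharsets.UTF_8));
        String recovered = new String(decode(stored), StandardCharsets.UTF_8);
        System.out.println("stored in configuration: " + stored);
        System.out.println("recovered by any reader: " + recovered);
    }
}
```

Because decoding needs nothing beyond the encoded string, anyone able to read the job configuration can recover the token, which is the motivation for the token-file variant of setConnectorInfo.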

setConnectorInfo

public static void setConnectorInfo(org.apache.hadoop.mapreduce.Job job,
                                    String principal,
                                    String tokenFile)
                             throws AccumuloSecurityException

Sets the connector information needed to communicate with Accumulo in this job.

Stores the password in a file in HDFS and pulls that into the Distributed Cache in an attempt to be more secure than storing it in the Configuration.

Parameters:: job - the Hadoop job instance to be configured; principal - a valid Accumulo user name (user must have Table.CREATE permission if setCreateTables(Job, boolean) is set to true); tokenFile - the path to the token file
Throws:: AccumuloSecurityException
Since:: 1.6.0

isConnectorInfoSet

protected static Boolean isConnectorInfoSet(org.apache.hadoop.mapreduce.JobContext context)

Determines if the connector has been configured.

Parameters:: context - the Hadoop context for the configured job
Returns:: true if the connector has been configured, false otherwise
Since:: 1.5.0
See Also:: setConnectorInfo(Job, String, AuthenticationToken)

getPrincipal

protected static String getPrincipal(org.apache.hadoop.mapreduce.JobContext context)

Gets the user name from the configuration.

Parameters:: context - the Hadoop context for the configured job
Returns:: the user name
Since:: 1.5.0
See Also:: setConnectorInfo(Job, String, AuthenticationToken)

getTokenClass

@Deprecated
protected static String getTokenClass(org.apache.hadoop.mapreduce.JobContext context)

Deprecated. since 1.6.0; Use getAuthenticationToken(JobContext) instead.

Gets the serialized token class from either the configuration or the token file.

Since:: 1.5.0

getToken

@Deprecated
protected static byte[] getToken(org.apache.hadoop.mapreduce.JobContext context)

Deprecated. since 1.6.0; Use getAuthenticationToken(JobContext) instead.

Gets the serialized token from either the configuration or the token file.

Since:: 1.5.0

getAuthenticationToken

protected static AuthenticationToken getAuthenticationToken(org.apache.hadoop.mapreduce.JobContext context)

Gets the authentication token from either the specified token file or directly from the configuration, whichever was used when the job was configured.

Parameters:: context - the Hadoop context for the configured job
Returns:: the principal's authentication token
Since:: 1.6.0
See Also:: setConnectorInfo(Job, String, AuthenticationToken), setConnectorInfo(Job, String, String)

setZooKeeperInstance

@Deprecated
public static void setZooKeeperInstance(org.apache.hadoop.mapreduce.Job job,
                                        String instanceName,
                                        String zooKeepers)

Deprecated. since 1.6.0; Use setZooKeeperInstance(Job, ClientConfiguration) instead.

Configures a ZooKeeperInstance for this job.

Parameters:: job - the Hadoop job instance to be configured; instanceName - the Accumulo instance name; zooKeepers - a comma-separated list of zookeeper servers
Since:: 1.5.0

setZooKeeperInstance

public static void setZooKeeperInstance(org.apache.hadoop.mapreduce.Job job,
                                        ClientConfiguration clientConfig)

Configures a ZooKeeperInstance for this job.

Parameters:: job - the Hadoop job instance to be configured; clientConfig - client configuration for specifying connection timeouts, SSL connection options, etc.
Since:: 1.6.0

setMockInstance

public static void setMockInstance(org.apache.hadoop.mapreduce.Job job,
                                   String instanceName)

Configures a MockInstance for this job.

Parameters:: job - the Hadoop job instance to be configured; instanceName - the Accumulo instance name
Since:: 1.5.0

getInstance

protected static Instance getInstance(org.apache.hadoop.mapreduce.JobContext context)

Initializes an Accumulo Instance based on the configuration.

Parameters:: context - the Hadoop context for the configured job
Returns:: an Accumulo instance
Since:: 1.5.0
See Also:: setZooKeeperInstance(Job, ClientConfiguration), setMockInstance(Job, String)

setLogLevel

public static void setLogLevel(org.apache.hadoop.mapreduce.Job job,
                               org.apache.log4j.Level level)

Sets the log level for this job.

Parameters:: job - the Hadoop job instance to be configured; level - the logging level
Since:: 1.5.0

getLogLevel

protected static org.apache.log4j.Level getLogLevel(org.apache.hadoop.mapreduce.JobContext context)

Gets the log level from this configuration.

Parameters:: context - the Hadoop context for the configured job
Returns:: the log level
Since:: 1.5.0
See Also:: setLogLevel(Job, Level)

setDefaultTableName

public static void setDefaultTableName(org.apache.hadoop.mapreduce.Job job,
                                       String tableName)

Sets the default table name to use if one emits a null in place of a table name for a given mutation. Table names may contain only alphanumeric characters and underscores.

Parameters:: job - the Hadoop job instance to be configured; tableName - the table to use when the table name is null in the write call
Since:: 1.5.0
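
The fallback and naming rule described for setDefaultTableName can be sketched in plain Java. This is an illustration under stated assumptions, not Accumulo's implementation: the class, the method names, the exception types, and the exact pattern are hypothetical, and the real validation is performed by Accumulo itself.

```java
import java.util.regex.Pattern;

// Hypothetical helper; it only mirrors the rule stated in the documentation.
final class TableNameSketch {

    // Assumed pattern: one or more ASCII letters, digits, or underscores.
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9_]+");

    static boolean isValidTableName(String name) {
        return name != null && VALID.matcher(name).matches();
    }

    // Picks the emitted table name, or the default when the emitted key is null.
    static String resolveTable(String emitted, String defaultTable) {
        String chosen = (emitted == null) ? defaultTable : emitted;
        if (chosen == null) {
            throw new IllegalStateException("no table name emitted and no default table configured");
        }
        if (!isValidTableName(chosen)) {
            throw new IllegalArgumentException("invalid table name: " + chosen);
        }
        return chosen;
    }

    public static void main(String[] args) {
        System.out.println(resolveTable(null, "events"));    // falls back to the default
        System.out.println(resolveTable("audit", "events")); // uses the emitted name
    }
}
```

In a real job the key passed to the record writer is a Text object rather than a String; the sketch uses String only to stay free of Hadoop dependencies.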

getDefaultTableName

protected static String getDefaultTableName(org.apache.hadoop.mapreduce.JobContext context)

Gets the default table name from the configuration.

Parameters:: context - the Hadoop context for the configured job
Returns:: the default table name
Since:: 1.5.0
See Also:: setDefaultTableName(Job, String)

setBatchWriterOptions

public static void setBatchWriterOptions(org.apache.hadoop.mapreduce.Job job,
                                         BatchWriterConfig bwConfig)

Sets the configuration for the job's BatchWriter instances. If not set, a new BatchWriterConfig with sensible built-in defaults is used. Setting the configuration multiple times overwrites any previous configuration.

Parameters:: job - the Hadoop job instance to be configured; bwConfig - the configuration for the BatchWriter
Since:: 1.5.0

getBatchWriterOptions

protected static BatchWriterConfig getBatchWriterOptions(org.apache.hadoop.mapreduce.JobContext context)

Gets the BatchWriterConfig settings.

Parameters:: context - the Hadoop context for the configured job
Returns:: the configuration object
Since:: 1.5.0
See Also:: setBatchWriterOptions(Job, BatchWriterConfig)

setCreateTables

public static void setCreateTables(org.apache.hadoop.mapreduce.Job job,
                                   boolean enableFeature)

Sets the directive to create new tables, as necessary. Table names may contain only alphanumeric characters and underscores.

By default, this feature is disabled.

Parameters:: job - the Hadoop job instance to be configured; enableFeature - the feature is enabled if true, disabled otherwise
Since:: 1.5.0

canCreateTables

protected static Boolean canCreateTables(org.apache.hadoop.mapreduce.JobContext context)

Determines whether tables are permitted to be created as needed.

Parameters:: context - the Hadoop context for the configured job
Returns:: true if the feature is enabled, false otherwise
Since:: 1.5.0
See Also:: setCreateTables(Job, boolean)

setSimulationMode

public static void setSimulationMode(org.apache.hadoop.mapreduce.Job job,
                                     boolean enableFeature)

Sets the directive to use simulation mode for this job. In simulation mode, no output is produced. This is useful for testing.

By default, this feature is disabled.

Parameters:: job - the Hadoop job instance to be configured; enableFeature - the feature is enabled if true, disabled otherwise
Since:: 1.5.0
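
The effect of simulation mode can be pictured with a small stand-in writer. This is a sketch under assumptions: SketchWriter is hypothetical and is not Accumulo's AccumuloRecordWriter, and mutations are modelled as plain strings. It shows only the behavior stated above: records are processed, but no output is produced.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a writer with a simulation switch.
final class SimulationModeSketch {

    static final class SketchWriter {
        private final boolean simulate;
        private final List<String> written = new ArrayList<>();
        private long seen = 0;

        SketchWriter(boolean simulate) {
            this.simulate = simulate;
        }

        // Counts every record; emits it only when simulation is off.
        void write(String table, String mutation) {
            seen++;
            if (simulate) {
                return; // simulation mode: nothing leaves the writer
            }
            written.add(table + ":" + mutation);
        }

        long seen() {
            return seen;
        }

        List<String> written() {
            return written;
        }
    }

    public static void main(String[] args) {
        SketchWriter dryRun = new SketchWriter(true);
        dryRun.write("events", "row1");
        System.out.println(dryRun.seen() + " seen, " + dryRun.written().size() + " written");
    }
}
```

A dry run of this kind lets the map and reduce logic be exercised end to end without touching any table.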

getSimulationMode

protected static Boolean getSimulationMode(org.apache.hadoop.mapreduce.JobContext context)

Determines whether simulation mode is enabled.

Parameters:: context - the Hadoop context for the configured job
Returns:: true if the feature is enabled, false otherwise
Since:: 1.5.0
See Also:: setSimulationMode(Job, boolean)

checkOutputSpecs

public void checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext job)
                      throws IOException

Specified by:: checkOutputSpecs in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.Text,Mutation>

Throws:: IOException

getOutputCommitter

public org.apache.hadoop.mapreduce.OutputCommitter getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext context)

Specified by:: getOutputCommitter in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.Text,Mutation>

getRecordWriter

public org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.Text,Mutation> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext attempt)
        throws IOException

Specified by:: getRecordWriter in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.Text,Mutation>

Throws:: IOException
