java.lang.Object
  org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase
    org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator
public class InputConfigurator extends ConfiguratorBase
Nested Class Summary

static class InputConfigurator.Features
    Configuration keys for various features.

static class InputConfigurator.ScanOpts
    Configuration keys for Scanner.
Nested classes/interfaces inherited from class org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase
    ConfiguratorBase.ConnectorInfo, ConfiguratorBase.GeneralOpts, ConfiguratorBase.InstanceOpts, ConfiguratorBase.TokenSource
Constructor Summary

InputConfigurator()
Method Summary

static void addIterator(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, IteratorSetting cfg)
    Encode an iterator on the input for the single input table associated with this job.

static Map<String,Map<KeyExtent,List<Range>>> binOffline(String tableId, List<Range> ranges, Instance instance, Connector conn)

static Set<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> deserializeFetchedColumns(Collection<String> serialized)

static void fetchColumns(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Collection<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> columnFamilyColumnQualifierPairs)
    Restricts the columns that will be mapped over for the single input table on this job.

static Boolean getAutoAdjustRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
    Determines whether a configuration has auto-adjust ranges enabled.

protected static Map.Entry<String,InputTableConfig> getDefaultInputTableConfig(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
    Returns the InputTableConfig for the configuration based on the properties set using the single-table input methods.

static Set<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> getFetchedColumns(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
    Gets the columns to be mapped over from this job.

static InputTableConfig getInputTableConfig(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableName)
    Returns the InputTableConfig for the given table.

static Map<String,InputTableConfig> getInputTableConfigs(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
    Returns all InputTableConfig objects associated with this job.

static String getInputTableName(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
    Gets the name of the input table over which this job will scan.

static List<IteratorSetting> getIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
    Gets a list of the iterator settings (for iterators to apply to a scanner) from this configuration.

static List<Range> getRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
    Gets the ranges to scan over from a job.

static Authorizations getScanAuthorizations(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
    Gets the authorizations to set for the scans from the configuration.

static TabletLocator getTabletLocator(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableId)
    Initializes an Accumulo TabletLocator based on the configuration.

static Boolean isIsolated(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
    Determines whether a configuration has isolation enabled.

static Boolean isOfflineScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
    Determines whether a configuration has the offline table scan feature enabled.

static String[] serializeColumns(Collection<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> columnFamilyColumnQualifierPairs)

static void setAutoAdjustRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
    Controls the automatic adjustment of ranges for this job.

static void setInputTableConfigs(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Map<String,InputTableConfig> configs)
    Sets configurations for multiple tables at a time.

static void setInputTableName(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableName)
    Sets the name of the input table over which this job will scan.

static void setLocalIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
    Controls the use of the ClientSideIteratorScanner in this job.

static void setOfflineTableScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
    Enable reading offline tables.

static void setRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Collection<Range> ranges)
    Sets the input ranges to scan on all input tables for this job.

static void setScanAuthorizations(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Authorizations auths)
    Sets the Authorizations used to scan.

static void setScanIsolation(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
    Controls the use of the IsolatedScanner in this job.

static Boolean usesLocalIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
    Determines whether a configuration uses local iterators.

static void validateOptions(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
    Check whether a configuration is fully configured to be used with an Accumulo InputFormat.
Methods inherited from class org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase
    enumToConfKey, enumToConfKey, getAuthenticationToken, getInstance, getLogLevel, getPrincipal, getTokenFromFile, getVisibilityCacheSize, isConnectorInfoSet, setConnectorInfo, setConnectorInfo, setLogLevel, setMockInstance, setVisibilityCacheSize, setZooKeeperInstance
Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

public InputConfigurator()

Method Detail
public static void setInputTableName(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableName)

Sets the name of the input table over which this job will scan.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
tableName - the table to use when the tablename is null in the write call

public static String getInputTableName(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)

Gets the name of the input table over which this job will scan.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
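As a rough illustration of how these static helpers are used (not part of the original Javadoc; the table name "mytable" and the choice of AccumuloInputFormat as the implementing class are assumptions):

```java
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator;
import org.apache.hadoop.conf.Configuration;

public class TableNameExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Property keys are prefixed with the implementing class name, so the same
    // class must be passed to the matching getter.
    InputConfigurator.setInputTableName(AccumuloInputFormat.class, conf, "mytable");
    String table = InputConfigurator.getInputTableName(AccumuloInputFormat.class, conf);
    System.out.println(table); // prints "mytable"
  }
}
```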
public static void setScanAuthorizations(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Authorizations auths)

Sets the Authorizations used to scan. Must be a subset of the user's authorizations. Defaults to the empty set.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
auths - the user's authorizations

public static Authorizations getScanAuthorizations(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)

Gets the authorizations to set for the scans from the configuration.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure

See Also:
setScanAuthorizations(Class, Configuration, Authorizations)
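A minimal sketch of restricting scan authorizations, assuming hypothetical authorization strings and AccumuloInputFormat as the implementing class:

```java
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.conf.Configuration;

public class ScanAuthorizationsExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The authorizations must be a subset of those granted to the principal
    // configured via ConfiguratorBase.setConnectorInfo.
    Authorizations auths = new Authorizations("public", "internal");
    InputConfigurator.setScanAuthorizations(AccumuloInputFormat.class, conf, auths);

    Authorizations stored = InputConfigurator.getScanAuthorizations(AccumuloInputFormat.class, conf);
    System.out.println(stored); // e.g. internal,public
  }
}
```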
public static void setRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Collection<Range> ranges)

Sets the input ranges to scan on all input tables for this job.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
ranges - the ranges that will be mapped over

Throws:
IllegalArgumentException - if the ranges cannot be encoded into base 64

public static List<Range> getRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf) throws IOException

Gets the ranges to scan over from a job.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure

Throws:
IOException - if the ranges have been encoded improperly

See Also:
setRanges(Class, Configuration, Collection)
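A hedged sketch of configuring ranges (the row boundaries are invented for illustration); the related setAutoAdjustRanges toggle, documented further below, is shown alongside:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator;
import org.apache.accumulo.core.data.Range;
import org.apache.hadoop.conf.Configuration;

public class RangesExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Two illustrative ranges: one bounded row range and one single-row range.
    List<Range> ranges = Arrays.asList(new Range("row_a", "row_m"), Range.exact("row_z"));
    InputConfigurator.setRanges(AccumuloInputFormat.class, conf, ranges);

    // Leave automatic merging/splitting of ranges enabled (the default).
    InputConfigurator.setAutoAdjustRanges(AccumuloInputFormat.class, conf, true);

    // getRanges decodes the base-64 encoded ranges back out of the configuration.
    List<Range> stored = InputConfigurator.getRanges(AccumuloInputFormat.class, conf);
    System.out.println(stored.size()); // 2
  }
}
```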
public static List<IteratorSetting> getIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)

Gets a list of the iterator settings (for iterators to apply to a scanner) from this configuration.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure

See Also:
addIterator(Class, Configuration, IteratorSetting)
public static void fetchColumns(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Collection<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> columnFamilyColumnQualifierPairs)

Restricts the columns that will be mapped over for the single input table on this job.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
columnFamilyColumnQualifierPairs - a pair of Text objects corresponding to column family and column qualifier. If the column qualifier is null, the entire column family is selected. An empty set is the default and is equivalent to scanning all columns.

Throws:
IllegalArgumentException - if the column family is null
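An illustrative sketch of column restriction, assuming made-up family and qualifier names:

```java
import java.util.Arrays;
import java.util.Collection;

import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator;
import org.apache.accumulo.core.util.Pair;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;

public class FetchColumnsExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    Collection<Pair<Text,Text>> columns = Arrays.asList(
        // Whole column family "attributes" (a null qualifier selects the entire family).
        new Pair<Text,Text>(new Text("attributes"), null),
        // Only the "name" qualifier within the "metadata" family.
        new Pair<Text,Text>(new Text("metadata"), new Text("name")));

    InputConfigurator.fetchColumns(AccumuloInputFormat.class, conf, columns);
    System.out.println(InputConfigurator.getFetchedColumns(AccumuloInputFormat.class, conf));
  }
}
```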
public static String[] serializeColumns(Collection<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> columnFamilyColumnQualifierPairs)

public static Set<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> getFetchedColumns(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)

Gets the columns to be mapped over from this job.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure

See Also:
fetchColumns(Class, Configuration, Collection)

public static Set<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> deserializeFetchedColumns(Collection<String> serialized)
public static void addIterator(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, IteratorSetting cfg)

Encode an iterator on the input for the single input table associated with this job.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
cfg - the configuration of the iterator

Throws:
IllegalArgumentException - if the iterator can't be serialized into the configuration
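A sketch of attaching a server-side iterator, assuming the built-in RegExFilter; the priority, iterator name, and row pattern are arbitrary illustrative choices:

```java
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator;
import org.apache.accumulo.core.iterators.user.RegExFilter;
import org.apache.hadoop.conf.Configuration;

public class AddIteratorExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Configure a RegExFilter at priority 50 that keeps rows matching "row_.*".
    IteratorSetting cfg = new IteratorSetting(50, "rowFilter", RegExFilter.class);
    RegExFilter.setRegexs(cfg, "row_.*", null, null, null, false);

    // Serialize the iterator setting into the job configuration.
    InputConfigurator.addIterator(AccumuloInputFormat.class, conf, cfg);
    System.out.println(InputConfigurator.getIterators(AccumuloInputFormat.class, conf));
  }
}
```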
public static void setAutoAdjustRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)

Controls the automatic adjustment of ranges for this job. By default, this feature is enabled.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
enableFeature - the feature is enabled if true, disabled otherwise

See Also:
setRanges(Class, Configuration, Collection)

public static Boolean getAutoAdjustRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)

Determines whether a configuration has auto-adjust ranges enabled.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure

See Also:
setAutoAdjustRanges(Class, Configuration, boolean)
public static void setScanIsolation(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)

Controls the use of the IsolatedScanner in this job. By default, this feature is disabled.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
enableFeature - the feature is enabled if true, disabled otherwise

public static Boolean isIsolated(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)

Determines whether a configuration has isolation enabled.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure

See Also:
setScanIsolation(Class, Configuration, boolean)
public static void setLocalIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)

Controls the use of the ClientSideIteratorScanner in this job. Enabling this feature will cause the iterator stack to be constructed within the Map task, rather than within the Accumulo TServer. To use this feature, all classes needed for those iterators must be available on the classpath for the task. By default, this feature is disabled.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
enableFeature - the feature is enabled if true, disabled otherwise
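A minimal sketch combining the scanner-related toggles above; enabling both features here is purely illustrative:

```java
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator;
import org.apache.hadoop.conf.Configuration;

public class ScannerFeaturesExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Run the iterator stack in the map task instead of the tablet server;
    // any iterator classes must then be on the task's classpath.
    InputConfigurator.setLocalIterators(AccumuloInputFormat.class, conf, true);

    // Use an IsolatedScanner so each row is read in isolation.
    InputConfigurator.setScanIsolation(AccumuloInputFormat.class, conf, true);

    System.out.println(InputConfigurator.usesLocalIterators(AccumuloInputFormat.class, conf)); // true
    System.out.println(InputConfigurator.isIsolated(AccumuloInputFormat.class, conf));         // true
  }
}
```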
public static Boolean usesLocalIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)

Determines whether a configuration uses local iterators.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure

See Also:
setLocalIterators(Class, Configuration, boolean)
public static void setOfflineTableScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)

Enable reading offline tables. By default, this feature is disabled and only online tables are scanned. This will make the map reduce job directly read the table's files. If the table is not offline, then the job will fail. If the table comes online during the map reduce job, it is likely that the job will fail.

To use this option, the map reduce user will need access to read the Accumulo directory in HDFS.

Reading the offline table will create the scan time iterator stack in the map process, so any iterators that are configured for the table will need to be on the mapper's classpath.

One way to use this feature is to clone a table, take the clone offline, and use the clone as the input table for a map reduce job. If you plan to map reduce over the data many times, it may be better to compact the table, clone it, take it offline, and use the clone for all map reduce jobs. The reason to do this is that compaction will reduce each tablet in the table to one file, and it is faster to read from one file. A sketch of this clone-and-offline workflow follows the parameter list below.

There are two possible advantages to reading a table's files directly out of HDFS. First, you may see better read performance. Second, it will support speculative execution better. When reading an online table, speculative execution can put more load on an already slow tablet server.

By default, this feature is disabled.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
enableFeature - the feature is enabled if true, disabled otherwise
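A hedged sketch of the clone-and-offline workflow described above; the instance, credentials, and table names are placeholders, and the TableOperations calls reflect the 1.6-era client API:

```java
import java.util.Collections;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.hadoop.conf.Configuration;

public class OfflineScanExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical connection details; substitute real values.
    Connector conn = new ZooKeeperInstance("myInstance", "zoo1:2181")
        .getConnector("mapreduceUser", new PasswordToken("secret"));

    // Clone the source table and take the clone offline so its files can be read directly.
    conn.tableOperations().clone("source_table", "source_table_clone", true,
        Collections.<String,String>emptyMap(), Collections.<String>emptySet());
    conn.tableOperations().offline("source_table_clone");

    // Point the job at the offline clone and enable the offline scan feature.
    Configuration conf = new Configuration();
    InputConfigurator.setInputTableName(AccumuloInputFormat.class, conf, "source_table_clone");
    InputConfigurator.setOfflineTableScan(AccumuloInputFormat.class, conf, true);
  }
}
```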
public static Boolean isOfflineScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)

Determines whether a configuration has the offline table scan feature enabled.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure

See Also:
setOfflineTableScan(Class, Configuration, boolean)
public static void setInputTableConfigs(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Map<String,InputTableConfig> configs)

Sets configurations for multiple tables at a time.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
configs - a map of table names to InputTableConfig objects to associate with the job
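A rough sketch of multi-table configuration, assuming the InputTableConfig setters from org.apache.accumulo.core.client.mapreduce (table names and ranges are invented):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.mapreduce.InputTableConfig;
import org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator;
import org.apache.accumulo.core.data.Range;
import org.apache.hadoop.conf.Configuration;

public class MultiTableExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Per-table settings; the table names and ranges are invented for the example.
    InputTableConfig eventsConfig = new InputTableConfig();
    eventsConfig.setRanges(Arrays.asList(new Range("2014-01", "2014-12")));

    InputTableConfig usersConfig = new InputTableConfig(); // scan the whole table

    Map<String,InputTableConfig> configs = new HashMap<String,InputTableConfig>();
    configs.put("events", eventsConfig);
    configs.put("users", usersConfig);

    InputConfigurator.setInputTableConfigs(AccumuloInputFormat.class, conf, configs);
    System.out.println(InputConfigurator.getInputTableConfigs(AccumuloInputFormat.class, conf).keySet());
  }
}
```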
public static Map<String,InputTableConfig> getInputTableConfigs(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)

Returns all InputTableConfig objects associated with this job.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure

public static InputTableConfig getInputTableConfig(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableName)

Returns the InputTableConfig for the given table.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
tableName - the table name for which to fetch the table query config
public static TabletLocator getTabletLocator(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableId) throws TableNotFoundException

Initializes an Accumulo TabletLocator based on the configuration.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
tableId - the table id for which to initialize the TabletLocator

Throws:
TableNotFoundException - if the table name set on the configuration doesn't exist

public static void validateOptions(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf) throws IOException

Check whether a configuration is fully configured to be used with an Accumulo InputFormat.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure

Throws:
IOException - if the context is improperly configured
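A hedged sketch of validating a configuration; the connection settings use the inherited ConfiguratorBase helpers (setZooKeeperInstance with a ClientConfiguration and setConnectorInfo), and all names and credentials are placeholders:

```java
import java.io.IOException;

import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.hadoop.conf.Configuration;

public class ValidateOptionsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Connection settings come from the inherited ConfiguratorBase helpers;
    // instance name, ZooKeepers, principal, and password are placeholders.
    InputConfigurator.setZooKeeperInstance(AccumuloInputFormat.class, conf,
        ClientConfiguration.loadDefault().withInstance("myInstance").withZkHosts("zoo1:2181"));
    InputConfigurator.setConnectorInfo(AccumuloInputFormat.class, conf,
        "mapreduceUser", new PasswordToken("secret"));
    InputConfigurator.setInputTableName(AccumuloInputFormat.class, conf, "mytable");

    try {
      InputConfigurator.validateOptions(AccumuloInputFormat.class, conf);
      System.out.println("Configuration looks complete.");
    } catch (IOException e) {
      System.err.println("Configuration is incomplete: " + e.getMessage());
    }
  }
}
```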
protected static Map.Entry<String,InputTableConfig> getDefaultInputTableConfig(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)

Returns the InputTableConfig for the configuration based on the properties set using the single-table input methods.

Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop instance for which to retrieve the configuration

public static Map<String,Map<KeyExtent,List<Range>>> binOffline(String tableId, List<Range> ranges, Instance instance, Connector conn) throws AccumuloException, TableNotFoundException

Throws:
AccumuloException
TableNotFoundException