Class InputConfigurator

java.lang.Object
org.apache.accumulo.hadoopImpl.mapreduce.lib.ConfiguratorBase
org.apache.accumulo.hadoopImpl.mapreduce.lib.InputConfigurator

public class InputConfigurator extends ConfiguratorBase
Since:
1.6.0
  • Constructor Details

    • InputConfigurator

      public InputConfigurator()
  • Method Details

    • setClassLoaderContext

      public static void setClassLoaderContext(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String context)
      Sets the name of the context classloader to use for scans
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      context - the name of the context classloader
      Since:
      1.8.0
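      Every method in this class takes an implementingClass whose name prefixes the property key, so several input formats can presumably share one Hadoop configuration without overwriting each other's settings. The sketch below illustrates that idea only. It uses a plain Map as a stand-in for org.apache.hadoop.conf.Configuration, and the key layout ("<SimpleClassName>.<feature>"), the class PrefixedKeySketch, and the nested FormatA and FormatB classes are all hypothetical, not the keys or types Accumulo actually uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a Map stands in for a Hadoop Configuration.
class PrefixedKeySketch {

  // Assumed key layout: "<SimpleClassName>.<feature>". The real layout may differ.
  static String key(Class<?> implementingClass, String feature) {
    return implementingClass.getSimpleName() + "." + feature;
  }

  // Stores the context name under a key prefixed by the implementing class.
  static void setContext(Class<?> implementingClass, Map<String, String> conf, String context) {
    conf.put(key(implementingClass, "classLoaderContext"), context);
  }

  // Returns null when nothing was stored for this implementing class.
  static String getContext(Class<?> implementingClass, Map<String, String> conf) {
    return conf.get(key(implementingClass, "classLoaderContext"));
  }

  // Two hypothetical input formats that share one configuration.
  static class FormatA {}

  static class FormatB {}

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    setContext(FormatA.class, conf, "ctxA");
    setContext(FormatB.class, conf, "ctxB");
    // Each class reads back only its own value.
    System.out.println(getContext(FormatA.class, conf));
    System.out.println(getContext(FormatB.class, conf));
  }
}
```

      Because the class name is part of the key, the two settings coexist in the same map and neither call overwrites the other.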
    • getClassLoaderContext

      public static String getClassLoaderContext(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
      Gets the name of the context classloader to use for scans
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      Returns:
      the classloader context name
      Since:
      1.8.0
    • setInputTableName

      public static void setInputTableName(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableName)
      Sets the name of the input table over which this job will scan.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      tableName - the name of the input table to scan
      Since:
      1.6.0
    • getInputTableName

      public static String getInputTableName(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
      Gets the name of the input table over which this job will scan.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      Since:
      1.6.0
    • setScanAuthorizations

      public static void setScanAuthorizations(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Authorizations auths)
      Sets the Authorizations used to scan. Must be a subset of the user's authorizations. Defaults to the empty set.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      auths - the user's authorizations
      Since:
      1.6.0
    • getScanAuthorizations

      public static Authorizations getScanAuthorizations(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
      Gets the authorizations to set for the scans from the configuration.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      Returns:
      the Accumulo scan authorizations
      Since:
      1.6.0
    • setRanges

      public static void setRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Collection<Range> ranges)
      Sets the input ranges to scan on all input tables for this job. If not set, the entire table will be scanned.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      ranges - the ranges that will be mapped over
      Throws:
      IllegalArgumentException - if the ranges cannot be encoded into base 64
      Since:
      1.6.0
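      The exceptions documented for setRanges and getRanges indicate that ranges are stored in the configuration as Base64 text. The sketch below shows one way such a round trip could work using only the JDK. RowRange is a hypothetical simplified range (two row strings), and the NUL-separated byte layout and comma-joined value are assumptions made for illustration; the real implementation serializes Accumulo Range objects and may lay them out differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

class RangeEncodingSketch {

  // Hypothetical simplified range: just a start row and an end row.
  static final class RowRange {
    final String start;
    final String end;

    RowRange(String start, String end) {
      this.start = start;
      this.end = end;
    }
  }

  // Encodes each range as Base64 and joins the pieces with commas.
  // The Base64 alphabet contains no comma, so the separator is unambiguous.
  static String encode(List<RowRange> ranges) {
    List<String> parts = new ArrayList<>();
    for (RowRange r : ranges) {
      byte[] raw = (r.start + "\0" + r.end).getBytes(StandardCharsets.UTF_8);
      parts.add(Base64.getEncoder().encodeToString(raw));
    }
    return String.join(",", parts);
  }

  // Decodes the value written by encode(); improperly encoded input
  // surfaces as an IOException, mirroring the documented getRanges contract.
  static List<RowRange> decode(String encoded) throws IOException {
    List<RowRange> ranges = new ArrayList<>();
    if (encoded.isEmpty()) {
      return ranges;
    }
    for (String part : encoded.split(",")) {
      try {
        byte[] raw = Base64.getDecoder().decode(part);
        String[] rows = new String(raw, StandardCharsets.UTF_8).split("\0", -1);
        if (rows.length != 2) {
          throw new IOException("malformed range: " + part);
        }
        ranges.add(new RowRange(rows[0], rows[1]));
      } catch (IllegalArgumentException e) {
        throw new IOException("range is not valid Base64: " + part, e);
      }
    }
    return ranges;
  }

  public static void main(String[] args) throws IOException {
    String stored = encode(Arrays.asList(new RowRange("a", "c"), new RowRange("m", "z")));
    for (RowRange r : decode(stored)) {
      System.out.println(r.start + " .. " + r.end);
    }
  }
}
```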
    • getRanges

      public static List<Range> getRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf) throws IOException
      Gets the ranges to scan over from a job.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      Returns:
      the ranges
      Throws:
      IOException - if the ranges have been encoded improperly
      Since:
      1.6.0
    • getIterators

      public static List<IteratorSetting> getIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
      Gets a list of the iterator settings (for iterators to apply to a scanner) from this configuration.
    • fetchColumns

      public static void fetchColumns(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Collection<IteratorSetting.Column> columnFamilyColumnQualifierPairs)
      Restricts the columns that will be mapped over for the single input table on this job.
    • serializeColumns

      public static String[] serializeColumns(Collection<IteratorSetting.Column> columnFamilyColumnQualifierPairs)
    • getFetchedColumns

      public static Set<IteratorSetting.Column> getFetchedColumns(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
      Gets the columns to be mapped over from this job.
    • deserializeFetchedColumns

      public static Set<IteratorSetting.Column> deserializeFetchedColumns(Collection<String> serialized)
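      serializeColumns and deserializeFetchedColumns convert between column family/qualifier pairs and strings, but this page does not describe the string format. The sketch below shows one plausible scheme, assuming each part is Base64-encoded and the qualifier, when present, follows a colon. Map.Entry stands in for IteratorSetting.Column (the value is the optional qualifier), and ColumnSerializationSketch and its helpers are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ColumnSerializationSketch {

  static String b64(String s) {
    return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
  }

  static String unb64(String s) {
    return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
  }

  // Assumed format: base64(family), optionally followed by ":" + base64(qualifier).
  // A null entry value means "fetch the whole column family".
  static String[] serialize(Collection<Map.Entry<String, String>> pairs) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, String> p : pairs) {
      String s = b64(p.getKey());
      if (p.getValue() != null) {
        s += ":" + b64(p.getValue());
      }
      out.add(s);
    }
    return out.toArray(new String[0]);
  }

  // Reverses serialize(); Base64 never emits ':', so the first colon is the separator.
  static Set<Map.Entry<String, String>> deserialize(Collection<String> serialized) {
    Set<Map.Entry<String, String>> pairs = new LinkedHashSet<>();
    for (String s : serialized) {
      int idx = s.indexOf(':');
      String family = unb64(idx < 0 ? s : s.substring(0, idx));
      String qualifier = idx < 0 ? null : unb64(s.substring(idx + 1));
      pairs.add(new AbstractMap.SimpleImmutableEntry<>(family, qualifier));
    }
    return pairs;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, String>> pairs = new ArrayList<>();
    pairs.add(new AbstractMap.SimpleImmutableEntry<String, String>("meta", "size"));
    pairs.add(new AbstractMap.SimpleImmutableEntry<String, String>("data", null));
    String[] stored = serialize(pairs);
    System.out.println(deserialize(Arrays.asList(stored)));
  }
}
```

      Encoding each part separately keeps families or qualifiers that themselves contain a colon from being split in the wrong place.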
    • writeIteratorsToConf

      public static void writeIteratorsToConf(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Collection<IteratorSetting> iterators)
      Serializes the iterators to the Hadoop configuration under one key.
    • setAutoAdjustRanges

      public static void setAutoAdjustRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
      Controls the automatic adjustment of ranges for this job. This feature merges overlapping ranges, then splits them to align with tablet boundaries. Disabling this feature will cause exactly one Map task to be created for each specified range.

      By default, this feature is enabled.

      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      enableFeature - the feature is enabled if true, disabled otherwise
      Since:
      1.6.0
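      The merge-then-split behavior of setAutoAdjustRanges can be pictured with integers standing in for row keys. This is a minimal sketch, assuming half-open integer intervals [start, end) and a sorted array of hypothetical tablet split points; AutoAdjustSketch and its methods are illustrative names, not Accumulo API, and the real feature works on Range objects and tablet extents.

```java
import java.util.ArrayList;
import java.util.List;

class AutoAdjustSketch {

  // Step 1: merge overlapping (or touching) half-open intervals [start, end).
  // The input list and its arrays are left unmodified.
  static List<int[]> merge(List<int[]> ranges) {
    List<int[]> sorted = new ArrayList<>(ranges);
    sorted.sort((a, b) -> Integer.compare(a[0], b[0]));
    List<int[]> merged = new ArrayList<>();
    for (int[] r : sorted) {
      int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      if (last != null && r[0] <= last[1]) {
        last[1] = Math.max(last[1], r[1]); // extend the previous interval
      } else {
        merged.add(new int[] {r[0], r[1]}); // start a new interval (copy)
      }
    }
    return merged;
  }

  // Step 2: cut each merged interval at every split point that falls strictly inside it.
  // sortedSplits must be in ascending order.
  static List<int[]> splitAt(List<int[]> merged, int[] sortedSplits) {
    List<int[]> out = new ArrayList<>();
    for (int[] r : merged) {
      int start = r[0];
      for (int s : sortedSplits) {
        if (s > start && s < r[1]) {
          out.add(new int[] {start, s});
          start = s;
        }
      }
      out.add(new int[] {start, r[1]});
    }
    return out;
  }

  public static void main(String[] args) {
    List<int[]> requested = new ArrayList<>();
    requested.add(new int[] {1, 5});
    requested.add(new int[] {3, 8});
    requested.add(new int[] {20, 25});
    int[] tabletSplits = {4, 10, 22};
    for (int[] piece : splitAt(merge(requested), tabletSplits)) {
      System.out.println("[" + piece[0] + ", " + piece[1] + ")");
    }
  }
}
```

      In this example three requested ranges become four tablet-aligned pieces, whereas with the feature disabled the job would create exactly one Map task per requested range, three in total.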
    • getAutoAdjustRanges

      public static Boolean getAutoAdjustRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
      Determines whether a configuration has auto-adjust ranges enabled.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      Returns:
      false if the feature is disabled, true otherwise
      Since:
      1.6.0
    • setScanIsolation

      public static void setScanIsolation(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
      Controls the use of the IsolatedScanner in this job.

      By default, this feature is disabled.

      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      enableFeature - the feature is enabled if true, disabled otherwise
      Since:
      1.6.0
    • isIsolated

      public static Boolean isIsolated(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
      Determines whether a configuration has isolation enabled.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      Returns:
      true if the feature is enabled, false otherwise
      Since:
      1.6.0
    • setLocalIterators

      public static void setLocalIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
      Controls the use of the ClientSideIteratorScanner in this job. Enabling this feature will cause the iterator stack to be constructed within the Map task, rather than within the Accumulo TServer. To use this feature, all classes needed for those iterators must be available on the classpath for the task.

      By default, this feature is disabled.

      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      enableFeature - the feature is enabled if true, disabled otherwise
      Since:
      1.6.0
    • usesLocalIterators

      public static Boolean usesLocalIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
      Determines whether a configuration uses local iterators.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      Returns:
      true if the feature is enabled, false otherwise
      Since:
      1.6.0
    • setOfflineTableScan

      public static void setOfflineTableScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
      Enables reading offline tables. By default, this feature is disabled and only online tables are scanned. Enabling it makes the MapReduce job read the table's files directly. If the table is not offline, the job will fail. If the table comes online during the MapReduce job, the job will likely fail.

      To use this option, the map reduce user will need access to read the Accumulo directory in HDFS.

      Reading the offline table will create the scan time iterator stack in the map process. So any iterators that are configured for the table will need to be on the mapper's classpath.

      One way to use this feature is to clone a table, take the clone offline, and use the clone as the input table for a MapReduce job. If you plan to run MapReduce over the data many times, it may be better to compact the table, clone it, take the clone offline, and use the clone for all MapReduce jobs. Compaction reduces each tablet in the table to one file, and reading from one file is faster.

      There are two possible advantages to reading a table's files directly out of HDFS. First, you may see better read performance. Second, it supports speculative execution better: when reading an online table, speculative execution can put more load on an already slow tablet server.

      By default, this feature is disabled.

      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      enableFeature - the feature is enabled if true, disabled otherwise
      Since:
      1.6.0
    • isOfflineScan

      public static Boolean isOfflineScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
      Determines whether a configuration has the offline table scan feature enabled.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      Returns:
      true if the feature is enabled, false otherwise
      Since:
      1.6.0
    • setBatchScan

      public static void setBatchScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
      Controls the use of the BatchScanner in this job. Using this feature will group ranges by their source tablet per InputSplit and use BatchScanner to read them.

      By default, this feature is disabled.

      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      enableFeature - the feature is enabled if true, disabled otherwise
      Since:
      1.7.0
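      The grouping step described for setBatchScan, where ranges are collected by source tablet so that each group can be read together, might look like the sketch below. It assumes tablets are identified by their inclusive end rows, that each range already lies within a single tablet, and that rows compare as plain strings. BatchGroupingSketch, LAST_TABLET, and the String[] {startRow, endRow} range representation are hypothetical, and no BatchScanner is actually invoked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class BatchGroupingSketch {

  // Hypothetical label for the final tablet, which has no end row.
  static final String LAST_TABLET = "<last>";

  // Groups ranges by the tablet holding their start row.
  // A tablet with end row E is assumed to hold every row <= E that is
  // greater than the previous tablet's end row.
  static Map<String, List<String[]>> groupByTablet(
      List<String[]> ranges, TreeSet<String> tabletEndRows) {
    Map<String, List<String[]>> groups = new TreeMap<>();
    for (String[] range : ranges) {
      // Smallest end row that is >= the range's start row, or null past the last split.
      String endRow = tabletEndRows.ceiling(range[0]);
      String tablet = endRow == null ? LAST_TABLET : endRow;
      groups.computeIfAbsent(tablet, k -> new ArrayList<>()).add(range);
    }
    return groups;
  }

  public static void main(String[] args) {
    TreeSet<String> endRows = new TreeSet<>(Arrays.asList("g", "p"));
    List<String[]> ranges = new ArrayList<>();
    ranges.add(new String[] {"a", "c"});
    ranges.add(new String[] {"d", "f"});
    ranges.add(new String[] {"h", "k"});
    ranges.add(new String[] {"x", "z"});
    // Each map entry would correspond to one batch of ranges read together.
    groupByTablet(ranges, endRows)
        .forEach((tablet, rs) -> System.out.println(tablet + " -> " + rs.size() + " range(s)"));
  }
}
```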
    • isBatchScan

      public static Boolean isBatchScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
      Determines whether a configuration has the BatchScanner feature enabled.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      Returns:
      true if the feature is enabled, false otherwise
      Since:
      1.7.0
    • setConsistencyLevel

      public static void setConsistencyLevel(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, ScannerBase.ConsistencyLevel level)
      Set the ConsistencyLevel for the Accumulo scans that create the input data
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      level - the consistency level
      Since:
      2.1.0
    • getConsistencyLevel

      public static ScannerBase.ConsistencyLevel getConsistencyLevel(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
      Get the ConsistencyLevel for the Accumulo scans that create the input data
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      Returns:
      the consistency level
      Since:
      2.1.0
    • setInputTableConfigs

      public static void setInputTableConfigs(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Map<String,InputTableConfig> configs)
      Sets configurations for multiple tables at a time.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      configs - a map of table names to the InputTableConfig objects to associate with the job
      Since:
      1.6.0
    • getInputTableConfigs

      public static Map<String,InputTableConfig> getInputTableConfigs(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
      Returns all InputTableConfig objects associated with this job.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      Returns:
      all of the table query configs for the job
      Since:
      1.6.0
    • getInputTableConfig

      public static InputTableConfig getInputTableConfig(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableName)
      Returns the InputTableConfig for the given table
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      tableName - the table name for which to fetch the table query config
      Returns:
      the table query config for the given table name (if it exists) and null if it does not
      Since:
      1.6.0
    • getTabletLocator

      public static TabletLocator getTabletLocator(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, TableId tableId)
      Initializes an Accumulo TabletLocator based on the configuration.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      tableId - The table id for which to initialize the TabletLocator
      Returns:
      an Accumulo tablet locator
      Since:
      1.6.0
    • validatePermissions

      public static void validatePermissions(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, AccumuloClient client) throws IOException
      Validates that the user has permissions on the requested tables
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object to configure
      client - the Accumulo client
      Throws:
      IOException
      Since:
      1.7.0
    • getDefaultInputTableConfig

      protected static Map.Entry<String,InputTableConfig> getDefaultInputTableConfig(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableName)
      Returns the InputTableConfig for the configuration based on the properties set using the single-table input methods.
      Parameters:
      implementingClass - the class whose name will be used as a prefix for the property configuration key
      conf - the Hadoop configuration object from which to retrieve the configuration
      tableName - the table name for which to retrieve the configuration
      Returns:
      the config object built from the single input table properties set on the job
      Since:
      1.6.0
    • binOffline

      public static Map<String,Map<KeyExtent,List<Range>>> binOffline(TableId tableId, List<Range> ranges, ClientContext context) throws AccumuloException, TableNotFoundException
      Throws:
      AccumuloException
      TableNotFoundException
    • setSamplerConfiguration

      public static void setSamplerConfiguration(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, SamplerConfiguration samplerConfig)
    • getSamplerConfiguration

      public static SamplerConfiguration getSamplerConfiguration(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
    • setExecutionHints

      public static void setExecutionHints(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Map<String,String> hints)
    • getExecutionHints

      public static Map<String,String> getExecutionHints(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)