public class InputFormatBuilderImpl<T> extends Object implements InputFormatBuilder, InputFormatBuilder.ClientParams<T>, InputFormatBuilder.TableParams<T>, InputFormatBuilder.InputFormatOptions<T>
All Implemented Interfaces: InputFormatBuilder.ClientParams<T>, InputFormatBuilder.InputFormatOptions<T>, InputFormatBuilder.TableParams<T>
Constructor and Description |
---|
InputFormatBuilderImpl(Class<?> callingClass) |
Modifier and Type | Method and Description |
---|---|
InputFormatBuilder.InputFormatOptions<T> | addIterator(IteratorSetting cfg): Encode an iterator on the single input table for this job. |
InputFormatBuilder.InputFormatOptions<T> | auths(Authorizations auths): Sets the Authorizations used to scan. |
InputFormatBuilder.InputFormatOptions<T> | autoAdjustRanges(boolean value): Disables the automatic adjustment of ranges for this job. |
InputFormatBuilder.InputFormatOptions<T> | batchScan(boolean value): Enables the use of the BatchScanner in this job. |
InputFormatBuilder.InputFormatOptions<T> | classLoaderContext(String context): Sets the name of the classloader context on this scanner. |
InputFormatBuilder.TableParams<T> | clientProperties(Properties clientProperties): Set client properties needed to communicate with Accumulo for this job. |
InputFormatBuilder.TableParams<T> | clientPropertiesPath(String clientPropsPath): Set path to DFS location containing accumulo-client.properties file. |
InputFormatBuilder.InputFormatOptions<T> | executionHints(Map<String,String> hints): Set these execution hints on scanners created for input splits. |
InputFormatBuilder.InputFormatOptions<T> | fetchColumns(Collection<IteratorSetting.Column> fetchColumns): Restricts the columns that will be mapped over for this job for the default input table. |
InputFormatBuilder.InputFormatOptions<T> | localIterators(boolean value): Enables the use of the ClientSideIteratorScanner in this job. |
InputFormatBuilder.InputFormatOptions<T> | offlineScan(boolean value): Enable reading offline tables. |
InputFormatBuilder.InputFormatOptions<T> | ranges(Collection<Range> ranges): Sets the input ranges to scan for the single input table associated with this job. |
InputFormatBuilder.InputFormatOptions<T> | samplerConfiguration(SamplerConfiguration samplerConfig): Causes input format to read sample data. |
InputFormatBuilder.InputFormatOptions<T> | scanIsolation(boolean value): Enables the use of the IsolatedScanner in this job. |
void | store(T j): Finish configuring, verify and serialize options into the JobConf or Job. |
InputFormatBuilder.InputFormatOptions<T> | table(String tableName): Sets the name of the input table, over which this job will scan. |
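
The return types in the method summary (ClientParams methods return TableParams, table returns InputFormatOptions, and a single class implements all three) suggest a staged builder, where each step only exposes the calls that are legal next. The following is a minimal, stdlib-only sketch of that pattern. Every type, method, and configuration key in it (ClientStage, TableStage, OptionsStage, Builder, configure, the "sketch.*" keys) is hypothetical and stands in for the real Accumulo types; it is not the library's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical, simplified stand-in for a staged builder; none of these
// types are the real Accumulo interfaces.
class StagedBuilderSketch {
  interface ClientStage { TableStage clientProperties(Properties props); }
  interface TableStage { OptionsStage table(String tableName); }
  interface OptionsStage {
    OptionsStage batchScan(boolean value);
    OptionsStage scanIsolation(boolean value);
    void store(Map<String, String> jobConf);
  }

  // One class implements every stage, so each method can return `this`
  // while the declared return type limits which calls are legal next.
  static final class Builder implements ClientStage, TableStage, OptionsStage {
    private Properties props;
    private String table;
    private boolean batchScan;     // disabled by default
    private boolean scanIsolation; // disabled by default

    @Override public TableStage clientProperties(Properties props) { this.props = props; return this; }
    @Override public OptionsStage table(String tableName) { this.table = tableName; return this; }
    @Override public OptionsStage batchScan(boolean value) { this.batchScan = value; return this; }
    @Override public OptionsStage scanIsolation(boolean value) { this.scanIsolation = value; return this; }

    @Override public void store(Map<String, String> jobConf) {
      // The keys below are invented for this sketch.
      jobConf.put("sketch.table", table);
      jobConf.put("sketch.batchScan", Boolean.toString(batchScan));
      jobConf.put("sketch.scanIsolation", Boolean.toString(scanIsolation));
      jobConf.put("sketch.clientProperties.size", Integer.toString(props.size()));
    }
  }

  // Only the first stage is exposed, so callers must start with client properties.
  static ClientStage configure() { return new Builder(); }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("example.key", "example-value"); // placeholder property
    Map<String, String> jobConf = new LinkedHashMap<>();
    configure().clientProperties(props).table("mytable").batchScan(true).store(jobConf);
    System.out.println(jobConf);
  }
}
```

With this shape, calling table before clientProperties, or store before table, does not compile, because the earlier stage types do not declare those methods.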
public InputFormatBuilderImpl(Class<?> callingClass)
public InputFormatBuilder.TableParams<T> clientProperties(Properties clientProperties)
Description copied from interface: InputFormatBuilder.ClientParams
Set client properties needed to communicate with Accumulo for this job. See also InputFormatBuilder.ClientParams.clientPropertiesPath(String). Client properties can be created using Accumulo.newClientProperties().
Specified by: clientProperties in interface InputFormatBuilder.ClientParams<T>
Parameters: clientProperties - Accumulo connection information
public InputFormatBuilder.TableParams<T> clientPropertiesPath(String clientPropsPath)
Description copied from interface: InputFormatBuilder.ClientParams
Set path to DFS location containing accumulo-client.properties file. See also InputFormatBuilder.ClientParams.clientProperties(Properties).
Specified by: clientPropertiesPath in interface InputFormatBuilder.ClientParams<T>
Parameters: clientPropsPath - DFS path to accumulo-client.properties
public InputFormatBuilder.InputFormatOptions<T> table(String tableName)
Description copied from interface: InputFormatBuilder.TableParams
Sets the name of the input table, over which this job will scan.
Specified by: table in interface InputFormatBuilder.TableParams<T>
Parameters: tableName - the name of the input table to scan
public InputFormatBuilder.InputFormatOptions<T> auths(Authorizations auths)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Sets the Authorizations used to scan. Must be a subset of the user's authorizations. By default, all of the user's auths are set.
Specified by: auths in interface InputFormatBuilder.InputFormatOptions<T>
Parameters: auths - the user's authorizations
public InputFormatBuilder.InputFormatOptions<T> classLoaderContext(String context)
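Description copied from interface: InputFormatBuilder.InputFormatOptions
Sets the name of the classloader context on this scanner.
Specified by: classLoaderContext in interface InputFormatBuilder.InputFormatOptions<T>
Parameters: context - name of the classloader context
public InputFormatBuilder.InputFormatOptions<T> ranges(Collection<Range> ranges)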
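Description copied from interface: InputFormatBuilder.InputFormatOptions
Sets the input ranges to scan for the single input table associated with this job.
Specified by: ranges in interface InputFormatBuilder.InputFormatOptions<T>
Parameters: ranges - the ranges that will be mapped over
See Also: TableOperations.splitRangeByTablets(String, Range, int)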
public InputFormatBuilder.InputFormatOptions<T> fetchColumns(Collection<IteratorSetting.Column> fetchColumns)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Restricts the columns that will be mapped over for this job for the default input table.
Specified by: fetchColumns in interface InputFormatBuilder.InputFormatOptions<T>
Parameters: fetchColumns - a collection of IteratorSetting.Column objects corresponding to column family and column qualifier. If the column qualifier is null, the entire column family is selected. An empty set is the default and is equivalent to scanning all columns.
public InputFormatBuilder.InputFormatOptions<T> addIterator(IteratorSetting cfg)
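Description copied from interface: InputFormatBuilder.InputFormatOptions
Encode an iterator on the single input table for this job.
Specified by: addIterator in interface InputFormatBuilder.InputFormatOptions<T>
Parameters: cfg - the configuration of the iterator
public InputFormatBuilder.InputFormatOptions<T> executionHints(Map<String,String> hints)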
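Description copied from interface: InputFormatBuilder.InputFormatOptions
Set these execution hints on scanners created for input splits. See ScannerBase.setExecutionHints(java.util.Map).
Specified by: executionHints in interface InputFormatBuilder.InputFormatOptions<T>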
public InputFormatBuilder.InputFormatOptions<T> samplerConfiguration(SamplerConfiguration samplerConfig)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Causes input format to read sample data.
Specified by: samplerConfiguration in interface InputFormatBuilder.InputFormatOptions<T>
Parameters: samplerConfig - the sampler configuration that the sample must have been created with in order for reading sample data to succeed
See Also: ScannerBase.setSamplerConfiguration(SamplerConfiguration)
public InputFormatBuilder.InputFormatOptions<T> autoAdjustRanges(boolean value)
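Description copied from interface: InputFormatBuilder.InputFormatOptions
Disables the automatic adjustment of ranges for this job. By default, this feature is enabled.
Specified by: autoAdjustRanges in interface InputFormatBuilder.InputFormatOptions<T>
See Also: InputFormatBuilder.InputFormatOptions.ranges(Collection)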
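
This page does not spell out what the automatic adjustment does. Assuming it includes merging overlapping input ranges before they are turned into splits (an assumption, not something stated here), the merge step could look roughly like the sketch below. It uses half-open int intervals as a hypothetical stand-in for Range, and the class and method names (RangeMergeSketch, mergeOverlapping) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: each int[] {start, end} is a half-open interval
// [start, end) standing in for an Accumulo Range.
class RangeMergeSketch {
  // Returns a new list in which overlapping or touching intervals are merged.
  static List<int[]> mergeOverlapping(List<int[]> ranges) {
    List<int[]> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingInt((int[] r) -> r[0]));
    List<int[]> merged = new ArrayList<>();
    for (int[] r : sorted) {
      int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      if (last != null && r[0] <= last[1]) {
        // Extend the previous interval instead of emitting a new one.
        last[1] = Math.max(last[1], r[1]);
      } else {
        merged.add(new int[] {r[0], r[1]}); // copy so inputs stay untouched
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<int[]> in = new ArrayList<>();
    in.add(new int[] {3, 7});
    in.add(new int[] {1, 5});
    in.add(new int[] {10, 12});
    for (int[] r : mergeOverlapping(in)) {
      System.out.println("[" + r[0] + ", " + r[1] + ")");
    }
  }
}
```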
public InputFormatBuilder.InputFormatOptions<T> scanIsolation(boolean value)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Enables the use of the IsolatedScanner in this job. By default, this feature is disabled.
Specified by: scanIsolation in interface InputFormatBuilder.InputFormatOptions<T>
public InputFormatBuilder.InputFormatOptions<T> localIterators(boolean value)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Enables the use of the ClientSideIteratorScanner in this job. This feature will cause the iterator stack to be constructed within the Map task, rather than within the Accumulo TServer. To use this feature, all classes needed for those iterators must be available on the classpath for the task. By default, this feature is disabled.
Specified by: localIterators in interface InputFormatBuilder.InputFormatOptions<T>
public InputFormatBuilder.InputFormatOptions<T> offlineScan(boolean value)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Enable reading offline tables. To use this option, the map reduce user will need access to read the Accumulo directory in HDFS.
Reading the offline table will create the scan time iterator stack in the map process, so any iterators that are configured for the table will need to be on the mapper's classpath.
One way to use this feature is to clone a table, take the clone offline, and use the clone as the input table for a map reduce job. If you plan to map reduce over the data many times, it may be better to compact the table, clone it, take it offline, and use the clone for all map reduce jobs. The reason to do this is that compaction will reduce each tablet in the table to one file, and it is faster to read from one file.
There are two possible advantages to reading a table's files directly out of HDFS. First, you may see better read performance. Second, it will support speculative execution better. When reading an online table, speculative execution can put more load on an already slow tablet server.
By default, this feature is disabled.
Specified by: offlineScan in interface InputFormatBuilder.InputFormatOptions<T>
public InputFormatBuilder.InputFormatOptions<T> batchScan(boolean value)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Enables the use of the BatchScanner in this job. Using this feature will group Ranges by their source tablet, producing an InputSplit per tablet rather than per Range. This batching helps to reduce overhead when querying a large number of small ranges (for example, when doing quad-tree decomposition for spatial queries).
In order to achieve good locality of InputSplits, this option always clips the input Ranges to tablet boundaries. This may result in one input Range contributing to several InputSplits.
Note: calls to InputFormatBuilder.InputFormatOptions.autoAdjustRanges(boolean) are ignored when BatchScan is enabled.
This configuration is incompatible with:
InputFormatBuilder.InputFormatOptions.offlineScan(boolean)
InputFormatBuilder.InputFormatOptions.localIterators(boolean)
InputFormatBuilder.InputFormatOptions.scanIsolation(boolean)
By default, this feature is disabled.
Specified by: batchScan in interface InputFormatBuilder.InputFormatOptions<T>
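
The grouping and clipping described for batchScan can be illustrated with a small, stdlib-only sketch. It is not the library's code: tablets are modeled by sorted int split points, ranges by half-open int intervals, and the names BatchSplitSketch and groupByTablet are hypothetical. Each key of the returned map corresponds to one would-be InputSplit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: tablet t covers [splitPoints[t-1], splitPoints[t]),
// with the first and last tablets open-ended. Ranges are half-open int[] {start, end}.
class BatchSplitSketch {
  // Clips every range to tablet boundaries and groups the pieces by tablet index.
  static Map<Integer, List<int[]>> groupByTablet(int[] splitPoints, List<int[]> ranges) {
    Map<Integer, List<int[]>> byTablet = new TreeMap<>();
    for (int[] r : ranges) {
      for (int t = 0; t <= splitPoints.length; t++) {
        int lo = (t == 0) ? Integer.MIN_VALUE : splitPoints[t - 1];
        int hi = (t == splitPoints.length) ? Integer.MAX_VALUE : splitPoints[t];
        int start = Math.max(r[0], lo);
        int end = Math.min(r[1], hi);
        if (start < end) {
          // A non-empty overlap: this tablet's split receives the clipped piece.
          byTablet.computeIfAbsent(t, k -> new ArrayList<>()).add(new int[] {start, end});
        }
      }
    }
    return byTablet;
  }

  public static void main(String[] args) {
    List<int[]> ranges = new ArrayList<>();
    ranges.add(new int[] {5, 12});  // straddles the boundary at 10
    ranges.add(new int[] {13, 14});
    ranges.add(new int[] {18, 25}); // straddles the boundary at 20
    Map<Integer, List<int[]>> splits = groupByTablet(new int[] {10, 20}, ranges);
    for (Map.Entry<Integer, List<int[]>> e : splits.entrySet()) {
      System.out.println("tablet " + e.getKey() + ": " + e.getValue().size() + " clipped range(s)");
    }
  }
}
```

In this example three input ranges yield three groups, one per tablet, and the ranges that straddle a boundary contribute a piece to two groups, which matches the note that one input Range may contribute to several InputSplits.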
public void store(T j) throws AccumuloException, AccumuloSecurityException
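Description copied from interface: InputFormatBuilder.TableParams
Finish configuring, verify and serialize options into the JobConf or Job.
Specified by: store in interface InputFormatBuilder.TableParams<T>
Throws: AccumuloException, AccumuloSecurityException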
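
Since store is documented as verifying options, and batchScan is documented as incompatible with offlineScan, localIterators, and scanIsolation, one plausible verification step is a check for those combinations. The sketch below is an illustration only. This page does not say how the real store(T) reacts to a conflicting configuration, so throwing IllegalArgumentException is an assumption, and the names OptionCheckSketch, conflictsWithBatchScan, and verify are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an option check; not the library's verification logic.
class OptionCheckSketch {
  // Lists the enabled options that are documented as incompatible with batchScan.
  static List<String> conflictsWithBatchScan(boolean batchScan, boolean offlineScan,
      boolean localIterators, boolean scanIsolation) {
    List<String> conflicts = new ArrayList<>();
    if (!batchScan) {
      return conflicts; // nothing to check when batch scanning is off
    }
    if (offlineScan) conflicts.add("offlineScan");
    if (localIterators) conflicts.add("localIterators");
    if (scanIsolation) conflicts.add("scanIsolation");
    return conflicts;
  }

  // Assumption: reject the configuration outright when any conflict exists.
  static void verify(boolean batchScan, boolean offlineScan,
      boolean localIterators, boolean scanIsolation) {
    List<String> conflicts =
        conflictsWithBatchScan(batchScan, offlineScan, localIterators, scanIsolation);
    if (!conflicts.isEmpty()) {
      throw new IllegalArgumentException(
          "batchScan cannot be combined with: " + String.join(", ", conflicts));
    }
  }

  public static void main(String[] args) {
    verify(true, false, false, false); // passes: batchScan alone
    try {
      verify(true, true, false, true);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```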
Copyright © 2011–2019 The Apache Software Foundation. All rights reserved.