Class AccumuloRowInputFormat


  • public class AccumuloRowInputFormat
    extends org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,​PeekingIterator<Map.Entry<Key,​Value>>>
    This class allows MapReduce jobs to use Accumulo as the source of data. This InputFormat provides each row name as a Text key and a corresponding PeekingIterator as the value, which makes the Key/Value pairs of that row available to the Map function. Configure the job using the configure() method, which provides a fluent API. For example:
     AccumuloRowInputFormat.configure().clientProperties(props).table(name) // required
         .auths(auths).addIterator(iter1).ranges(ranges).fetchColumns(columns).executionHints(hints)
         .samplerConfiguration(sampleConf).autoAdjustRanges(false) // enabled by default
         .scanIsolation(true) // not available with batchScan()
         .offlineScan(true) // not available with batchScan()
         .store(job);
     
    For descriptions of all options, see InputFormatBuilder.InputFormatOptions.
    Since:
    2.0
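    The record shape described for this class (one row name as the key, the entries of that row as the value) can be illustrated with a minimal sketch. It uses plain Java collections only: "row:column" strings stand in for Accumulo's Key, String stands in for Value, and an eagerly built List stands in for the PeekingIterator. The names RowGroupingSketch, cell, rowOf and groupByRow are hypothetical and are not part of the Accumulo API.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-in model only: no Accumulo or Hadoop types are used here. */
final class RowGroupingSketch {

  /** Builds one hypothetical cell; the key has the form "row:column". */
  static Map.Entry<String, String> cell(String key, String value) {
    return new SimpleImmutableEntry<>(key, value);
  }

  /** Extracts the row portion of a "row:column" key. */
  static String rowOf(String key) {
    int sep = key.indexOf(':');
    return sep < 0 ? key : key.substring(0, sep);
  }

  /**
   * Groups cells that are already sorted by key into one record per row.
   * Each map entry models one (row name, row contents) pair handed to a Map function.
   */
  static Map<String, List<Map.Entry<String, String>>> groupByRow(
      Iterator<Map.Entry<String, String>> sortedCells) {
    Map<String, List<Map.Entry<String, String>>> rows = new LinkedHashMap<>();
    while (sortedCells.hasNext()) {
      Map.Entry<String, String> c = sortedCells.next();
      rows.computeIfAbsent(rowOf(c.getKey()), r -> new ArrayList<>()).add(c);
    }
    return rows;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, String>> scan = Arrays.asList(
        cell("row1:colA", "1"),
        cell("row1:colB", "2"),
        cell("row2:colA", "3"));
    // One line is printed per row, listing the cells that belong to it.
    groupByRow(scan.iterator())
        .forEach((row, cells) -> System.out.println(row + " -> " + cells));
  }
}
```

    In a real job the grouping is done by the record reader, and the Map function only consumes the iterator it is given for each row.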
    • Constructor Detail

      • AccumuloRowInputFormat

        public AccumuloRowInputFormat()
    • Method Detail

      • createRecordReader

        public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,​PeekingIterator<Map.Entry<Key,​Value>>> createRecordReader​(org.apache.hadoop.mapreduce.InputSplit split,
                                                                                                                                                        org.apache.hadoop.mapreduce.TaskAttemptContext context)
        Specified by:
        createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,​PeekingIterator<Map.Entry<Key,​Value>>>
      • getSplits

        public List<org.apache.hadoop.mapreduce.InputSplit> getSplits​(org.apache.hadoop.mapreduce.JobContext context)
                                                               throws IOException
        Gets the splits of the tables that have been set on the job by reading the metadata table for the specified ranges.
        Specified by:
        getSplits in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,​PeekingIterator<Map.Entry<Key,​Value>>>
        Returns:
        the splits from the tables based on the ranges.
        Throws:
        IOException - if a table set on the job doesn't exist or an error occurs initializing the tablet locator
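        How the specified ranges relate to the returned splits can be sketched with a simplified model. It assumes that rows are integers, that each tablet is identified by its inclusive end row, and that range adjustment is left enabled so a range is cut wherever it crosses a tablet boundary. SplitSketch and clipToTablets are hypothetical names; the real method works with Range objects and tablet metadata rather than integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of aligning one scan range with tablet boundaries. */
final class SplitSketch {

  /**
   * Cuts the inclusive range [start, end] at every tablet end row it crosses.
   *
   * @param tabletEndRows sorted inclusive end rows of all tablets except the last,
   *                      which is assumed to extend without an upper bound
   * @return one {start, end} pair per tablet that the range touches
   */
  static List<int[]> clipToTablets(int start, int end, int[] tabletEndRows) {
    if (start > end) {
      throw new IllegalArgumentException("start must not exceed end");
    }
    List<int[]> pieces = new ArrayList<>();
    int pieceStart = start;
    for (int endRow : tabletEndRows) {
      if (endRow < pieceStart) {
        continue; // tablet lies entirely before the remaining range
      }
      if (endRow >= end) {
        break; // the remaining range ends inside this tablet
      }
      pieces.add(new int[] {pieceStart, endRow});
      pieceStart = endRow + 1;
    }
    pieces.add(new int[] {pieceStart, end}); // final piece, in the last tablet touched
    return pieces;
  }

  public static void main(String[] args) {
    // Tablets end at rows 10 and 20; a third tablet holds everything above 20.
    for (int[] piece : clipToTablets(5, 25, new int[] {10, 20})) {
      System.out.println(Arrays.toString(piece));
    }
  }
}
```

        In this model each piece would become one input split, so a range that spans three tablets yields three map tasks.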
      • configure

        public static InputFormatBuilder.ClientParams<org.apache.hadoop.mapreduce.Job> configure()
        Returns a fluent builder used to set all the information required for this MapReduce job.
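        The fluent style of configure() can be pictured as a staged builder, in which each required call returns the interface for the next stage so that the required settings cannot be skipped. The sketch below is an assumption about the general pattern, not the Accumulo implementation: StagedBuilderSketch, ClientStage, TableStage and OptionsStage are hypothetical names, a Map<String, String> stands in for the Hadoop Job, and only one optional setting is modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Staged fluent builder: required steps first, optional steps afterwards. */
final class StagedBuilderSketch {

  /** First stage: client properties are required. */
  interface ClientStage {
    TableStage clientProperties(Properties props);
  }

  /** Second stage: the table name is required. */
  interface TableStage {
    OptionsStage table(String name);
  }

  /** Final stage: optional settings, then store. */
  interface OptionsStage {
    OptionsStage scanIsolation(boolean enabled);

    /** Copies the collected settings into the stand-in job configuration. */
    void store(Map<String, String> jobConf);
  }

  private static final class Builder implements ClientStage, TableStage, OptionsStage {
    private final Map<String, String> settings = new LinkedHashMap<>();

    @Override
    public TableStage clientProperties(Properties props) {
      for (String name : props.stringPropertyNames()) {
        settings.put("client." + name, props.getProperty(name));
      }
      return this;
    }

    @Override
    public OptionsStage table(String name) {
      settings.put("table", name);
      return this;
    }

    @Override
    public OptionsStage scanIsolation(boolean enabled) {
      settings.put("scanIsolation", String.valueOf(enabled));
      return this;
    }

    @Override
    public void store(Map<String, String> jobConf) {
      jobConf.putAll(settings);
    }
  }

  /** Entry point, mirroring the shape of a static configure() method. */
  static ClientStage configure() {
    return new Builder();
  }
}
```

        With this shape, a call chain such as configure().clientProperties(props).table("t1").scanIsolation(true).store(conf) compiles, while calling table(...) before clientProperties(...) does not.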