Interface InputFormatBuilder.InputFormatOptions<T>

    • Method Detail

      • fetchColumns

        InputFormatBuilder.InputFormatOptions<T> fetchColumns(Collection<IteratorSetting.Column> fetchColumns)
        Restricts the columns that will be mapped over for this job for the default input table.
        Parameters:
        fetchColumns - a collection of IteratorSetting.Column objects corresponding to column family and column qualifier. If the column qualifier is null, the entire column family is selected. An empty set is the default and is equivalent to scanning all columns.
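        The selection rule above can be sketched as a small predicate. This is a hypothetical, self-contained model rather than Accumulo code: ColumnSpec stands in for IteratorSetting.Column, and ColumnFilterModel.isSelected is an illustrative helper name. It assumes families and qualifiers are compared as plain strings.

```java
import java.util.Collection;
import java.util.Objects;

// Hypothetical stand-in for IteratorSetting.Column: a family plus an optional qualifier.
final class ColumnSpec {
    final String family;
    final String qualifier; // null means "every qualifier in this family"

    ColumnSpec(String family, String qualifier) {
        this.family = Objects.requireNonNull(family);
        this.qualifier = qualifier;
    }
}

// Hypothetical model of which cells a fetchColumns restriction lets through.
final class ColumnFilterModel {
    // Returns true if a cell with the given family/qualifier would be mapped over.
    static boolean isSelected(Collection<ColumnSpec> fetchColumns, String family, String qualifier) {
        if (fetchColumns.isEmpty()) {
            return true; // empty collection: the default, equivalent to scanning all columns
        }
        for (ColumnSpec c : fetchColumns) {
            if (!c.family.equals(family)) {
                continue; // different family, cannot match
            }
            if (c.qualifier == null || c.qualifier.equals(qualifier)) {
                return true; // whole family selected, or exact family/qualifier match
            }
        }
        return false;
    }
}
```

        For example, with fetch columns meta (no qualifier) and data:size, a cell in meta:anything is selected, while data:body and other:size are not.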
      • addIterator

        InputFormatBuilder.InputFormatOptions<T> addIterator(IteratorSetting cfg)
        Encode an iterator on the single input table for this job. It is safe to call this method multiple times. If an iterator is added with the same name, it will be overridden.
        Parameters:
        cfg - the configuration of the iterator
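        The override rule can be modeled as a map keyed by iterator name. This sketch is hypothetical and does not reproduce how Accumulo stores iterator settings: IterSpec stands in for IteratorSetting, and IteratorConfigModel is an illustrative name. It only demonstrates that repeated calls accumulate settings and that a later setting with an existing name replaces the earlier one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for IteratorSetting: priority, unique name, iterator class name.
final class IterSpec {
    final int priority;
    final String name;
    final String className;

    IterSpec(int priority, String name, String className) {
        this.priority = priority;
        this.name = name;
        this.className = className;
    }
}

// Hypothetical model of "safe to call multiple times; the same name overrides".
final class IteratorConfigModel {
    // LinkedHashMap keeps first-insertion order; re-putting a key replaces its value in place.
    private final Map<String, IterSpec> byName = new LinkedHashMap<>();

    IteratorConfigModel addIterator(IterSpec cfg) {
        byName.put(cfg.name, cfg); // a later setting with the same name replaces the earlier one
        return this; // fluent return, mirroring the builder style
    }

    List<IterSpec> configured() {
        return new ArrayList<>(byName.values());
    }
}
```

        Adding "vers" twice leaves two configured iterators, and the surviving "vers" entry carries the second call's priority.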
      • autoAdjustRanges

        InputFormatBuilder.InputFormatOptions<T> autoAdjustRanges(boolean value)
        Controls the automatic adjustment of ranges for this job; passing false disables it. This feature merges overlapping ranges, then splits them to align with tablet boundaries. Disabling it causes exactly one Map task to be created for each specified range. Disabling has no effect on batch scans, which always adjust ranges automatically.

        By default, this feature is enabled.

        See Also:
        ranges(Collection)
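        The two steps of automatic range adjustment, merging overlapping ranges and then splitting them at tablet boundaries, can be sketched over a simplified integer row space. This is an illustrative model under stated assumptions, not Accumulo's implementation: Span and AutoAdjustModel are hypothetical names, ranges are half-open [start, end) intervals of longs, and tablet boundaries are given as a sorted array of split points.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical half-open interval [start, end) over an integer row space.
final class Span {
    final long start;
    final long end;

    Span(long start, long end) {
        if (start >= end) throw new IllegalArgumentException("empty span");
        this.start = start;
        this.end = end;
    }

    @Override public boolean equals(Object o) {
        return o instanceof Span && ((Span) o).start == start && ((Span) o).end == end;
    }
    @Override public int hashCode() { return Long.hashCode(start) * 31 + Long.hashCode(end); }
    @Override public String toString() { return "[" + start + "," + end + ")"; }
}

final class AutoAdjustModel {
    // Step 1: sort by start and merge spans that overlap.
    static List<Span> merge(List<Span> input) {
        List<Span> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingLong((Span s) -> s.start));
        List<Span> out = new ArrayList<>();
        for (Span s : sorted) {
            if (!out.isEmpty() && s.start < out.get(out.size() - 1).end) {
                Span last = out.remove(out.size() - 1);
                out.add(new Span(last.start, Math.max(last.end, s.end)));
            } else {
                out.add(s);
            }
        }
        return out;
    }

    // Step 2: cut each merged span at every tablet boundary strictly inside it.
    static List<Span> splitAtTablets(List<Span> merged, long[] boundaries) {
        List<Span> out = new ArrayList<>();
        for (Span s : merged) {
            long cur = s.start;
            for (long b : boundaries) {
                if (b > cur && b < s.end) {
                    out.add(new Span(cur, b));
                    cur = b;
                }
            }
            out.add(new Span(cur, s.end));
        }
        return out;
    }

    // Enabled: merge then split. Disabled: one unit of work per input span, unchanged.
    static List<Span> plan(List<Span> ranges, long[] boundaries, boolean autoAdjust) {
        return autoAdjust ? splitAtTablets(merge(ranges), boundaries) : new ArrayList<Span>(ranges);
    }
}
```

        With ranges [0,10), [5,30), [40,45) and tablet boundaries at 20 and 50, adjustment yields [0,20), [20,30), [40,45), whereas disabling it keeps the three original ranges, one per Map task.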
      • localIterators

        InputFormatBuilder.InputFormatOptions<T> localIterators(boolean value)
        Enables the use of the ClientSideIteratorScanner in this job. This feature will cause the iterator stack to be constructed within the Map task, rather than within the Accumulo TServer. To use this feature, all classes needed for those iterators must be available on the classpath for the task.

        By default, this feature is disabled.

      • offlineScan

        InputFormatBuilder.InputFormatOptions<T> offlineScan(boolean value)
        Enable reading offline tables. By default, this feature is disabled and only online tables are scanned. This will make the map reduce job directly read the table's files. If the table is not offline, then the job will fail. If the table comes online during the map reduce job, it is likely that the job will fail.

        To use this option, the map reduce user will need access to read the Accumulo directory in HDFS.

        Reading an offline table constructs the scan-time iterator stack in the map process, so any iterators configured for the table must be on the mapper's classpath.

        One way to use this feature is to clone a table, take the clone offline, and use the clone as the input table for a map reduce job. If you plan to map reduce over the data many times, it may be better to compact the table, clone it, take it offline, and use the clone for all map reduce jobs. The reason is that compaction reduces each tablet in the table to one file, and reading from one file is faster.

        There are two possible advantages to reading a table's files directly out of HDFS. First, you may see better read performance. Second, it better supports speculative execution: when reading an online table, speculative execution can put more load on an already slow tablet server.

        By default, this feature is disabled.

      • batchScan

        InputFormatBuilder.InputFormatOptions<T> batchScan(boolean value)
        Enables the use of the BatchScanner in this job. Using this feature will group Ranges by their source tablet, producing an InputSplit per tablet rather than per Range. This batching helps to reduce overhead when querying a large number of small ranges (for example, when doing quad-tree decomposition for spatial queries).

        In order to achieve good locality of InputSplits this option always clips the input Ranges to tablet boundaries. This may result in one input Range contributing to several InputSplits.

        Note: calls to autoAdjustRanges(boolean) are ignored when BatchScan is enabled.

        This configuration is incompatible with:

        By default, this feature is disabled.
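        The grouping and clipping described for batchScan can be sketched over a simplified integer row space. This is a hypothetical model, not Accumulo code: rows are longs, ranges are half-open long[]{start, end} pairs, BatchSplitModel is an illustrative name, and tablet boundaries are a sorted array of split points. Each key of the result corresponds to one InputSplit, that is, one tablet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of batch-scan split planning.
// Tablet i covers [boundaries[i-1], boundaries[i]); the first tablet is unbounded below
// and the last runs up to Long.MAX_VALUE (exclusive, for simplicity).
final class BatchSplitModel {
    // Returns tablet index -> the clipped [start, end) pieces of the input ranges it serves.
    static Map<Integer, List<long[]>> plan(List<long[]> ranges, long[] boundaries) {
        Map<Integer, List<long[]>> splits = new TreeMap<>(); // ordered by tablet for stable output
        for (long[] r : ranges) {
            long cur = r[0];
            int tablet = tabletFor(cur, boundaries);
            while (cur < r[1]) {
                long tabletEnd = tablet < boundaries.length ? boundaries[tablet] : Long.MAX_VALUE;
                long pieceEnd = Math.min(r[1], tabletEnd); // clip the range to the tablet boundary
                splits.computeIfAbsent(tablet, k -> new ArrayList<>()).add(new long[] {cur, pieceEnd});
                cur = pieceEnd;
                tablet++; // any remainder of the range belongs to the next tablet
            }
        }
        return splits;
    }

    // Index of the tablet whose half-open row span contains `row`.
    static int tabletFor(long row, long[] boundaries) {
        int i = 0;
        while (i < boundaries.length && row >= boundaries[i]) {
            i++;
        }
        return i;
    }
}
```

        With boundaries at 20 and 50, the range [18,25) is clipped into [18,20) and [20,25), so it contributes to two InputSplits, while [1,3), [5,8), and the first piece of [18,25) all share the split for the first tablet.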