Interface BatchScanner

  • All Superinterfaces:
    AutoCloseable, Iterable<Map.Entry<Key,Value>>, ScannerBase
    All Known Implementing Classes:
    TabletServerBatchDeleter, TabletServerBatchReader

    public interface BatchScanner
    extends ScannerBase
    In exchange for possibly returning scanned entries out of order, BatchScanner implementations may scan an Accumulo table more efficiently by
    • Looking up multiple ranges in parallel. Parallelism is constrained by the number of threads available to the BatchScanner, set in its constructor.
    • Breaking up large ranges into subranges. Often the number and boundaries of subranges are determined by a table's split points.
    • Combining multiple ranges into a single RPC call to a tablet server.
    The above techniques lead to better performance than a Scanner in use cases such as
    • Retrieving many small ranges
    • Scanning a large range that returns many entries
    • Running server-side iterators that perform computation, even if few entries are returned from the scan itself
    To re-emphasize, only use a BatchScanner when you do not care whether returned data is in sorted order. Use a Scanner instead when sorted order is important.
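    The trade-off described above can be illustrated with a minimal, standard-library-only sketch. It is not Accumulo code: ParallelRangeLookupSketch, lookup, and the integer "rows" are hypothetical stand-ins. Each range is looked up on its own task, and results are collected in completion order, so entries from different ranges may arrive interleaved in any order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a batch lookup; not part of the Accumulo API. */
final class ParallelRangeLookupSketch {

  /**
   * Looks up each half-open range {start, end} of row numbers on its own task.
   * Ranges are appended as their tasks complete, so the order across ranges is not fixed.
   */
  static List<Integer> lookup(List<int[]> ranges, int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      CompletionService<List<Integer>> completion = new ExecutorCompletionService<>(pool);
      for (int[] range : ranges) {
        completion.submit(() -> {
          List<Integer> rows = new ArrayList<>();
          for (int row = range[0]; row < range[1]; row++) {
            rows.add(row);
          }
          return rows;
        });
      }
      List<Integer> results = new ArrayList<>();
      for (int i = 0; i < ranges.size(); i++) {
        // take() hands back whichever range finished first.
        results.addAll(completion.take().get());
      }
      return results;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("lookup failed", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<int[]> ranges = Arrays.asList(new int[] {0, 3}, new int[] {10, 12}, new int[] {5, 7});
    // The printed order of the three groups can differ from run to run.
    System.out.println(lookup(ranges, 2));
  }
}
```

    A caller that needs sorted output would have to sort the collected results afterwards, which is the extra work a Scanner avoids.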

    A BatchScanner instance will use no more threads than were provided when the BatchScanner implementation was constructed. Multiple invocations of iterator() all share the same resources of the instance. To allocate additional threads, create a new BatchScanner instance.
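    As a rough illustration of a per-instance thread bound, the sketch below uses only the Java standard library. SharedPoolSketch, startBatch, and peakConcurrency are hypothetical names, and the fixed thread pool is an assumption made for the example, not a statement about how any BatchScanner implementation manages its threads. Two batches of simulated lookups are queued on the same pool, and the peak number of simultaneous lookups never exceeds the thread count given at construction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of one thread budget shared by several batches of work. */
final class SharedPoolSketch implements AutoCloseable {
  private final ExecutorService pool;
  private final AtomicInteger active = new AtomicInteger();
  private final AtomicInteger peak = new AtomicInteger();

  SharedPoolSketch(int numThreads) {
    this.pool = Executors.newFixedThreadPool(numThreads);
  }

  /** Queues one batch of simulated lookups; every batch lands on the same pool. */
  List<Future<?>> startBatch(int lookups) {
    List<Future<?>> futures = new ArrayList<>();
    for (int i = 0; i < lookups; i++) {
      futures.add(pool.submit(() -> {
        int now = active.incrementAndGet();
        peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
        try {
          Thread.sleep(5); // stand-in for a remote lookup
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          active.decrementAndGet();
        }
      }));
    }
    return futures;
  }

  /** Runs two overlapping batches and returns the highest number of simultaneous lookups. */
  int peakConcurrency(int lookupsPerBatch) {
    List<Future<?>> all = new ArrayList<>(startBatch(lookupsPerBatch));
    all.addAll(startBatch(lookupsPerBatch));
    for (Future<?> f : all) {
      try {
        f.get();
      } catch (InterruptedException | ExecutionException e) {
        throw new IllegalStateException("lookup failed", e);
      }
    }
    return peak.get();
  }

  @Override
  public void close() {
    pool.shutdown();
  }

  public static void main(String[] args) {
    try (SharedPoolSketch sketch = new SharedPoolSketch(2)) {
      System.out.println("peak concurrency: " + sketch.peakConcurrency(8));
    }
  }
}
```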

    • Method Detail

      • setRanges

        void setRanges(Collection<Range> ranges)
        Allows scanning over multiple ranges efficiently.
        Parameters:
        ranges - specifies the non-overlapping ranges to query
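        Because the parameter is described as a set of non-overlapping ranges, a caller holding possibly overlapping ranges may want to merge them first. The sketch below shows the idea on closed integer intervals using only the standard library. NonOverlappingRangesSketch and merge are hypothetical names, and the int[] pairs are a stand-in for Range. Whether an implementation merges or rejects overlapping input on its own is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that merges overlapping closed intervals {start, end}. */
final class NonOverlappingRangesSketch {

  /** Returns non-overlapping intervals sorted by start; the input list is not modified. */
  static List<int[]> merge(List<int[]> ranges) {
    List<int[]> sorted = new ArrayList<>(ranges);
    sorted.sort((a, b) -> Integer.compare(a[0], b[0]));

    List<int[]> merged = new ArrayList<>();
    for (int[] r : sorted) {
      int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      if (last != null && r[0] <= last[1]) {
        // Overlaps the previous interval: extend it instead of adding a new one.
        last[1] = Math.max(last[1], r[1]);
      } else {
        merged.add(new int[] {r[0], r[1]});
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<int[]> input = Arrays.asList(new int[] {3, 8}, new int[] {1, 5}, new int[] {10, 12});
    for (int[] r : merge(input)) {
      System.out.println(Arrays.toString(r));
    }
  }
}
```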
      • close

        void close()
        Description copied from interface: ScannerBase
        Closes any underlying connections on the scanner. This may invalidate any iterators derived from the Scanner, causing them to throw exceptions.
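        The effect of close() on outstanding iterators can be pictured with the following standard-library sketch. ClosableScanSketch is a hypothetical class, and the choice of IllegalStateException is an assumption for illustration; the text above does not say which exception a real scanner's iterators throw.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical closable, iterable resource whose iterators stop working after close(). */
final class ClosableScanSketch implements Iterable<String>, AutoCloseable {
  private final List<String> entries;
  private volatile boolean closed;

  ClosableScanSketch(List<String> entries) {
    this.entries = entries;
  }

  @Override
  public Iterator<String> iterator() {
    Iterator<String> delegate = entries.iterator();
    return new Iterator<String>() {
      @Override
      public boolean hasNext() {
        ensureOpen();
        return delegate.hasNext();
      }

      @Override
      public String next() {
        ensureOpen();
        return delegate.next();
      }
    };
  }

  private void ensureOpen() {
    if (closed) {
      // Assumed exception type, chosen only for this sketch.
      throw new IllegalStateException("scanner closed");
    }
  }

  @Override
  public void close() {
    closed = true;
  }

  /** Returns true if an iterator obtained before close() throws once the resource is closed. */
  static boolean iteratorFailsAfterClose() {
    ClosableScanSketch scan = new ClosableScanSketch(Arrays.asList("a", "b"));
    Iterator<String> it = scan.iterator();
    it.next(); // works while the resource is open
    scan.close();
    try {
      it.hasNext();
      return false;
    } catch (IllegalStateException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    // try-with-resources closes the resource even if iteration throws.
    try (ClosableScanSketch scan = new ClosableScanSketch(Arrays.asList("a", "b"))) {
      for (String entry : scan) {
        System.out.println(entry);
      }
    }
    System.out.println("iterator fails after close: " + iteratorFailsAfterClose());
  }
}
```

        Since the interface extends AutoCloseable, a try-with-resources block such as the one in main is a natural way to make sure close() is always called, and finishing iteration inside that block avoids using an invalidated iterator.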
        Specified by:
        close in interface AutoCloseable
        Specified by:
        close in interface ScannerBase
      • setTimeout

        void setTimeout(long timeout,
                        TimeUnit timeUnit)
        This setting determines how long a scanner will automatically retry when a failure occurs. By default, a scanner will retry forever.

        Setting the timeout to zero (with any time unit) or Long.MAX_VALUE (with TimeUnit.MILLISECONDS) means no timeout.

        The batch scanner will accomplish as much work as possible before throwing an exception. BatchScanner iterators will throw a TimedOutException when all needed servers time out.
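        The "no timeout" rule for zero and Long.MAX_VALUE can be sketched as a small normalization step. TimeoutSketch, NO_TIMEOUT, toMillis, and timedOut are hypothetical names, and rejecting negative values is an assumption of this sketch rather than documented behavior. Only java.util.concurrent.TimeUnit from the standard library is used.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical normalization of a (timeout, unit) pair into milliseconds. */
final class TimeoutSketch {
  /** Sentinel meaning "retry forever". */
  static final long NO_TIMEOUT = Long.MAX_VALUE;

  /** Zero in any unit, or Long.MAX_VALUE milliseconds, both map to NO_TIMEOUT. */
  static long toMillis(long timeout, TimeUnit unit) {
    if (timeout < 0) {
      // Assumption of this sketch: negative timeouts are rejected.
      throw new IllegalArgumentException("timeout must not be negative: " + timeout);
    }
    if (timeout == 0) {
      return NO_TIMEOUT;
    }
    // TimeUnit.toMillis saturates at Long.MAX_VALUE instead of overflowing.
    return unit.toMillis(timeout);
  }

  /** Returns true once the elapsed time reaches the timeout, never for NO_TIMEOUT. */
  static boolean timedOut(long startMillis, long nowMillis, long timeoutMillis) {
    return timeoutMillis != NO_TIMEOUT && nowMillis - startMillis >= timeoutMillis;
  }

  public static void main(String[] args) {
    System.out.println(toMillis(0, TimeUnit.SECONDS) == NO_TIMEOUT);
    System.out.println(toMillis(2, TimeUnit.SECONDS));
    System.out.println(timedOut(0, 5_000, toMillis(2, TimeUnit.SECONDS)));
  }
}
```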

        Specified by:
        setTimeout in interface ScannerBase
        Parameters:
        timeout - the length of the timeout
        timeUnit - the units of the timeout