Scan (HBase - Client 0.96.1.1-hadoop2 API)

java.lang.Object
- org.apache.hadoop.hbase.client.Operation
  - org.apache.hadoop.hbase.client.OperationWithAttributes
    - org.apache.hadoop.hbase.client.Scan

All Implemented Interfaces:

Attributes
```
@InterfaceAudience.Public
@InterfaceStability.Stable
public class Scan
extends OperationWithAttributes
```
Used to perform Scan operations.
All operations are identical to Get with the exception of instantiation. Rather than specifying a single row, an optional startRow and stopRow may be defined. If rows are not specified, the Scanner will iterate over all rows.
To scan everything for each row, instantiate a Scan object.
To modify scanner caching for just this scan, use setCaching. If caching is NOT set, we will use the caching value of the hosting HTable. See HTable.setScannerCaching(int). In addition to row caching, it is possible to specify a maximum result size, using setMaxResultSize(long). When both are used, single server requests are limited by either number of rows or maximum result size, whichever limit comes first.
To further narrow what is returned when scanning, call the additional methods outlined below.
To get all columns from specific families, execute addFamily for each family to retrieve.
To get specific columns, execute addColumn for each column to retrieve.
To only retrieve columns within a specific range of version timestamps, execute setTimeRange.
To only retrieve columns with a specific timestamp, execute setTimeStamp.
To limit the number of versions of each column to be returned, execute setMaxVersions.
To limit the maximum number of values returned for each call to next(), execute setBatch.
To add a filter, execute setFilter.
Expert: To explicitly disable server-side block caching for this scan, execute setCacheBlocks(boolean).

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`SCAN_ATTRIBUTES_METRICS_DATA`
`static String`	`SCAN_ATTRIBUTES_METRICS_ENABLE`
`static String`	`SCAN_ATTRIBUTES_TABLE_NAME`

Fields inherited from class org.apache.hadoop.hbase.client.OperationWithAttributes
ID_ATRIBUTE

Constructor Summary

Constructors
Constructor and Description
`Scan()` Create a Scan operation across all rows.
`Scan(byte[] startRow)` Create a Scan operation starting at the specified row.
`Scan(byte[] startRow, byte[] stopRow)` Create a Scan operation for the range of rows specified.
`Scan(byte[] startRow, Filter filter)`
`Scan(Get get)` Builds a scan object with the same specs as get.
`Scan(Scan scan)` Creates a new instance of this class while copying all values.

Method Summary

Methods
Modifier and Type	Method and Description
`Scan`	`addColumn(byte[] family, byte[] qualifier)` Get the column from the specified family with the specified qualifier.
`Scan`	`addFamily(byte[] family)` Get all columns from the specified family.
`boolean`	`doLoadColumnFamiliesOnDemand()` Get the logical value indicating whether on-demand CF loading should be allowed.
`int`	`getBatch()`
`boolean`	`getCacheBlocks()` Get whether blocks should be cached for this Scan.
`int`	`getCaching()`
`byte[][]`	`getFamilies()`
`Map<byte[],NavigableSet<byte[]>>`	`getFamilyMap()` Getting the familyMap
`Filter`	`getFilter()`
`Map<String,Object>`	`getFingerprint()` Compile the table and column family (i.e. schema) information into a Map.
`IsolationLevel`	`getIsolationLevel()`
`Boolean`	`getLoadColumnFamiliesOnDemandValue()` Get the raw loadColumnFamiliesOnDemand setting; if it's not set, can be null.
`long`	`getMaxResultSize()`
`int`	`getMaxResultsPerColumnFamily()`
`int`	`getMaxVersions()`
`int`	`getRowOffsetPerColumnFamily()` Method for retrieving the scan's offset per row per column family (#kvs to be skipped)
`byte[]`	`getStartRow()`
`byte[]`	`getStopRow()`
`TimeRange`	`getTimeRange()`
`boolean`	`hasFamilies()`
`boolean`	`hasFilter()`
`boolean`	`isGetScan()`
`boolean`	`isRaw()`
`boolean`	`isSmall()` Get whether this scan is a small scan
`int`	`numFamilies()`
`void`	`setBatch(int batch)` Set the maximum number of values to return for each call to next()
`void`	`setCacheBlocks(boolean cacheBlocks)` Set whether blocks should be cached for this Scan.
`void`	`setCaching(int caching)` Set the number of rows for caching that will be passed to scanners.
`Scan`	`setFamilyMap(Map<byte[],NavigableSet<byte[]>> familyMap)` Setting the familyMap
`Scan`	`setFilter(Filter filter)` Apply the specified server-side filter when performing the Scan.
`void`	`setIsolationLevel(IsolationLevel level)`
`void`	`setLoadColumnFamiliesOnDemand(boolean value)` Set the value indicating whether loading CFs on demand should be allowed (cluster default is false).
`void`	`setMaxResultSize(long maxResultSize)` Set the maximum result size.
`void`	`setMaxResultsPerColumnFamily(int limit)` Set the maximum number of values to return per row per Column Family
`Scan`	`setMaxVersions()` Get all available versions.
`Scan`	`setMaxVersions(int maxVersions)` Get up to the specified number of versions of each column.
`void`	`setRaw(boolean raw)` Enable/disable "raw" mode for this scan.
`void`	`setRowOffsetPerColumnFamily(int offset)` Set offset for the row per Column Family.
`void`	`setSmall(boolean small)` Set whether this scan is a small scan
`Scan`	`setStartRow(byte[] startRow)` Set the start row of the scan.
`Scan`	`setStopRow(byte[] stopRow)` Set the stop row.
`Scan`	`setTimeRange(long minStamp, long maxStamp)` Get versions of columns only within the specified timestamp range, [minStamp, maxStamp).
`Scan`	`setTimeStamp(long timestamp)` Get versions of columns with the specified timestamp.
`Map<String,Object>`	`toMap(int maxCols)` Compile the details beyond the scope of getFingerprint (row, columns, timestamps, etc.) into a Map along with the fingerprinted information.

Methods inherited from class org.apache.hadoop.hbase.client.OperationWithAttributes
getAttribute, getAttributeSize, getAttributesMap, getId, setAttribute, setId

Methods inherited from class org.apache.hadoop.hbase.client.Operation
toJSON, toJSON, toMap, toString, toString

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Field Detail
  - SCAN_ATTRIBUTES_METRICS_ENABLE
```
public static final String SCAN_ATTRIBUTES_METRICS_ENABLE
```
    See Also:
    Constant Field Values
  - SCAN_ATTRIBUTES_METRICS_DATA
```
public static final String SCAN_ATTRIBUTES_METRICS_DATA
```
    See Also:
    Constant Field Values
  - SCAN_ATTRIBUTES_TABLE_NAME
```
public static final String SCAN_ATTRIBUTES_TABLE_NAME
```
    See Also:
    Constant Field Values
- Constructor Detail
  - Scan
```
public Scan()
```
    Create a Scan operation across all rows.
  - Scan
```
public Scan(byte[] startRow,
    Filter filter)
```
  - Scan
```
public Scan(byte[] startRow)
```
    Create a Scan operation starting at the specified row.
    If the specified row does not exist, the Scanner will start from the next closest row after the specified row.
    
    Parameters:
    startRow - row to start scanner at or after
  - Scan
```
public Scan(byte[] startRow,
    byte[] stopRow)
```
    Create a Scan operation for the range of rows specified.
    
    Parameters:
    startRow - row to start scanner at or after (inclusive)
    stopRow - row to stop scanner before (exclusive)
  - Scan
```
public Scan(Scan scan)
     throws IOException
```
    Creates a new instance of this class while copying all values.
    
    Parameters:
    scan - The scan instance to copy from.
    
    Throws:
    
    IOException - When copying the values fails.
  - Scan
```
public Scan(Get get)
```
    Builds a scan object with the same specs as get.
    
    Parameters:
    get - get to model scan after
- Method Detail
  - isGetScan
```
public boolean isGetScan()
```
  - addFamily
```
public Scan addFamily(byte[] family)
```
    Get all columns from the specified family.
    Overrides previous calls to addColumn for this family.
    
    Parameters:
    family - family name
    
    Returns:
    this
  - addColumn
```
public Scan addColumn(byte[] family,
             byte[] qualifier)
```
    Get the column from the specified family with the specified qualifier.
    Overrides previous calls to addFamily for this family.
    
    Parameters:
    family - family name
    qualifier - column qualifier
    
    Returns:
    this
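
    The way addFamily and addColumn override each other can be pictured with a small model of the family map. The sketch below is an assumption-laden illustration, not HBase code: FamilyMapModel and its String keys are hypothetical stand-ins for the byte[]-keyed map that getFamilyMap() returns, and a null qualifier set is assumed to mean "every column in the family".

```java
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Simplified, String-keyed model of a Scan's family map (hypothetical; not HBase code). */
final class FamilyMapModel {
    // family -> selected qualifiers; a null value stands for "every column in the family"
    private final Map<String, NavigableSet<String>> familyMap = new TreeMap<>();

    FamilyMapModel addFamily(String family) {
        // Selecting the whole family discards any qualifiers chosen earlier.
        familyMap.put(family, null);
        return this;
    }

    FamilyMapModel addColumn(String family, String qualifier) {
        NavigableSet<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            // The family is new or was selected as a whole: start a fresh qualifier set.
            qualifiers = new TreeSet<>();
            familyMap.put(family, qualifiers);
        }
        qualifiers.add(qualifier);
        return this;
    }

    /** Returns the selected qualifiers, or null when the whole family is selected. */
    NavigableSet<String> qualifiersOf(String family) {
        return familyMap.get(family);
    }

    boolean hasFamily(String family) {
        return familyMap.containsKey(family);
    }
}
```

    In this model, addColumn("cf", "a") followed by addFamily("cf") leaves the whole family selected, while a later addColumn("cf", "b") narrows the selection to the single column "b".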
  - setTimeRange
```
public Scan setTimeRange(long minStamp,
                long maxStamp)
                  throws IOException
```
    Get versions of columns only within the specified timestamp range, [minStamp, maxStamp). Note that the default maximum number of versions to return is 1. If your time range spans more than one version and you want all versions returned, raise the maximum number of versions beyond the default.
    
    Parameters:
    minStamp - minimum timestamp value, inclusive
    maxStamp - maximum timestamp value, exclusive
    
    Returns:
    this
    
    Throws:
    
    IOException - if invalid time range
    See Also:
    setMaxVersions(), setMaxVersions(int)
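
    Because the range is half-open, a version written exactly at maxStamp is not returned. The following sketch models only that membership test; TimeRangeModel is a hypothetical class written for this illustration (it is not the TimeRange returned by getTimeRange()), and it signals an invalid range with IllegalArgumentException where setTimeRange is documented to throw IOException.

```java
/** Half-open timestamp interval [minStamp, maxStamp); a hypothetical model, not the HBase TimeRange class. */
final class TimeRangeModel {
    private final long minStamp; // inclusive lower bound
    private final long maxStamp; // exclusive upper bound

    TimeRangeModel(long minStamp, long maxStamp) {
        if (maxStamp < minStamp) {
            // The real setTimeRange reports an invalid range with an IOException.
            throw new IllegalArgumentException("maxStamp is smaller than minStamp");
        }
        this.minStamp = minStamp;
        this.maxStamp = maxStamp;
    }

    /** True when the timestamp falls inside [minStamp, maxStamp). */
    boolean contains(long timestamp) {
        return timestamp >= minStamp && timestamp < maxStamp;
    }
}
```

    With this model, new TimeRangeModel(100, 200) contains 100 but not 200; to include a version stamped 200, the upper bound has to be 201.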
  - setTimeStamp
```
public Scan setTimeStamp(long timestamp)
```
    Get versions of columns with the specified timestamp. Note that the default maximum number of versions to return is 1. If your time range spans more than one version and you want all versions returned, raise the maximum number of versions beyond the default.
    
    Parameters:
    timestamp - version timestamp
    
    Returns:
    this
    See Also:
    setMaxVersions(), setMaxVersions(int)
  - setStartRow
```
public Scan setStartRow(byte[] startRow)
```
    Set the start row of the scan.
    
    Parameters:
    startRow - row to start scan on (inclusive) Note: In order to make startRow exclusive add a trailing 0 byte
    
    Returns:
    this
  - setStopRow
```
public Scan setStopRow(byte[] stopRow)
```
    Set the stop row.
    
    Parameters:
    stopRow - row to end at (exclusive) Note: In order to make stopRow inclusive add a trailing 0 byte
    
    Returns:
    this
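
    The trailing 0 byte trick mentioned for setStartRow and setStopRow works because row keys are ordered as unsigned bytes, lexicographically, so a key followed by 0x00 is the smallest key that sorts strictly after it. The sketch below is a self-contained illustration under that assumption; RowBounds and its methods are hypothetical helpers, not part of the HBase API, and treating an empty stopRow as "no upper bound" is likewise an assumption of the model.

```java
import java.util.Arrays;

/** Hypothetical helpers illustrating [startRow, stopRow) semantics; not HBase code. */
final class RowBounds {
    /** Smallest row key that sorts strictly after row: the same bytes plus a trailing 0x00. */
    static byte[] nextRow(byte[] row) {
        return Arrays.copyOf(row, row.length + 1); // copyOf pads the new slot with a zero byte
    }

    /** Unsigned lexicographic comparison of two row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shorter key that is a prefix sorts first
    }

    /** True when row lies in [startRow, stopRow); an empty stopRow is treated as "no upper bound". */
    static boolean inRange(byte[] row, byte[] startRow, byte[] stopRow) {
        return compare(row, startRow) >= 0
                && (stopRow.length == 0 || compare(row, stopRow) < 0);
    }
}
```

    Passing nextRow(stopRow) as the stop row makes the original stopRow inclusive, and passing nextRow(startRow) as the start row makes the original startRow exclusive.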
  - setMaxVersions
```
public Scan setMaxVersions()
```
    Get all available versions.
    
    Returns:
    this
  - setMaxVersions
```
public Scan setMaxVersions(int maxVersions)
```
    Get up to the specified number of versions of each column.
    
    Parameters:
    maxVersions - maximum versions for each column
    
    Returns:
    this
  - setBatch
```
public void setBatch(int batch)
```
    Set the maximum number of values to return for each call to next()
    
    Parameters:
    batch - the maximum number of values
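
    The effect of setBatch on a wide row can be pictured as chunking the row's values. The sketch below is a simplified model rather than scanner code: BatchModel is a hypothetical class, the values are plain list elements, and a non-positive batch is assumed to mean "no limit" so the whole row comes back in one call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how one row's values are spread over successive next() calls. */
final class BatchModel {
    /** Splits the values of a single row into chunks of at most batch values each. */
    static <T> List<List<T>> split(List<T> rowValues, int batch) {
        List<List<T>> chunks = new ArrayList<>();
        if (batch <= 0) {
            // Assumption: a non-positive batch means the row is returned whole.
            chunks.add(new ArrayList<>(rowValues));
            return chunks;
        }
        for (int i = 0; i < rowValues.size(); i += batch) {
            int end = Math.min(i + batch, rowValues.size());
            chunks.add(new ArrayList<>(rowValues.subList(i, end)));
        }
        return chunks;
    }
}
```

    For a row with seven values and a batch of 3, the model yields three chunks holding 3, 3, and 1 values, so the caller has to be prepared to see the same row key across consecutive results.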
  - setMaxResultsPerColumnFamily
```
public void setMaxResultsPerColumnFamily(int limit)
```
    Set the maximum number of values to return per row per Column Family
    
    Parameters:
    limit - the maximum number of values returned / row / CF
  - setRowOffsetPerColumnFamily
```
public void setRowOffsetPerColumnFamily(int offset)
```
    Set offset for the row per Column Family.
    
    Parameters:
    offset - is the number of kvs that will be skipped.
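
    Taken together, setRowOffsetPerColumnFamily and setMaxResultsPerColumnFamily behave like an offset and a limit applied to each column family of each row. The sketch below illustrates that reading with plain lists; PerFamilyWindow is a hypothetical helper, and treating a negative limit as "no limit" is an assumption of the model.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical offset/limit model for the values of one column family in one row. */
final class PerFamilyWindow {
    /** Skips offset values, then keeps at most limit values (a negative limit keeps the rest). */
    static <T> List<T> window(List<T> familyValues, int offset, int limit) {
        int size = familyValues.size();
        int from = Math.min(Math.max(offset, 0), size);
        // Use long arithmetic so from + limit cannot overflow for very large limits.
        long end = limit < 0 ? size : Math.min((long) from + limit, (long) size);
        return new ArrayList<>(familyValues.subList(from, (int) end));
    }
}
```

    With five values a through e in a family, an offset of 1 and a limit of 2 keep b and c. Stepping the offset by the limit on successive scans is one way to page through a very wide row.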
  - setCaching
```
public void setCaching(int caching)
```
    Set the number of rows for caching that will be passed to scanners. If not set, the default setting from HTable.getScannerCaching() will apply. Higher caching values will enable faster scanners but will use more memory.
    
    Parameters:
    caching - the number of rows for caching
  - getMaxResultSize
```
public long getMaxResultSize()
```
    Returns:
    the maximum result size in bytes. See setMaxResultSize(long)
  - setMaxResultSize
```
public void setMaxResultSize(long maxResultSize)
```
    Set the maximum result size. The default is -1; this means that no specific maximum result size will be set for this scan, and the global configured value will be used instead. (Defaults to unlimited).
    
    Parameters:
    maxResultSize - The maximum result size in bytes.
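
    The interaction between setCaching and setMaxResultSize ("whichever limit comes first") can be sketched as a loop that stops filling a response once either limit is reached. This is a rough model only: ResponseLimitModel is hypothetical, row sizes are supplied by the caller, the size limit is assumed to be an already resolved positive value, and including the row that crosses the size limit is an assumption of the sketch rather than documented behaviour.

```java
/** Hypothetical model of how many rows fit into one server response. */
final class ResponseLimitModel {
    /**
     * Counts the rows placed in a single response, given per-row sizes in bytes,
     * the caching value (maximum rows) and the effective maximum result size (bytes).
     */
    static int rowsInOneResponse(long[] rowSizes, int caching, long maxResultSize) {
        int rows = 0;
        long bytes = 0;
        // Keep adding rows until the row limit or the size limit is reached.
        while (rows < rowSizes.length && rows < caching && bytes < maxResultSize) {
            bytes += rowSizes[rows];
            rows++;
        }
        return rows;
    }
}
```

    With four rows of 40 bytes each, a caching value of 10 and a size limit of 100 bytes stop the response after three rows, whereas a caching value of 2 stops it after two rows regardless of the size limit.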
  - setFilter
```
public Scan setFilter(Filter filter)
```
    Apply the specified server-side filter when performing the Scan.
    
    Parameters:
    filter - filter to run on the server
    
    Returns:
    this
  - setFamilyMap
```
public Scan setFamilyMap(Map<byte[],NavigableSet<byte[]>> familyMap)
```
    Setting the familyMap
    
    Parameters:
    familyMap - map of family to qualifier
    
    Returns:
    this
  - getFamilyMap
```
public Map<byte[],NavigableSet<byte[]>> getFamilyMap()
```
    Getting the familyMap
    
    Returns:
    familyMap
  - numFamilies
```
public int numFamilies()
```
    Returns:
    the number of families in familyMap
  - hasFamilies
```
public boolean hasFamilies()
```
    Returns:
    true if familyMap is non empty, false otherwise
  - getFamilies
```
public byte[][] getFamilies()
```
    Returns:
    the keys of the familyMap
  - getStartRow
```
public byte[] getStartRow()
```
    Returns:
    the startrow
  - getStopRow
```
public byte[] getStopRow()
```
    Returns:
    the stoprow
  - getMaxVersions
```
public int getMaxVersions()
```
    Returns:
    the max number of versions to fetch
  - getBatch
```
public int getBatch()
```
    Returns:
    maximum number of values to return for a single call to next()
  - getMaxResultsPerColumnFamily
```
public int getMaxResultsPerColumnFamily()
```
    Returns:
    maximum number of values to return per row per CF
  - getRowOffsetPerColumnFamily
```
public int getRowOffsetPerColumnFamily()
```
    Method for retrieving the scan's offset per row per column family (#kvs to be skipped)
    
    Returns:
    row offset
  - getCaching
```
public int getCaching()
```
    Returns:
    the number of rows fetched when calling next on a scanner
  - getTimeRange
```
public TimeRange getTimeRange()
```
    Returns:
    TimeRange
  - getFilter
```
public Filter getFilter()
```
    Returns:
    the Filter applied to this Scan
  - hasFilter
```
public boolean hasFilter()
```
    Returns:
    true if a filter has been specified, false if not
  - setCacheBlocks
```
public void setCacheBlocks(boolean cacheBlocks)
```
    Set whether blocks should be cached for this Scan.
    This is true by default. When true, default settings of the table and family are used (this will never override caching blocks if the block cache is disabled for that family or entirely).
    
    Parameters:
    cacheBlocks - if false, default settings are overridden and blocks will not be cached
  - getCacheBlocks
```
public boolean getCacheBlocks()
```
    Get whether blocks should be cached for this Scan.
    
    Returns:
    true if default caching should be used, false if blocks should not be cached
  - setLoadColumnFamiliesOnDemand
```
public void setLoadColumnFamiliesOnDemand(boolean value)
```
    Set the value indicating whether loading CFs on demand should be allowed (cluster default is false). On-demand CF loading doesn't load column families until necessary; e.g. if you filter on one column, the other column family data will be loaded only for the rows that are included in the result, not for all rows as in the normal case. With column-specific filters, like SingleColumnValueFilter with filterIfMissing == true, this can deliver huge performance gains when there is a CF with lots of data; however, it can also lead to some inconsistent results, as follows:
    - If someone does a concurrent update to both column families in question, you may get a row that never existed. E.g. for { rowKey = 5, { cat_videos => 1 }, { video => "my cat" } }, if someone puts rowKey 5 with { cat_videos => 0 }, { video => "my dog" }, a concurrent scan filtering on "cat_videos == 1" can get { rowKey = 5, { cat_videos => 1 }, { video => "my dog" } }.
    - If there is a concurrent split and you have more than 2 column families, some rows may be missing some column families.
  - getLoadColumnFamiliesOnDemandValue
```
public Boolean getLoadColumnFamiliesOnDemandValue()
```
    Get the raw loadColumnFamiliesOnDemand setting; if it's not set, can be null.
  - doLoadColumnFamiliesOnDemand
```
public boolean doLoadColumnFamiliesOnDemand()
```
    Get the logical value indicating whether on-demand CF loading should be allowed.
  - getFingerprint
```
public Map<String,Object> getFingerprint()
```
    Compile the table and column family (i.e. schema) information into a Map. Useful for parsing and aggregation by debugging, logging, and administration tools.
    
    Specified by:
    
    getFingerprint in class Operation
    
    Returns:
    Map
  - toMap
```
public Map<String,Object> toMap(int maxCols)
```
    Compile the details beyond the scope of getFingerprint (row, columns, timestamps, etc.) into a Map along with the fingerprinted information. Useful for debugging, logging, and administration tools.
    
    Specified by:
    
    toMap in class Operation
    
    Parameters:
    maxCols - a limit on the number of columns output prior to truncation
    
    Returns:
    Map
  - setRaw
```
public void setRaw(boolean raw)
```
    Enable/disable "raw" mode for this scan. If "raw" is enabled, the scan will return all delete markers and deleted rows that have not been collected yet. This is mostly useful for a Scan on column families that have KEEP_DELETED_ROWS enabled. It is an error to specify any column when "raw" is set.
    
    Parameters:
    raw - True/False to enable/disable "raw" mode.
  - isRaw
```
public boolean isRaw()
```
    Returns:
    True if this Scan is in "raw" mode.
  - setIsolationLevel
```
public void setIsolationLevel(IsolationLevel level)
```
  - getIsolationLevel
```
public IsolationLevel getIsolationLevel()
```
  - setSmall
```
public void setSmall(boolean small)
```
    Set whether this scan is a small scan
    A small scan should use pread, while a big scan can use seek + read. Seek + read is fast but can cause two problems: (1) resource contention and (2) too much network I/O. See "[89-fb] Using pread for non-compaction read request", https://issues.apache.org/jira/browse/HBASE-7266.
    In addition, when small is set to true, openScanner, next, and closeScanner are performed in a single RPC call, which gives better performance for small scans (HBASE-9488).
    As a rule of thumb, a scan whose range falls within one data block (64KB) can be considered a small scan.
    
    Parameters:
    small - true to mark this scan as a small scan, false otherwise
  - isSmall
```
public boolean isSmall()
```
    Get whether this scan is a small scan
    
    Returns:
    true if small scan
