public abstract class TransformingIterator extends WrappingIterator implements OptionDescriber
If the implementing iterator transforms column families, then it must also override untransformColumnFamilies(Collection) to handle the case in which column families are fetched at scan time. The fetched column families must be in the transformed space, and the untransformed column families must be passed to this iterator's source. If it is not possible to write a reverse transformation (e.g., the column family transformation depends on the row value), then specific column families must not be fetched (or only column families that are known not to transform at all may be fetched).
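
The reverse mapping described above can be illustrated with a minimal sketch in plain Java. It does not use Accumulo classes: column families are modeled as strings, and the class name, the `FORWARD` mapping, and the `untransform` helper are all hypothetical names chosen for illustration. It assumes the forward transformation is one-to-one, which is what makes a reverse transformation possible in the first place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ColumnFamilyUntransformSketch {
    // Hypothetical forward mapping: stored column family -> column family seen by clients.
    static final Map<String, String> FORWARD = new HashMap<>();
    // Reverse lookup derived from FORWARD (assumes the forward mapping is one-to-one).
    static final Map<String, String> REVERSE = new HashMap<>();

    static {
        FORWARD.put("raw_a", "A");
        FORWARD.put("raw_b", "B");
        FORWARD.forEach((stored, transformed) -> REVERSE.put(transformed, stored));
    }

    // Maps fetched (transformed-space) families back to the stored (untransformed) families.
    // Families with no reverse entry are passed through unchanged, matching the
    // "known not to transform at all" case.
    static Collection<String> untransform(Collection<String> fetched) {
        List<String> result = new ArrayList<>(fetched.size());
        for (String cf : fetched) {
            result.add(REVERSE.getOrDefault(cf, cf));
        }
        return result;
    }

    public static void main(String[] args) {
        // "A" is in the transformed space; "meta" is never transformed.
        System.out.println(untransform(Arrays.asList("A", "meta")));
    }
}
```

In a real subclass the same idea would live in an override of untransformColumnFamilies(Collection), operating on ByteSequence values rather than strings.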
If the implementing iterator transforms column visibilities, then users must be careful NOT to fetch column qualifiers from the scanner. This is due to ACCUMULO-??? (insert issue number).
If the implementing iterator transforms column visibilities, then the user should be sure to supply authorizations via the AUTH_OPT iterator option (note that this is only necessary for scan scope iterators). The authorizations supplied to the iterator should be in the transformed space, but the authorizations supplied to the scanner should be in the untransformed space. That is, if the iterator transforms A to 1, B to 2, C to 3, etc., then the auths supplied when the scanner is constructed should be A,B,C,... and the auths supplied to the iterator should be 1,2,3,... The reason is that the scanner performs security filtering before this iterator is called, so the scanner's authorizations need to be in the original untransformed space. Since the iterator can transform visibilities, it could produce visibilities that the user cannot see, so the transformed keys must be tested to ensure the user is allowed to view them. This test is not necessary when the iterator is not used in the scan scope, since no security filtering is performed during major and minor compactions. Note also that this iterator implements the security filtering itself, rather than relying on a follow-on iterator, to ensure that the test is performed.
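
The two authorization spaces can be illustrated with a small, self-contained sketch. It is a simplified model rather than Accumulo code: each visibility is treated as a single label instead of a boolean expression, and the class name, the `transform` mapping, and the `scan` helper are hypothetical. The first check stands in for the scanner's filtering in the untransformed space, and the second for this iterator's filtering in the transformed space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TwoSpaceAuthSketch {
    // Hypothetical visibility transformation: A -> 1, B -> 2, C -> 3; anything else is unchanged.
    static String transform(String visibility) {
        switch (visibility) {
            case "A": return "1";
            case "B": return "2";
            case "C": return "3";
            default: return visibility;
        }
    }

    // Returns the transformed visibilities that survive both filtering stages.
    static List<String> scan(List<String> storedVisibilities, Set<String> scannerAuths,
            Set<String> iteratorAuths) {
        List<String> visible = new ArrayList<>();
        for (String stored : storedVisibilities) {
            // Stage 1: filtering before the iterator runs, in the untransformed space.
            if (!scannerAuths.contains(stored)) {
                continue;
            }
            // Stage 2: the iterator transforms the visibility, then re-checks it
            // against the authorizations supplied in the transformed space.
            String transformed = transform(stored);
            if (iteratorAuths.contains(transformed)) {
                visible.add(transformed);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        Set<String> scannerAuths = new HashSet<>(Arrays.asList("A", "B", "C"));
        Set<String> iteratorAuths = new HashSet<>(Arrays.asList("1", "2"));
        System.out.println(scan(Arrays.asList("A", "B", "C", "D"), scannerAuths, iteratorAuths));
    }
}
```

In this model, an entry stored with visibility C passes the scanner's check but is dropped by the iterator because 3 is not among the iterator's authorizations, which is the case the post-transformation test exists to catch.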
Modifier and Type | Class and Description |
---|---|
static interface | TransformingIterator.KVBuffer |
Nested classes/interfaces inherited from interface OptionDescriber: OptionDescriber.IteratorOptions
Modifier and Type | Field and Description |
---|---|
static String | AUTH_OPT |
protected int | keyPos |
protected ArrayList<Pair<Key,Value>> | keys |
protected org.slf4j.Logger | log |
static String | MAX_BUFFER_SIZE_OPT |
protected boolean | scanning |
protected Collection<ByteSequence> | seekColumnFamilies |
protected boolean | seekColumnFamiliesInclusive |
protected Range | seekRange |
Constructor and Description |
---|
TransformingIterator() |
Modifier and Type | Method and Description |
---|---|
protected boolean | canSee(Key key): Indicates whether or not the user is able to see key. |
protected boolean | canSeeColumnFamily(Key key): Indicates whether or not key can be seen, according to the fetched column families for this iterator. |
protected Range | computeReseekRange(Range range): Possibly expand range to include everything for the key prefix we are working with. |
protected Key | copyPartialKey(Key key, PartialKey part): Creates a copy of key, copying only the parts of the key specified in part. |
SortedKeyValueIterator<Key,Value> | deepCopy(IteratorEnvironment env): Creates a deep copy of this iterator as though seek had not yet been called. |
OptionDescriber.IteratorOptions | describeOptions(): Gets an iterator options object that contains information needed to configure this iterator. |
protected abstract PartialKey | getKeyPrefix(): Indicates the prefix of keys that will be transformed by this iterator. |
Key | getTopKey(): Returns top key. |
Value | getTopValue(): Returns top value. |
boolean | hasTop(): Returns true if the iterator has more elements. |
protected boolean | includeTransformedKey(Key transformedKey): Determines whether or not to include transformedKey in the output. |
void | init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options, IteratorEnvironment env): Initializes the iterator. |
protected boolean | isSetAfterPart(Key key, PartialKey part): Indicates whether or not any part of key excluding part is set. |
void | next(): Advances to the next K,V pair. |
protected Key | replaceColumnFamily(Key originalKey, org.apache.hadoop.io.Text newColFam): Make a new key with all parts (including delete flag) coming from originalKey but use newColFam as the column family. |
protected Key | replaceColumnQualifier(Key originalKey, org.apache.hadoop.io.Text newColQual): Make a new key with all parts (including delete flag) coming from originalKey but use newColQual as the column qualifier. |
protected Key | replaceColumnVisibility(Key originalKey, org.apache.hadoop.io.Text newColVis): Make a new key with all parts (including delete flag) coming from originalKey but use newColVis as the column visibility. |
protected Key | replaceKeyParts(Key originalKey, org.apache.hadoop.io.Text newColQual, org.apache.hadoop.io.Text newColVis): Make a new key with a column qualifier, and column visibility. |
protected Key | replaceKeyParts(Key originalKey, org.apache.hadoop.io.Text newColFam, org.apache.hadoop.io.Text newColQual, org.apache.hadoop.io.Text newColVis): Make a new key with a column family, column qualifier, and column visibility. |
void | seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive): Seeks to the first key in the Range, restricting the resulting K,V pairs to those with the specified columns. |
static void | setAuthorizations(IteratorSetting config, Authorizations auths): Configure authorizations used for post transformation filtering. |
static void | setMaxBufferSize(IteratorSetting config, long maxBufferSize): Configure the maximum amount of memory that can be used for transformation. |
protected void | transformKeys(): Reads all keys matching the first key's prefix from the source iterator, transforms them, and sorts the resulting keys. |
protected abstract void | transformRange(SortedKeyValueIterator<Key,Value> input, TransformingIterator.KVBuffer output): Transforms input. |
protected Collection<ByteSequence> | untransformColumnFamilies(Collection<ByteSequence> columnFamilies): Reverses the transformation applied to column families that are fetched at seek time. |
boolean | validateOptions(Map<String,String> options): Check to see if an options map contains all options required by an iterator and that the option values are in the expected formats. |
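Methods inherited from class WrappingIterator: getSource, setSource
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface YieldingKeyValueIterator: enableYielding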
public static final String AUTH_OPT
public static final String MAX_BUFFER_SIZE_OPT
protected org.slf4j.Logger log
protected int keyPos
protected boolean scanning
protected Range seekRange
protected Collection<ByteSequence> seekColumnFamilies
protected boolean seekColumnFamiliesInclusive
public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options, IteratorEnvironment env) throws IOException
Description copied from interface: SortedKeyValueIterator
Initializes the iterator.
Specified by:
init in interface SortedKeyValueIterator<Key,Value>
Overrides:
init in class WrappingIterator
Parameters:
source - SortedKeyValueIterator source to read data from.
options - Map of string option names to option values.
env - IteratorEnvironment environment in which iterator is being run.
Throws:
IOException - unused.
public OptionDescriber.IteratorOptions describeOptions()
Description copied from interface: OptionDescriber
Gets an iterator options object that contains information needed to configure this iterator.
Specified by:
describeOptions in interface OptionDescriber
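public boolean validateOptions(Map<String,String> options)
Description copied from interface: OptionDescriber
Check to see if an options map contains all options required by an iterator and that the option values are in the expected formats.
Specified by:
validateOptions in interface OptionDescriber
Parameters:
options - a map of option names to option values
public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env)
Description copied from interface: SortedKeyValueIterator
Creates a deep copy of this iterator as though seek had not yet been called.
Specified by:
deepCopy in interface SortedKeyValueIterator<Key,Value>
Overrides:
deepCopy in class WrappingIterator
Parameters:
env - IteratorEnvironment environment in which iterator is being run.
public boolean hasTop()
Description copied from interface: SortedKeyValueIterator
Returns true if the iterator has more elements.
Specified by:
hasTop in interface SortedKeyValueIterator<Key,Value>
Overrides:
hasTop in class WrappingIterator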
public Key getTopKey()
Description copied from interface: SortedKeyValueIterator
Returns top key.
For performance reasons, iterators reserve the right to reuse objects returned by getTopKey when SortedKeyValueIterator.next() is called, changing the data that the object references. Iterators that need to save an object returned by getTopKey ought to copy the object's data into a new object in order to avoid aliasing bugs.
Specified by:
getTopKey in interface SortedKeyValueIterator<Key,Value>
Overrides:
getTopKey in class WrappingIterator
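
The aliasing hazard described for getTopKey can be demonstrated with a toy model in plain Java. Nothing here is an Accumulo class: `ReusingIterator` is a hypothetical iterator that reuses one mutable StringBuilder as its top key, which is the kind of object reuse the note above permits. Saving the returned reference loses data once next() is called, while copying the data first does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class TopKeyAliasingSketch {
    // Toy iterator (hypothetical) that reuses a single mutable buffer for its top key.
    static class ReusingIterator {
        private final Iterator<String> source;
        private final StringBuilder top = new StringBuilder();
        private boolean hasTop;

        ReusingIterator(List<String> keys) {
            this.source = keys.iterator();
            next();
        }

        boolean hasTop() {
            return hasTop;
        }

        StringBuilder getTopKey() {
            return top;
        }

        // Overwrites the shared buffer in place instead of allocating a new key.
        void next() {
            hasTop = source.hasNext();
            top.setLength(0);
            if (hasTop) {
                top.append(source.next());
            }
        }
    }

    // Buggy pattern: keeps references to the reused buffer, so every saved
    // entry reflects whatever the buffer holds after iteration finishes.
    static List<String> collectByReference(List<String> keys) {
        List<StringBuilder> saved = new ArrayList<>();
        ReusingIterator it = new ReusingIterator(keys);
        while (it.hasTop()) {
            saved.add(it.getTopKey());
            it.next();
        }
        List<String> out = new ArrayList<>();
        for (StringBuilder sb : saved) {
            out.add(sb.toString());
        }
        return out;
    }

    // Safe pattern: copies the key's data into a new object before advancing.
    static List<String> collectByCopy(List<String> keys) {
        List<String> out = new ArrayList<>();
        ReusingIterator it = new ReusingIterator(keys);
        while (it.hasTop()) {
            out.add(it.getTopKey().toString());
            it.next();
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b");
        System.out.println(collectByReference(keys));
        System.out.println(collectByCopy(keys));
    }
}
```

The same caution applies to getTopValue.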
public Value getTopValue()
Description copied from interface: SortedKeyValueIterator
Returns top value.
For performance reasons, iterators reserve the right to reuse objects returned by getTopValue when SortedKeyValueIterator.next() is called, changing the underlying data that the object references. Iterators that need to save an object returned by getTopValue ought to copy the object's data into a new object in order to avoid aliasing bugs.
Specified by:
getTopValue in interface SortedKeyValueIterator<Key,Value>
Overrides:
getTopValue in class WrappingIterator
public void next() throws IOException
Description copied from interface: SortedKeyValueIterator
Advances to the next K,V pair.
Specified by:
next in interface SortedKeyValueIterator<Key,Value>
Overrides:
next in class WrappingIterator
Throws:
IOException - if an I/O error occurs.
public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException
Description copied from interface: SortedKeyValueIterator
Seeks to the first key in the Range, restricting the resulting K,V pairs to those with the specified columns. Seek may be called multiple times with different parameters after SortedKeyValueIterator.init(SortedKeyValueIterator, Map, IteratorEnvironment) is called.
Iterators that examine groups of adjacent key/value pairs (e.g. rows) to determine their top key and value should be sure that they properly handle a seek to a key in the middle of such a group (e.g. the middle of a row). Even if the client always seeks to a range containing an entire group (a,c), the tablet server could send back a batch of entries corresponding to (a,b], then reseek the iterator to range (b,c) when the scan is continued.
columnFamilies is used, at the lowest level, to determine which data blocks inside of an RFile need to be opened for this iterator. This set of data blocks is also the set of locality groups defined for the given table. If no columnFamilies are provided, the data blocks for all locality groups inside of the correct RFile will be opened and seeked in an attempt to find the correct start key, regardless of the startKey in the range.
In an Accumulo instance in which multiple locality groups exist for a table, it is important to ensure that columnFamilies is properly set to the minimum required column families to ensure that data from separate locality groups is not inadvertently read.
Specified by:
seek in interface SortedKeyValueIterator<Key,Value>
Overrides:
seek in class WrappingIterator
Parameters:
range - Range of keys to iterate over.
columnFamilies - Collection of column families to include or exclude.
inclusive - boolean that indicates whether to include (true) or exclude (false) column families.
Throws:
IOException - if an I/O error occurs.
protected void transformKeys() throws IOException
Reads all keys matching the first key's prefix from the source iterator, transforms them, and sorts the resulting keys.
Throws:
IOException
protected boolean includeTransformedKey(Key transformedKey)
Determines whether or not to include transformedKey in the output. It is possible that transformation could have produced a key that falls outside of the seek range, a key with a visibility the user can't see, a key with a visibility that doesn't parse, or a key with a column family that wasn't fetched. We only do some checks (outside the range, user can see) if we're scanning. The range check is not done for major/minor compaction since seek ranges won't be in our transformed key space and we will never change the row so we can't produce keys that would fall outside the tablet anyway.
Parameters:
transformedKey - the key to check
Returns:
true if the key should be included and false if not
protected boolean canSee(Key key)
Indicates whether or not the user is able to see key. If the user has not supplied authorizations, or the iterator is not in the scan scope, then this method simply returns true. Otherwise, key's column visibility is tested against the user-supplied authorizations, and the test result is returned. For performance, the test results are cached so that the same visibility is not tested multiple times.
Parameters:
key - the key to test
Returns:
true if the key is visible or iterator is not scanning, and false if not
protected boolean canSeeColumnFamily(Key key)
Indicates whether or not key can be seen, according to the fetched column families for this iterator.
Parameters:
key - the key whose column family is to be tested
Returns:
true if key's column family is one of those fetched in the set passed to our seek(Range, Collection, boolean) method
protected Range computeReseekRange(Range range)
Possibly expand range to include everything for the key prefix we are working with. That is, if our prefix is ROW_COLFAM, then we need to expand the range so we're sure to include all entries having the same row and column family as the start/end of the range.
Parameters:
range - the range to expand
protected boolean isSetAfterPart(Key key, PartialKey part)
Indicates whether or not any part of key excluding part is set. For example, if part is ROW_COLFAM_COLQUAL, then this method determines whether or not the column visibility, timestamp, or delete flag is set on key.
Parameters:
key - the key to check
part - the part of the key that doesn't need to be checked (everything after does)
Returns:
true if anything after part is set on key, and false if not
protected Key copyPartialKey(Key key, PartialKey part)
Creates a copy of key, copying only the parts of the key specified in part. For example, if part is ROW_COLFAM_COLQUAL, then this method would copy the row, column family, and column qualifier from key into a new key.
Parameters:
key - the key to copy
part - the parts of key to copy
Returns:
the new key containing part of key
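protected Key replaceColumnFamily(Key originalKey, org.apache.hadoop.io.Text newColFam)
Make a new key with all parts (including delete flag) coming from originalKey but use newColFam as the column family.
protected Key replaceColumnQualifier(Key originalKey, org.apache.hadoop.io.Text newColQual)
Make a new key with all parts (including delete flag) coming from originalKey but use newColQual as the column qualifier.
protected Key replaceColumnVisibility(Key originalKey, org.apache.hadoop.io.Text newColVis)
Make a new key with all parts (including delete flag) coming from originalKey but use newColVis as the column visibility.
protected Key replaceKeyParts(Key originalKey, org.apache.hadoop.io.Text newColFam, org.apache.hadoop.io.Text newColQual, org.apache.hadoop.io.Text newColVis)
Make a new key with a column family, column qualifier, and column visibility. Copy the rest of the parts of the key (including delete flag) from originalKey.
protected Key replaceKeyParts(Key originalKey, org.apache.hadoop.io.Text newColQual, org.apache.hadoop.io.Text newColVis)
Make a new key with a column qualifier, and column visibility. Copy the rest of the parts of the key (including delete flag) from originalKey.
protected Collection<ByteSequence> untransformColumnFamilies(Collection<ByteSequence> columnFamilies)
Reverses the transformation applied to column families that are fetched at seek time.
Parameters:
columnFamilies - the column families that have been fetched at seek time
Returns:
the untransformed version of columnFamilies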
protected abstract PartialKey getKeyPrefix()
Indicates the prefix of keys that will be transformed by this iterator. In other words, this is the part of the key that will not be transformed by this iterator. For example, if this method returns ROW_COLFAM, then transformKeys() may be changing the column qualifier, column visibility, or timestamp, but it won't be changing the row or column family.
protected abstract void transformRange(SortedKeyValueIterator<Key,Value> input, TransformingIterator.KVBuffer output) throws IOException
Transforms input. This method must not change the row part of the key, and must only change the parts of the key after the return value of getKeyPrefix(). Implementors must also remember to copy the delete flag from originalKey onto the new key. Or, implementors should use one of the helper methods to produce the new key. See any of the replaceKeyParts methods.
Parameters:
input - An iterator over a group of keys with the same prefix. This iterator provides an efficient view, bounded by the prefix, of the underlying iterator and can not be seeked.
output - An output buffer that holds transformed key values. All key values added to the buffer must have the same prefix as the input keys.
Throws:
IOException
See Also:
replaceColumnFamily(Key, Text), replaceColumnQualifier(Key, Text), replaceColumnVisibility(Key, Text), replaceKeyParts(Key, Text, Text), replaceKeyParts(Key, Text, Text, Text)
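
The contract for transformRange (leave the prefix alone, carry the delete flag over) can be sketched without Accumulo on the classpath. Everything in the block below is a hypothetical stand-in: `ToyKey` models only the key fields this example needs, `replaceColumnQualifier` mirrors the idea of the helper of the same name rather than its implementation, and upper-casing the qualifier is an arbitrary example transformation for a ROW_COLFAM prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TransformRangeSketch {
    // Toy stand-in for a key (hypothetical; not org.apache.accumulo.core.data.Key).
    static final class ToyKey {
        final String row;
        final String colFam;
        final String colQual;
        final boolean deleted;

        ToyKey(String row, String colFam, String colQual, boolean deleted) {
            this.row = row;
            this.colFam = colFam;
            this.colQual = colQual;
            this.deleted = deleted;
        }
    }

    // Builds a new key whose qualifier is replaced; every other part,
    // including the delete flag, is copied from the original key.
    static ToyKey replaceColumnQualifier(ToyKey original, String newColQual) {
        return new ToyKey(original.row, original.colFam, newColQual, original.deleted);
    }

    // Transforms one group of keys that share a row and column family.
    // Only the qualifier changes, so the prefix is left untouched.
    static List<ToyKey> transformRange(List<ToyKey> input) {
        List<ToyKey> output = new ArrayList<>(input.size());
        for (ToyKey key : input) {
            output.add(replaceColumnQualifier(key, key.colQual.toUpperCase(Locale.ROOT)));
        }
        return output;
    }

    public static void main(String[] args) {
        List<ToyKey> group = Arrays.asList(
                new ToyKey("r1", "cf", "name", false),
                new ToyKey("r1", "cf", "age", true));
        for (ToyKey key : transformRange(group)) {
            System.out.println(key.row + " " + key.colFam + " " + key.colQual + " deleted=" + key.deleted);
        }
    }
}
```

Routing every new key through one helper is what keeps the delete flag from being dropped by accident, which is the reason the text above recommends the replaceKeyParts family of methods.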
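public static void setAuthorizations(IteratorSetting config, Authorizations auths)
Configure authorizations used for post transformation filtering.
public static void setMaxBufferSize(IteratorSetting config, long maxBufferSize)
Configure the maximum amount of memory that can be used for transformation.
Parameters:
maxBufferSize - size in bytes
Copyright © 2011–2018 The Apache Software Foundation. All rights reserved.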