Class VisitorIterator

java.lang.Object
com.yahoo.documentapi.VisitorIterator

public class VisitorIterator extends Object

Enables transparent iteration of super/sub-buckets.

Thread safety: each thread may safely hold its own iterator (iterators share no state), as long as it also holds the ProgressToken object associated with that iterator. No two VisitorIterator instances may share the same progress token instance at the same time. Concurrent access to a single VisitorIterator instance is not safe and must be handled atomically by the caller.
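One way to satisfy the "handled atomically by the caller" requirement is to make the check-then-fetch step a single synchronized operation. The sketch below is illustrative only: it wraps a plain java.util.Iterator as a stand-in, and the names AtomicNextSketch and tryNext are hypothetical and not part of this API.

```java
import java.util.Iterator;
import java.util.Optional;

// Hypothetical wrapper: java.util.Iterator stands in for the real iterator.
class AtomicNextSketch<T> {
    private final Iterator<T> delegate;

    AtomicNextSketch(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    // hasNext() and next() run under one lock, so no other thread can
    // consume the element between the check and the fetch.
    synchronized Optional<T> tryNext() {
        return delegate.hasNext() ? Optional.of(delegate.next()) : Optional.empty();
    }
}
```

With the real class, the same lock would also have to cover update() and any other call that touches the shared iterator or its progress token.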

Author:
vekterli
  • Method Details

    • getNext

      Returns:
      The pair [superbucket, progress] that specifies the next iterable bucket. When a superbucket is first returned, the pair is [superbucket, 0], as there has been no progress into its sub-buckets yet (if they exist). Precondition: hasNext() == true.
    • hasNext

      public boolean hasNext()

      Check whether or not it is valid to call getNext() with the current iterator state.

      In one case, hasNext may return false before update(com.yahoo.document.BucketId, com.yahoo.document.BucketId) is called, but true afterwards. This happens when both the set of pending buckets and the bucket source are empty, but the set of active buckets is not. A future progress update on any of the buckets in the active set may or may not make that bucket available to the pending set again. The caller must handle this explicitly by checking isDone() and ensuring that update(com.yahoo.document.BucketId, com.yahoo.document.BucketId) is called before retrying hasNext.

      This method will also return false if the number of distribution bits has changed and there are active buckets that need to be flushed before the iterator will allow new buckets to be handed out.

      Returns:
      Whether or not it is valid to call getNext() with the current iterator state.
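The interplay between hasNext(), isDone() and update() described above can be sketched with a toy model. Everything in this block is an assumption made for illustration: ToyBucketIterator is not the real VisitorIterator, buckets are plain integers, and a boolean finished flag replaces the progress argument (false models a failed bucket being returned to the pending set).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy stand-in with pending and active sets; NOT the real VisitorIterator.
class ToyBucketIterator {
    private final Deque<Integer> pending = new ArrayDeque<>();
    private final Set<Integer> active = new HashSet<>();

    ToyBucketIterator(List<Integer> buckets) { pending.addAll(buckets); }

    boolean hasNext() { return !pending.isEmpty(); }

    boolean isDone() { return pending.isEmpty() && active.isEmpty(); }

    // Precondition: hasNext() == true
    int getNext() {
        int bucket = pending.removeFirst();
        active.add(bucket);
        return bucket;
    }

    // finished == false returns the bucket to the pending set.
    void update(int bucket, boolean finished) {
        if (!active.remove(bucket)) {
            throw new IllegalArgumentException("bucket is not active: " + bucket);
        }
        if (!finished) pending.addLast(bucket);
    }
}

class VisitLoopSketch {
    // Visits all buckets; each bucket in failOnce fails on its first attempt.
    // Returns the buckets in the order they were completed.
    static List<Integer> run(List<Integer> buckets, Set<Integer> failOnce) {
        ToyBucketIterator it = new ToyBucketIterator(buckets);
        Deque<Integer> inFlight = new ArrayDeque<>();
        Set<Integer> alreadyFailed = new HashSet<>();
        List<Integer> completed = new ArrayList<>();
        while (!it.isDone()) {
            // Hand out everything the iterator currently allows.
            while (it.hasNext()) inFlight.addLast(it.getNext());
            // hasNext() is false but isDone() is false: an update must
            // happen before hasNext() is retried.
            int bucket = inFlight.removeFirst();
            boolean fails = failOnce.contains(bucket) && alreadyFailed.add(bucket);
            it.update(bucket, !fails);
            if (!fails) completed.add(bucket);
        }
        return completed;
    }
}
```

The outer loop terminates on isDone() rather than on hasNext(), because hasNext() can return false while active buckets are still outstanding.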
    • isDone

      public boolean isDone()
      Check if the iterator is actually done.
      Returns:
      true iff the bucket source is empty and there are no pending or active buckets in the progress token.
    • update

      public void update(com.yahoo.document.BucketId superbucket, com.yahoo.document.BucketId progress)

      Tell the iterator that we've finished processing up to and including progress. progress may be a sub-bucket, the invalid 0-bucket (in case the caller fails to process the bucket and must return it to the set of pending buckets), or the special case BucketId(Integer.MAX_VALUE), which indicates to the iterator that traversal is complete for superbucket's tree. The null bucket should only be used if no non-null updates have yet been given for the superbucket.

      Each superbucket returned by getNext() must eventually result in one or more update operations, where the last update operation has the special progress==super case.

      If the document selection used to create the iterator is unknown and there were active buckets at the time of a distribution bit state change, such a bucket passed to update() will be in an inconsistent state with regard to the number of bits it uses. For unfinished buckets, this is handled by splitting or merging the bucket until it is consistent, depending on whether it had a lower or higher distribution bit count than the current system state. For finished buckets with a lower distribution bit count, the number of finished buckets in the ProgressToken is adjusted upwards to compensate for the fact that a bucket using fewer distribution bits covers more of the bucket space than the buckets currently in use. For finished buckets with a higher distribution bit count, the number of finished buckets is not increased at that point in time, since such a bucket does not cover an entire bucket under the current state.

      All this is done automatically and transparently to the caller once all active buckets have been updated.
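The compensation for finished buckets can be illustrated with a small piece of arithmetic. This is a sketch under the assumption that each distribution bit doubles the number of buckets, so a bucket using k fewer bits covers 2^k buckets at the current bit count. The class and method names are hypothetical, and the actual bookkeeping inside ProgressToken may differ.

```java
// Hypothetical helper; models the adjustment described above, not ProgressToken itself.
class FinishedBucketCountSketch {
    // Number of current-state buckets credited when a finished bucket
    // using bucketBits distribution bits is reported.
    static long finishedIncrement(int bucketBits, int currentBits) {
        if (bucketBits > currentBits) {
            // Covers only part of a current bucket: no credit at this point.
            return 0L;
        }
        // Each missing bit doubles the covered share of the bucket space.
        return 1L << (currentBits - bucketBits);
    }
}
```

For example, under this assumption a finished 14-bit bucket reported while the system uses 16 bits counts as 4 finished buckets, while an 18-bit bucket adds nothing yet.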

      Parameters:
      superbucket - A valid bucket ID that has been retrieved earlier through getNext()
      progress - A bucket logically contained within super. Subsequent updates for the same superbucket must pass progress values in increasing order, where order is defined as the in-order traversal of the bucket split tree. May also be the null bucket if the superbucket has not seen any "proper" progress updates yet, or the special case Integer.MAX_VALUE. Note that inconsistent splitting might actually see progress as containing super rather than vice versa, so the code explicitly allows this.
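The ordering rules for progress values on a single superbucket can be sketched as a small validator. This is a simplified model, not the real implementation: plain long values stand in for positions in the in-order traversal of the bucket split tree, 0 stands in for the null bucket, and Integer.MAX_VALUE is the finished marker. The class name and the accept method are hypothetical.

```java
// Hypothetical validator for the update sequence of one superbucket.
class UpdateProtocolSketch {
    static final long NULL_PROGRESS = 0L;            // stand-in for the null bucket
    static final long FINISHED = Integer.MAX_VALUE;  // stand-in for the finished marker

    private long last = NULL_PROGRESS;
    private boolean finished = false;

    // Returns true if this progress value is a legal next update.
    boolean accept(long progress) {
        if (finished) return false;                  // nothing may follow the final update
        if (progress == FINISHED) {
            finished = true;
            return true;
        }
        if (progress == NULL_PROGRESS) {
            // Null progress is only legal before any non-null update.
            return last == NULL_PROGRESS;
        }
        if (progress <= last) return false;          // must be strictly increasing
        last = progress;
        return true;
    }

    boolean isFinished() { return finished; }
}
```

A legal sequence in this model is 0, 5, 7, Integer.MAX_VALUE. The sequence 5, 3 is rejected at 3, and 5, 0 is rejected at 0.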
    • getRemainingBucketCount

      public long getRemainingBucketCount()
      Returns:
      The total number of iterable buckets that remain to be processed. Note: currently includes all non-finished buckets (i.e. active and pending) as well.
    • getBucketSource

      protected VisitorIterator.BucketSource getBucketSource()
      Returns:
      Internal bucket source instance. Do NOT modify!
    • getProgressToken

      public ProgressToken getProgressToken()
    • getDistributionBitCount

      public int getDistributionBitCount()
    • setDistributionBitCount

      public void setDistributionBitCount(int distBits)

      Set the distribution bit count for the iterator and the buckets it currently maintains and will return in the future.

      For document selections that result in an explicit set of buckets, this is essentially a no-op, so in such a case, disregard the rest of this text.

      Changing the number of distribution bits for an unknown document selection will effectively scale the bucket space that will be visited; each bit increase or decrease doubling or halving its size, respectively. When increasing, any pending buckets will be split to ensure the total bucket space covered remains the same. Correspondingly, when decreasing, any pending buckets will be merged appropriately.

      If there are buckets active at the time of the change, the actual bucket splitting/merging operations are kept on hold until all active buckets have been updated, at which point they will be automatically performed. The iterator will force such an update by not giving out any new or pending buckets until that happens.

      Note: when decreasing the number of distribution bits, there is a chance of losing superbucket progress in a bucket that is merged with another bucket, leading to potential duplicate results.

      Parameters:
      distBits - New system state distribution bit count
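The splitting and merging of pending buckets can be sketched with a toy encoding. The assumption here is that a bucket is identified by the low-order bits of a key, so that splitting adds one more significant bit and merging masks it off. The class DistributionBitSketch and its rescale method are hypothetical and do not reflect how the iterator stores buckets internally.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model: a bucket at n distribution bits is the n low-order bits of a key.
class DistributionBitSketch {
    // Rescales a set of pending bucket keys from oldBits to newBits.
    static Set<Long> rescale(Set<Long> pending, int oldBits, int newBits) {
        Set<Long> result = new TreeSet<>(pending);
        // Increasing: split every bucket in two, once per added bit.
        for (int bits = oldBits; bits < newBits; bits++) {
            Set<Long> split = new TreeSet<>();
            for (long key : result) {
                split.add(key);                  // child with the new bit clear
                split.add(key | (1L << bits));   // child with the new bit set
            }
            result = split;
        }
        // Decreasing: drop the top bit; sibling buckets collapse into one.
        for (int bits = oldBits; bits > newBits; bits--) {
            long mask = (1L << (bits - 1)) - 1;
            Set<Long> merged = new TreeSet<>();
            for (long key : result) merged.add(key & mask);
            result = merged;
        }
        return result;
    }
}
```

In this model, going from 1 to 2 bits turns the pending set {0, 1} into {0, 1, 2, 3}, which covers the same bucket space, and going back from 2 bits to 1 merges it into {0, 1} again.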
    • visitsAllBuckets

      public boolean visitsAllBuckets()
    • createFromDocumentSelection

      public static VisitorIterator createFromDocumentSelection(String documentSelection, com.yahoo.document.BucketIdFactory idFactory, int distributionBitCount, ProgressToken progress) throws com.yahoo.document.select.parser.ParseException
      Throws:
      com.yahoo.document.select.parser.ParseException
    • createFromDocumentSelection

      public static VisitorIterator createFromDocumentSelection(String documentSelection, com.yahoo.document.BucketIdFactory idFactory, int distributionBitCount, ProgressToken progress, int slices, int sliceId) throws com.yahoo.document.select.parser.ParseException
      Create a new VisitorIterator instance based on the given document selection string.
      Parameters:
      documentSelection - Document selection string used to create the VisitorIterator instance. Depending on the characteristics of the selection, the iterator may iterate over only a small subset of the buckets or every bucket in the system. Both cases will be handled efficiently.
      idFactory - BucketId factory specifying, among other things, the number of distribution bits to use.
      progress - A unique ProgressToken instance which is used for maintaining the state of the iterator. Cannot be shared with other iterator instances at the same time. If progress contains work done in an earlier iteration run, the iterator will pick up from where it left off.
      Returns:
      A new VisitorIterator instance
      Throws:
      com.yahoo.document.select.parser.ParseException - if documentSelection fails to properly parse
    • createFromExplicitBucketSet

      public static VisitorIterator createFromExplicitBucketSet(Set<com.yahoo.document.BucketId> bucketsToVisit, int distributionBitCount, ProgressToken progress)
      Create a new VisitorIterator instance based on the given set of buckets. This is supported for internal use only, and is required by Synchronization. Use createFromDocumentSelection(java.lang.String, com.yahoo.document.BucketIdFactory, int, com.yahoo.documentapi.ProgressToken) instead for all normal purposes.
      Parameters:
      bucketsToVisit - The set of buckets that will be visited
      distributionBitCount - Number of distribution bits to use
      progress - A unique ProgressToken instance which is used for maintaining the state of the iterator. Cannot be shared with other iterator instances at the same time. If progress contains work done in an earlier iteration run, the iterator will pick up from where it left off.
      Returns:
      A new VisitorIterator instance