Interface Sampler

  • All Known Implementing Classes:
    AbstractHashSampler, RowColumnSampler, RowSampler

    public interface Sampler
    A function that decides which key values are stored in a table's sample. As Accumulo compacts data and creates rfiles, it uses a Sampler to decide what to store in each rfile's sample section. The class name of the Sampler and the Sampler's configuration are stored in each rfile. A scan of a table's sample will only succeed if all rfiles were created with the same sampler and sampler configuration.

    Since the decisions that a Sampler makes are persisted, the behavior of a Sampler for a given configuration should always be the same. One way to offer new behavior is to add new options while still supporting the old behavior under the Sampler's existing options.

    Ideally, a sampler that selects a Key k1 would also select updates for k1. For example, if a Sampler selects row='000989' family='name' qualifier='last' visibility='ADMIN' time=9 value='Doe', it would be nice if it also selected row='000989' family='name' qualifier='last' visibility='ADMIN' time=20 value='Dough'. Using hash and modulo on the key fields is a good way to accomplish this, and AbstractHashSampler provides a good basis for implementation.
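
    The hash-and-modulo idea can be sketched in plain Java. This is a minimal illustration, not Accumulo code: SimpleKey is a hypothetical stand-in for Key, the choice of CRC32 as the hash and the modulus of 3 are assumptions made for the example, and it does not show how AbstractHashSampler is implemented. The point it demonstrates is that leaving the timestamp out of the hash gives an entry and its later updates the same decision.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class HashModuloSketch {

    /** Hypothetical stand-in for a key; this is NOT the Accumulo Key class. */
    static final class SimpleKey {
        final String row;
        final String family;
        final String qualifier;
        final String visibility;
        final long timestamp;

        SimpleKey(String row, String family, String qualifier, String visibility, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.timestamp = timestamp;
        }
    }

    /** Selects a key when the hash of its identifying fields is divisible by modulus. */
    static boolean accept(SimpleKey k, int modulus) {
        CRC32 crc = new CRC32();
        // Hash only the fields that identify the entry. The timestamp is
        // deliberately left out so that updates hash to the same value.
        for (String field : new String[] {k.row, k.family, k.qualifier, k.visibility}) {
            crc.update(field.getBytes(StandardCharsets.UTF_8));
            crc.update(0); // separator, so ("ab", "c") and ("a", "bc") differ
        }
        return crc.getValue() % modulus == 0;
    }

    public static void main(String[] args) {
        SimpleKey original = new SimpleKey("000989", "name", "last", "ADMIN", 9);
        SimpleKey update = new SimpleKey("000989", "name", "last", "ADMIN", 20);
        // Same identifying fields, different timestamps: same decision.
        System.out.println(accept(original, 3) == accept(update, 3)); // prints "true"
    }
}
```

    With a modulus of n, roughly one in n distinct keys would be selected, assuming the hash spreads keys evenly.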

    Since:
    1.8.0
    • Method Detail

      • init

        void init(SamplerConfiguration config)
        An implementation of Sampler must have a no-arg constructor. After construction, this method is called once to initialize the sampler before it is used.
        Parameters:
        config - Configuration options for a sampler.
      • accept

        boolean accept(Key k)
        Parameters:
        k - A key that was written to an rfile.
        Returns:
        True if the key (and its associated value) should be stored in the rfile's sample; false if it should not be included.
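
    The lifecycle described above (no-arg construction, a single init call, then repeated accept calls) can be sketched as follows. This is a self-contained illustration under stated assumptions, not the Accumulo API: SimpleSampler is a hypothetical stand-in for this interface that takes a plain option map and a row string instead of SamplerConfiguration and Key, the "modulus" option name is invented for the example, and creating the instance reflectively is only a guess at why a no-arg constructor is required.

```java
import java.util.HashMap;
import java.util.Map;

public class SamplerLifecycleSketch {

    /** Hypothetical stand-in for the interface, with simplified parameter types. */
    interface SimpleSampler {
        void init(Map<String, String> options);

        boolean accept(String row);
    }

    /** Keeps rows whose hash is divisible by a configured modulus. */
    static class RowModuloSampler implements SimpleSampler {
        private int modulus;

        /** No-arg constructor; all configuration arrives through init. */
        public RowModuloSampler() {}

        @Override
        public void init(Map<String, String> options) {
            // "modulus" is an option name made up for this sketch.
            String value = options.get("modulus");
            if (value == null) {
                throw new IllegalArgumentException("missing required option: modulus");
            }
            modulus = Integer.parseInt(value);
            if (modulus <= 0) {
                throw new IllegalArgumentException("modulus must be positive");
            }
        }

        @Override
        public boolean accept(String row) {
            // String.hashCode is specified by the Java API, so the decision
            // is the same on every run for the same configuration.
            return Math.floorMod(row.hashCode(), modulus) == 0;
        }
    }

    /** Builds a sampler from its class and options: construct, then init once. */
    static SimpleSampler create(Class<? extends SimpleSampler> type, Map<String, String> options)
            throws ReflectiveOperationException {
        SimpleSampler sampler = type.getDeclaredConstructor().newInstance();
        sampler.init(options);
        return sampler;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> options = new HashMap<>();
        options.put("modulus", "7");
        SimpleSampler first = create(RowModuloSampler.class, options);
        SimpleSampler second = create(RowModuloSampler.class, options);
        // Two samplers built from the same class and options agree on every key.
        System.out.println(first.accept("000989") == second.accept("000989")); // prints "true"
    }
}
```

    Validating options in init rather than in accept means a bad configuration fails once, up front, instead of on every key.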