public interface Sampler
A function that decides which key values are stored in a table's sample. As Accumulo compacts data and creates rfiles, it uses a Sampler to decide what to store in each rfile's sample section. The class name of the Sampler and the Sampler's configuration are stored in each rfile. A scan of a table's sample will only succeed if all rfiles were created with the same sampler and sampler configuration.
Since the decisions that a Sampler makes are persisted, the behavior of a Sampler for a given configuration should always be the same. One way to offer a new behavior is to offer new options, while still supporting the old behavior with the Sampler's existing options.
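As a rough illustration of adding behavior through a new option while leaving existing configurations unchanged, consider the sketch below. It is not Accumulo code: the option names `modulus` and `scope`, the class name, and the use of `String.hashCode()` are all assumptions made for the example. The point is only that a configuration written before `scope` existed still produces the same decisions.

```java
import java.util.Map;

final class VersionedOptionsSketch {
  // Hypothetical option names, used only for this sketch.
  static final String MODULUS = "modulus";
  static final String SCOPE = "scope"; // added later; absent from older configurations

  static boolean accept(String row, String family, Map<String, String> options) {
    int modulus = Integer.parseInt(options.get(MODULUS));
    // Older configurations never set SCOPE, so the default must reproduce
    // the original behavior, which hashed the row only.
    String scope = options.getOrDefault(SCOPE, "row");
    String hashed = scope.equals("row+family") ? row + '\0' + family : row;
    return Math.floorMod(hashed.hashCode(), modulus) == 0;
  }

  public static void main(String[] args) {
    Map<String, String> oldConfig = Map.of(MODULUS, "7");
    Map<String, String> newConfig = Map.of(MODULUS, "7", SCOPE, "row+family");
    System.out.println(accept("000989", "name", oldConfig));
    System.out.println(accept("000989", "name", newConfig));
  }
}
```

A configuration that omits the new option and one that sets it explicitly to the legacy value behave identically, so samples already persisted under the old configuration remain consistent.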
Ideally, a sampler that selects a Key k1 would also select updates for k1. For example, if a Sampler selects:
row='000989' family='name' qualifier='last' visibility='ADMIN' time=9 value='Doe'
it would be nice if it also selected:
row='000989' family='name' qualifier='last' visibility='ADMIN' time=20 value='Dough'
Using hash and modulo on the key fields is a good way to accomplish this, and AbstractHashSampler provides a good basis for implementation.
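The hash-and-modulo idea can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the implementation in AbstractHashSampler: `SketchKey` is a hypothetical stand-in for a key, CRC32 from the standard library is substituted as the hash function purely for convenience, and the modulus of 7 is arbitrary. The timestamp is deliberately left out of the hash, so every version of a cell receives the same decision.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class HashModuloSketch {

  /** Hypothetical stand-in for a key; not Accumulo's Key class. */
  static final class SketchKey {
    final String row, family, qualifier, visibility;
    final long time;

    SketchKey(String row, String family, String qualifier, String visibility, long time) {
      this.row = row;
      this.family = family;
      this.qualifier = qualifier;
      this.visibility = visibility;
      this.time = time;
    }
  }

  // Hash only the fields that identify the cell. The timestamp is excluded
  // so that an update to the same cell hashes to the same value.
  static long hash(SketchKey k) {
    CRC32 crc = new CRC32();
    for (String field : new String[] {k.row, k.family, k.qualifier, k.visibility}) {
      crc.update(field.getBytes(StandardCharsets.UTF_8));
      crc.update(0); // separator byte between fields
    }
    return crc.getValue();
  }

  // Select roughly one in every `modulus` distinct cells.
  static boolean accept(SketchKey k, int modulus) {
    return hash(k) % modulus == 0;
  }

  public static void main(String[] args) {
    SketchKey original = new SketchKey("000989", "name", "last", "ADMIN", 9);
    SketchKey update = new SketchKey("000989", "name", "last", "ADMIN", 20);
    // Both calls return the same value because the timestamp is not hashed.
    System.out.println(accept(original, 7));
    System.out.println(accept(update, 7));
  }
}
```

Because the decision depends only on the hashed fields and the modulus, it is also repeatable across runs, which matters since sampling decisions are persisted in rfiles.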
void init(SamplerConfiguration config)
An implementation of Sampler must have a no-arg constructor. After construction, this method is called once to initialize the sampler before it is used.
config - Configuration options for a sampler.
boolean accept(Key k)
k - A key that was written to an rfile.
Returns: true if the key (and its associated value) should be stored in the rfile's sample; false if it should not be included.
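The lifecycle described above (no-arg construction, a single init call, then repeated accept calls) might look like the following sketch. To keep it self-contained it does not use the Accumulo types: `SketchSampler` is a hypothetical stand-in that takes a plain option map instead of SamplerConfiguration and a row string instead of Key, and the `modulus` option name is an assumption. Creating the instance by class name is shown only to illustrate why a no-arg constructor is required; it is not a description of how Accumulo loads samplers internally.

```java
import java.util.Map;

final class SamplerLifecycleSketch {

  // Create a sampler from its class name, then initialize it exactly once.
  static SketchSampler create(String className, Map<String, String> options) {
    try {
      SketchSampler sampler =
          (SketchSampler) Class.forName(className).getDeclaredConstructor().newInstance();
      sampler.init(options);
      return sampler;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot create sampler " + className, e);
    }
  }

  public static void main(String[] args) {
    SketchSampler sampler =
        create(RowModulusSketchSampler.class.getName(), Map.of("modulus", "7"));
    System.out.println(sampler.accept("000989"));
  }
}

/** Hypothetical stand-in with the same two methods; not the Accumulo interface. */
interface SketchSampler {
  void init(Map<String, String> options);

  boolean accept(String row);
}

/** Keeps rows whose hash is divisible by the configured modulus. */
final class RowModulusSketchSampler implements SketchSampler {
  private int modulus; // set by init, not by the constructor

  RowModulusSketchSampler() {} // no-arg constructor so the class can be created by name

  @Override
  public void init(Map<String, String> options) {
    String value = options.get("modulus"); // hypothetical option name
    if (value == null) {
      throw new IllegalArgumentException("missing option: modulus");
    }
    modulus = Integer.parseInt(value);
    if (modulus <= 0) {
      throw new IllegalArgumentException("modulus must be positive");
    }
  }

  @Override
  public boolean accept(String row) {
    return Math.floorMod(row.hashCode(), modulus) == 0;
  }
}
```

Validating options in init rather than in accept means a bad configuration fails once, before any key is examined.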