All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat<java.nio.ByteBuffer,java.util.SortedMap<CellName,Cell>>
public class ColumnFamilyInputFormat
extends AbstractColumnFamilyInputFormat<java.nio.ByteBuffer,java.util.SortedMap<CellName,Cell>>
Hadoop InputFormat allowing map/reduce against Cassandra rows within one ColumnFamily.
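The key/value type parameters above are exactly what each map() call receives: the row key as a ByteBuffer, and that row's columns as a SortedMap<CellName,Cell>. Below is a minimal mapper sketch, assuming Cassandra 2.1-era classes (org.apache.cassandra.db.Cell, org.apache.cassandra.db.composites.CellName) and the mapreduce-style Hadoop API; the class name and output types are illustrative only:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.SortedMap;

import org.apache.cassandra.db.Cell;
import org.apache.cassandra.db.composites.CellName;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper counting rows; the generics mirror the InputFormat's
// key/value types: the row key comes in as the key, the row's columns as the value.
public class RowCountMapper extends Mapper<ByteBuffer, SortedMap<CellName, Cell>, Text, LongWritable>
{
    private static final Text KEY = new Text("rows");
    private static final LongWritable ONE = new LongWritable(1);

    @Override
    protected void map(ByteBuffer rowKey, SortedMap<CellName, Cell> columns, Context context)
            throws IOException, InterruptedException
    {
        // Each call to map() sees exactly one Cassandra row; a real job would
        // inspect columns.values() for the cells selected by the slice predicate.
        context.write(KEY, ONE);
    }
}
```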
At minimum, you need to set the column family and a predicate (a description of the columns to extract from each row) in your Hadoop job Configuration. The ConfigHelper class is provided to make this simple:
- ConfigHelper.setInputColumnFamily
- ConfigHelper.setInputSlicePredicate
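As a sketch of that minimal setup, assuming the Thrift-era org.apache.cassandra.hadoop and org.apache.cassandra.thrift classes and a Hadoop 2.x-style Job; the keyspace/column family names, contact address, and partitioner are placeholders for your cluster:

```java
import java.nio.ByteBuffer;

import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CassandraInputSetup
{
    public static Job buildJob(Configuration conf) throws Exception
    {
        // Required: which column family to read. Names are placeholders.
        ConfigHelper.setInputColumnFamily(conf, "MyKeyspace", "MyColumnFamily");

        // Required: which columns to extract from each row. This predicate
        // selects every column; narrow the range (or name columns explicitly)
        // to fetch less data per row.
        SliceRange allColumns = new SliceRange(
                ByteBuffer.wrap(new byte[0]),  // start of range (unbounded)
                ByteBuffer.wrap(new byte[0]),  // end of range (unbounded)
                false,                         // not reversed
                Integer.MAX_VALUE);            // max columns per row
        ConfigHelper.setInputSlicePredicate(conf,
                new SlicePredicate().setSlice_range(allColumns));

        // Also needed at runtime so input splits can be computed: a contact
        // point and the cluster's partitioner (values here are placeholders).
        ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");
        ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");

        Job job = Job.getInstance(conf, "cassandra-input-example");
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        return job;
    }
}
```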
You can also configure the number of rows per InputSplit with ConfigHelper.setInputSplitSize.
This should be "as big as possible, but no bigger." Each InputSplit is read from Cassandra with multiple get_range_slices calls, and the per-call overhead of get_range_slices is high, so larger split sizes are better -- but if the split is too large, the map task will run out of memory.
The default split size is 64k rows.
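As a concrete illustration of that trade-off, the sketch below raises the limit to 256k rows per split; the figure is arbitrary, chosen only to show the call, and conf is the same Hadoop Configuration used in the setup above:

```java
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;

public class SplitSizeTuning
{
    public static void tune(Configuration conf)
    {
        // 256k rows per split (vs. the 64k default): fewer, larger splits
        // amortize the per-call query overhead across more rows, but each
        // map task must have enough memory to buffer its batches of rows.
        ConfigHelper.setInputSplitSize(conf, 256 * 1024);
    }
}
```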