- All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat<java.lang.Long,com.datastax.driver.core.Row>
public class CqlInputFormat
extends AbstractColumnFamilyInputFormat<java.lang.Long,com.datastax.driver.core.Row>
Hadoop InputFormat allowing map/reduce against Cassandra rows within one ColumnFamily.
At minimum, you need to set the keyspace and column family in your Hadoop job
Configuration. The ConfigHelper class is provided to make this simple:
ConfigHelper.setInputColumnFamily
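A minimal job setup using this InputFormat might look like the sketch below. The keyspace and table names are placeholders, and the initial-address call is an assumption about what a typical cluster setup needs; consult the ConfigHelper javadoc for your Cassandra version.

```java
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class CqlJobSetup {
    public static JobConf configure() {
        JobConf conf = new JobConf();
        // Required: keyspace and column family (table) to read from.
        // "my_keyspace" and "my_table" are illustrative names.
        ConfigHelper.setInputColumnFamily(conf, "my_keyspace", "my_table");
        // Point Hadoop at a Cassandra contact node (assumed local here).
        ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");
        conf.setInputFormat(CqlInputFormat.class);
        return conf;
    }
}
```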
You can also configure the number of rows per InputSplit with either:
1: ConfigHelper.setInputSplitSize. The default split size is 64k rows.
or
2: ConfigHelper.setInputSplitSizeInMb. This sets the InputSplit size in MB and is the newer, more precise method.
If no value is provided for InputSplitSizeInMb, InputSplitSize will be used.
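The two split-size options above can be sketched as follows. The specific values are illustrative, not recommendations; in practice you would set only one of the two.

```java
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;

public class SplitSizeConfig {
    public static void configure(Configuration conf) {
        // Option 1: split size as a row count (the default is 64k rows).
        ConfigHelper.setInputSplitSize(conf, 64 * 1024);
        // Option 2: split size in megabytes; when set, it takes
        // precedence over the row-count setting.
        ConfigHelper.setInputSplitSizeInMb(conf, 64);
    }
}
```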
CqlConfigHelper.setInputCQLPageRowSize. The default page row size is 1000. You
should set it "as big as possible, but no bigger." It sets the LIMIT for the CQL
query, so it needs to be large enough to minimize network overhead, but not so
large that it causes out-of-memory issues.
Other native protocol connection parameters can also be set via CqlConfigHelper.
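A hedged sketch of the CqlConfigHelper settings described above; note that these setters take their values as strings, and the port call is one example of the native protocol parameters mentioned (the values shown are illustrative):

```java
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.hadoop.conf.Configuration;

public class CqlPageConfig {
    public static void configure(Configuration conf) {
        // Sets the LIMIT on each CQL page query (default is 1000 rows).
        // Larger pages mean fewer round trips but more memory per page.
        CqlConfigHelper.setInputCQLPageRowSize(conf, "5000");
        // Example of another native protocol parameter: the CQL port
        // (9042 is the conventional default).
        CqlConfigHelper.setInputNativePort(conf, "9042");
    }
}
```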