- All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat<java.lang.Long,com.datastax.driver.core.Row>
public class CqlInputFormat
extends AbstractColumnFamilyInputFormat<java.lang.Long,com.datastax.driver.core.Row>
Hadoop InputFormat allowing map/reduce against Cassandra rows within one ColumnFamily.
At minimum, you need to set the keyspace (KS) and column family (CF) in your Hadoop job Configuration.
The ConfigHelper class is provided to make this simple:
ConfigHelper.setInputColumnFamily
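A minimal job setup might look like the following sketch. The keyspace and table names ("mykeyspace", "mytable") are placeholders, and the surrounding class/main scaffolding is illustrative, not part of the API:

```java
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CqlJobSetup {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cql-input-example");
        // Use CqlInputFormat as the job's input format
        job.setInputFormatClass(CqlInputFormat.class);
        Configuration conf = job.getConfiguration();

        // Required: the keyspace and column family to read from
        // ("mykeyspace" and "mytable" are placeholder names)
        ConfigHelper.setInputColumnFamily(conf, "mykeyspace", "mytable");
    }
}
```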
You can also configure the number of rows per InputSplit with
ConfigHelper.setInputSplitSize. The default split size is 64k rows.
You can also configure the number of CQL rows per page with
CqlConfigHelper.setInputCQLPageRowSize. The default page row size is 1000. You
should set it to "as big as possible, but no bigger." It sets the LIMIT for the CQL
query, so it should be large enough to minimize network overhead, but not so large
that it causes an out-of-memory error.
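The two tuning knobs above can be set as in this fragment, which assumes `conf` is the job's Configuration as obtained during job setup; the values shown are illustrative, not recommendations:

```java
// Assumes: Configuration conf = job.getConfiguration();
// (imports: org.apache.cassandra.hadoop.ConfigHelper,
//  org.apache.cassandra.hadoop.cql3.CqlConfigHelper)

// Rows per InputSplit; 64 * 1024 matches the documented default of 64k rows
ConfigHelper.setInputSplitSize(conf, 64 * 1024);

// CQL rows per page: sets the LIMIT on the paging query.
// 1000 matches the documented default; the value is passed as a String.
CqlConfigHelper.setInputCQLPageRowSize(conf, "1000");
```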
Other native protocol connection parameters can also be set via CqlConfigHelper.