org.apache.cassandra.hadoop
Class ColumnFamilyInputFormat
java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
    org.apache.cassandra.hadoop.ColumnFamilyInputFormat
- All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
public class ColumnFamilyInputFormat
- extends org.apache.hadoop.mapreduce.InputFormat<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
- implements org.apache.hadoop.mapred.InputFormat<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
Hadoop InputFormat allowing map/reduce against Cassandra rows within one ColumnFamily.
At minimum, you need to set the ColumnFamily and a SlicePredicate (the description of the
columns to extract from each row) in your Hadoop job Configuration. The ConfigHelper class
is provided to make this simple:
ConfigHelper.setColumnFamily
ConfigHelper.setSlicePredicate
You can also configure the number of rows per InputSplit with
ConfigHelper.setInputSplitSize
This should be "as big as possible, but no bigger." Each InputSplit is read from Cassandra
with multiple get_range_slices queries, and the per-call overhead of get_range_slices is high,
so larger split sizes are better -- but if a split is too large, the task will run out of memory.
The default split size is 64k rows.
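For orientation, here is a minimal sketch of wiring these settings into a Hadoop job with
the mapreduce API. The keyspace ("Keyspace1"), column family ("Standard1"), and column
name ("name") are placeholders, the exact ConfigHelper signatures vary between releases,
and most releases also require connection settings (RPC address, port, partitioner) not
shown here; check the ConfigHelper javadoc for your version:

    import java.util.Arrays;

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.hadoop.mapreduce.Job;

    public class CassandraJobSetup
    {
        public static void main(String[] args) throws Exception
        {
            Job job = new Job();
            job.setJobName("cassandra-scan");
            job.setInputFormatClass(ColumnFamilyInputFormat.class);

            // Required: the column family to read and a predicate naming the
            // columns to pull from each row.
            ConfigHelper.setColumnFamily(job.getConfiguration(), "Keyspace1", "Standard1");
            SlicePredicate predicate = new SlicePredicate()
                .setColumn_names(Arrays.asList(ByteBufferUtil.bytes("name")));
            ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

            // Optional: rows per InputSplit; the default is 64k rows.
            ConfigHelper.setInputSplitSize(job.getConfiguration(), 64 * 1024);

            // ... set mapper, reducer, and output format here ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }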
Method Summary

org.apache.hadoop.mapreduce.RecordReader<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
    createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit,
                       org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)

org.apache.hadoop.mapred.RecordReader<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
    getRecordReader(org.apache.hadoop.mapred.InputSplit split,
                    org.apache.hadoop.mapred.JobConf jobConf,
                    org.apache.hadoop.mapred.Reporter reporter)

org.apache.hadoop.mapred.InputSplit[]
    getSplits(org.apache.hadoop.mapred.JobConf jobConf,
              int numSplits)

java.util.List<org.apache.hadoop.mapreduce.InputSplit>
    getSplits(org.apache.hadoop.mapreduce.JobContext context)

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
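In both API flavors the record reader hands the mapper each row key as a java.nio.ByteBuffer
and the columns matched by the predicate as a SortedMap keyed by column name. A minimal
mapper over those types might look like the following sketch (it assumes IColumn.value()
exposes the column value as a ByteBuffer, and uses ByteBufferUtil to decode it):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.SortedMap;

    import org.apache.cassandra.db.IColumn;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch of a mapper consuming ColumnFamilyInputFormat's key/value types:
    // the row key arrives as a ByteBuffer and the predicate's columns as a
    // SortedMap keyed by column name.
    public class ColumnCountMapper
            extends Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, IntWritable>
    {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
                throws IOException, InterruptedException
        {
            for (IColumn column : columns.values())
            {
                // IColumn.value() is assumed to return the column value as a
                // ByteBuffer; decode it as a UTF-8 string and emit a count.
                String value = ByteBufferUtil.string(column.value());
                context.write(new Text(value), ONE);
            }
        }
    }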
Field Detail

MAPRED_TASK_ID
public static final java.lang.String MAPRED_TASK_ID
CASSANDRA_HADOOP_MAX_KEY_SIZE
public static final java.lang.String CASSANDRA_HADOOP_MAX_KEY_SIZE
CASSANDRA_HADOOP_MAX_KEY_SIZE_DEFAULT
public static final int CASSANDRA_HADOOP_MAX_KEY_SIZE_DEFAULT
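Judging by their names alone, the two constants above are a Configuration key bounding the
row-key size and its default value; under that assumed reading, a job could raise the cap
like this (the 16 KB figure is arbitrary):

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.hadoop.conf.Configuration;

    public class KeySizeConfig
    {
        // Assumption: CASSANDRA_HADOOP_MAX_KEY_SIZE names the Configuration key
        // bounding row-key size, and CASSANDRA_HADOOP_MAX_KEY_SIZE_DEFAULT is
        // used when the key is unset. The 16 KB value is arbitrary.
        public static void raiseKeySizeCap(Configuration conf)
        {
            conf.setInt(ColumnFamilyInputFormat.CASSANDRA_HADOOP_MAX_KEY_SIZE, 16 * 1024);
        }
    }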
Constructor Detail

ColumnFamilyInputFormat
public ColumnFamilyInputFormat()
Method Detail

getSplits
public java.util.List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
throws java.io.IOException
- Specified by:
getSplits
in class org.apache.hadoop.mapreduce.InputFormat<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
- Throws:
java.io.IOException
createRecordReader
public org.apache.hadoop.mapreduce.RecordReader<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit,
org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
throws java.io.IOException,
java.lang.InterruptedException
- Specified by:
createRecordReader
in class org.apache.hadoop.mapreduce.InputFormat<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
- Throws:
java.io.IOException
java.lang.InterruptedException
getSplits
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf jobConf,
int numSplits)
throws java.io.IOException
- Specified by:
getSplits
in interface org.apache.hadoop.mapred.InputFormat<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
- Throws:
java.io.IOException
getRecordReader
public org.apache.hadoop.mapred.RecordReader<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>> getRecordReader(org.apache.hadoop.mapred.InputSplit split,
org.apache.hadoop.mapred.JobConf jobConf,
org.apache.hadoop.mapred.Reporter reporter)
throws java.io.IOException
- Specified by:
getRecordReader
in interface org.apache.hadoop.mapred.InputFormat<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
- Throws:
java.io.IOException