java.lang.Object
  org.apache.hadoop.mapreduce.OutputFormat<java.nio.ByteBuffer,java.util.List<org.apache.cassandra.thrift.Mutation>>
      org.apache.cassandra.hadoop.ColumnFamilyOutputFormat

public class ColumnFamilyOutputFormat
extends org.apache.hadoop.mapreduce.OutputFormat<java.nio.ByteBuffer,java.util.List<org.apache.cassandra.thrift.Mutation>>
implements org.apache.hadoop.mapred.OutputFormat<java.nio.ByteBuffer,java.util.List<org.apache.cassandra.thrift.Mutation>>
The ColumnFamilyOutputFormat acts as a Hadoop-specific OutputFormat that allows reduce tasks to store keys (and corresponding values) as Cassandra rows (and respective columns) in a given ColumnFamily.

As is the case with the ColumnFamilyInputFormat, you need to set the Keyspace and ColumnFamily in your Hadoop job Configuration. The ConfigHelper class, through its ConfigHelper.setOutputColumnFamily(org.apache.hadoop.conf.Configuration, java.lang.String, java.lang.String) method, is provided to make this simple.

For the sake of performance, this class employs a lazy write-back caching mechanism: its record writer collects the mutations created from the reduce's inputs in a task-specific map, and periodically flushes the accumulated changes to Cassandra with a batch mutate request.
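A minimal job-setup sketch of the configuration described above. The keyspace and column family names, the job name, and the overall job wiring are placeholders; depending on your Cassandra and Hadoop versions, additional ConfigHelper settings (for example the initial contact address and partitioner) may also be required.

```java
import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CassandraOutputJobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Point the output at a keyspace and column family using the setter
        // documented on this page. "MyKeyspace" / "MyColumnFamily" are placeholders.
        ConfigHelper.setOutputColumnFamily(conf, "MyKeyspace", "MyColumnFamily");

        Job job = new Job(conf, "write-to-cassandra");
        job.setOutputFormatClass(ColumnFamilyOutputFormat.class);

        // Output types must match the format's generics:
        // key = row key, value = list of mutations for that row.
        job.setOutputKeyClass(ByteBuffer.class);
        job.setOutputValueClass(List.class);

        // ... configure input format, mapper and reducer here ...

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```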
Nested Class Summary

| Modifier and Type | Class and Description |
|---|---|
| static class | ColumnFamilyOutputFormat.NullOutputCommitter: An OutputCommitter that does nothing. |
Field Summary

| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | BATCH_THRESHOLD |
| static java.lang.String | QUEUE_SIZE |
Constructor Summary

| Constructor and Description |
|---|
| ColumnFamilyOutputFormat() |
Method Summary

| Modifier and Type | Method and Description |
|---|---|
| void | checkOutputSpecs(org.apache.hadoop.fs.FileSystem filesystem, org.apache.hadoop.mapred.JobConf job) Deprecated. |
| void | checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext context) Check for validity of the output-specification for the job. |
| static org.apache.cassandra.thrift.Cassandra.Client | createAuthenticatedClient(org.apache.thrift.transport.TSocket socket, org.apache.hadoop.conf.Configuration conf) Return a client based on the given socket that points to the configured keyspace, and is logged in with the configured credentials. |
| org.apache.hadoop.mapreduce.OutputCommitter | getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext context) The OutputCommitter for this format does not write any data to the DFS. |
| org.apache.cassandra.hadoop.ColumnFamilyRecordWriter | getRecordWriter(org.apache.hadoop.fs.FileSystem filesystem, org.apache.hadoop.mapred.JobConf job, java.lang.String name, org.apache.hadoop.util.Progressable progress) Deprecated. |
| org.apache.cassandra.hadoop.ColumnFamilyRecordWriter | getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context) Get the RecordWriter for the given task. |
Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail

public static final java.lang.String BATCH_THRESHOLD

public static final java.lang.String QUEUE_SIZE
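BATCH_THRESHOLD and QUEUE_SIZE are the Configuration keys that tune the record writer's write-back batching described in the class overview. A hedged tuning sketch follows; the numeric values are arbitrary, and the exact semantics of each key are assumptions rather than something documented on this page.

```java
import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
import org.apache.hadoop.conf.Configuration;

public final class OutputTuning {
    // Hypothetical helper: applies batching knobs to a job Configuration.
    public static void applyBatchTuning(Configuration conf) {
        // Assumed semantics: mutations accumulated before a batch mutate is sent.
        conf.set(ColumnFamilyOutputFormat.BATCH_THRESHOLD, "1024");
        // Assumed semantics: bound on the queue of pending work in the record writer.
        conf.set(ColumnFamilyOutputFormat.QUEUE_SIZE, "256");
    }

    private OutputTuning() {}
}
```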
Constructor Detail

public ColumnFamilyOutputFormat()
Method Detail
public void checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext context)

Check for validity of the output-specification for the job.

Specified by:
    checkOutputSpecs in class org.apache.hadoop.mapreduce.OutputFormat<java.nio.ByteBuffer,java.util.List<org.apache.cassandra.thrift.Mutation>>
Parameters:
    context - information about the job
Throws:
    java.io.IOException - when output should not be attempted

public org.apache.hadoop.mapreduce.OutputCommitter getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                throws java.io.IOException,
                                                                       java.lang.InterruptedException

The OutputCommitter for this format does not write any data to the DFS.

Specified by:
    getOutputCommitter in class org.apache.hadoop.mapreduce.OutputFormat<java.nio.ByteBuffer,java.util.List<org.apache.cassandra.thrift.Mutation>>
Parameters:
    context - the task context
Throws:
    java.io.IOException
    java.lang.InterruptedException
@Deprecated
public void checkOutputSpecs(org.apache.hadoop.fs.FileSystem filesystem,
                             org.apache.hadoop.mapred.JobConf job)
                      throws java.io.IOException

Deprecated.

Specified by:
    checkOutputSpecs in interface org.apache.hadoop.mapred.OutputFormat<java.nio.ByteBuffer,java.util.List<org.apache.cassandra.thrift.Mutation>>
Throws:
    java.io.IOException
@Deprecated
public org.apache.cassandra.hadoop.ColumnFamilyRecordWriter getRecordWriter(org.apache.hadoop.fs.FileSystem filesystem,
                                                                            org.apache.hadoop.mapred.JobConf job,
                                                                            java.lang.String name,
                                                                            org.apache.hadoop.util.Progressable progress)
                                                                     throws java.io.IOException

Deprecated.

Specified by:
    getRecordWriter in interface org.apache.hadoop.mapred.OutputFormat<java.nio.ByteBuffer,java.util.List<org.apache.cassandra.thrift.Mutation>>
Throws:
    java.io.IOException
public org.apache.cassandra.hadoop.ColumnFamilyRecordWriter getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                      throws java.io.IOException,
                                                                             java.lang.InterruptedException

Get the RecordWriter for the given task.

Specified by:
    getRecordWriter in class org.apache.hadoop.mapreduce.OutputFormat<java.nio.ByteBuffer,java.util.List<org.apache.cassandra.thrift.Mutation>>
Parameters:
    context - the information about the current task
Returns:
    a RecordWriter to write the output for the job
Throws:
    java.io.IOException
    java.lang.InterruptedException
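A hedged reducer sketch showing the key/value types this RecordWriter consumes. The class name, column name, and charset choices are illustrative, and the Thrift bean setters assume the standard generated Mutation/ColumnOrSuperColumn/Column classes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.Mutation;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CassandraSinkReducer
        extends Reducer<Text, IntWritable, ByteBuffer, List<Mutation>> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }

        // Build one column mutation for this row key. The Thrift beans are
        // populated via their setters.
        Column column = new Column();
        column.setName(ByteBuffer.wrap("count".getBytes("UTF-8")));
        column.setValue(ByteBuffer.wrap(Integer.toString(sum).getBytes("UTF-8")));
        column.setTimestamp(System.currentTimeMillis() * 1000); // microseconds

        ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
        cosc.setColumn(column);
        Mutation mutation = new Mutation();
        mutation.setColumn_or_supercolumn(cosc);

        List<Mutation> mutations = new ArrayList<Mutation>();
        mutations.add(mutation);

        // The record writer batches these mutations and flushes them to
        // Cassandra periodically, as described in the class overview.
        context.write(ByteBuffer.wrap(key.toString().getBytes("UTF-8")), mutations);
    }
}
```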
public static org.apache.cassandra.thrift.Cassandra.Client createAuthenticatedClient(org.apache.thrift.transport.TSocket socket,
                                                                                     org.apache.hadoop.conf.Configuration conf)
                                                                               throws org.apache.cassandra.thrift.InvalidRequestException,
                                                                                      org.apache.thrift.TException,
                                                                                      org.apache.cassandra.thrift.AuthenticationException,
                                                                                      org.apache.cassandra.thrift.AuthorizationException

Return a client based on the given socket that points to the configured keyspace, and is logged in with the configured credentials.

Parameters:
    socket - a socket pointing to a particular node, seed or otherwise
    conf - a job configuration
Throws:
    org.apache.cassandra.thrift.InvalidRequestException
    org.apache.thrift.TException
    org.apache.cassandra.thrift.AuthenticationException
    org.apache.cassandra.thrift.AuthorizationException
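A hedged usage sketch for this helper. The host and port are placeholders, and transport handling is simplified: whether the helper opens the Thrift transport itself or expects the caller to do so may depend on the Cassandra version.

```java
import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.hadoop.conf.Configuration;
import org.apache.thrift.transport.TSocket;

public final class ClientExample {
    public static Cassandra.Client openClient(Configuration conf) throws Exception {
        // Placeholder node address and Thrift RPC port.
        TSocket socket = new TSocket("10.0.0.1", 9160);

        // Returns a client bound to the configured keyspace and logged in with
        // the configured credentials.
        return ColumnFamilyOutputFormat.createAuthenticatedClient(socket, conf);
    }

    private ClientExample() {}
}
```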