Type Parameters:
JOIN_KEY - type of join key. Must be a supported type
INPUT_RECORD - type of input record. Must be a supported type
OUT - type of output object
@Beta public abstract class BatchJoiner<JOIN_KEY,INPUT_RECORD,OUT> extends MultiInputBatchConfigurable<BatchJoinerContext> implements Joiner<JOIN_KEY,INPUT_RECORD,OUT>, MultiInputPipelineConfigurable, StageLifecycle<BatchJoinerRuntimeContext>
A Joiner used for batch programs.
Because it is used in batch programs, a BatchJoiner must be parameterized
with supported join key and input record classes. Join keys and input records can be a
byte[], Boolean, Integer, Long, Float, Double, String, or StructuredRecord.
If the join key is not one of those types and is used in MapReduce,
it must implement Hadoop's org.apache.hadoop.io.WritableComparable interface.
If the input record is not one of those types and is used in MapReduce,
it must implement Hadoop's org.apache.hadoop.io.Writable interface.
If the joiner is used in Spark, both the join key and input record must implement the
Serializable interface.
Modifier and Type | Field and Description |
---|---|
static String | PLUGIN_TYPE |
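The Spark requirement from the class description (join key and input record must implement Serializable) can be illustrated with a minimal sketch. `CustomerKey` and `SerializableKeyDemo` are hypothetical names that are not part of this API, and the round trip through Java serialization is only a rough stand-in for how a key might be shipped between processes. The `equals` and `hashCode` overrides are an assumption about good practice for a key that records are grouped by. The MapReduce case (WritableComparable) is not shown because it needs Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

// Hypothetical demo class; not part of the documented API.
class SerializableKeyDemo {

  // Serializes a value to bytes and reads it back with standard Java serialization.
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        T copy = (T) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("serialization round trip failed", e);
    }
  }

  public static void main(String[] args) {
    CustomerKey key = new CustomerKey("us-east", 42L);
    CustomerKey copy = roundTrip(key);
    // Prints whether the deserialized copy is equal to the original key.
    System.out.println(key.equals(copy));
  }
}

// Hypothetical composite join key (region plus customer id).
final class CustomerKey implements Serializable {
  private static final long serialVersionUID = 1L;

  private final String region;
  private final long customerId;

  CustomerKey(String region, long customerId) {
    this.region = region;
    this.customerId = customerId;
  }

  // Value-based equality so that equal keys compare equal after deserialization.
  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof CustomerKey)) {
      return false;
    }
    CustomerKey that = (CustomerKey) other;
    return customerId == that.customerId && Objects.equals(region, that.region);
  }

  @Override
  public int hashCode() {
    return Objects.hash(region, customerId);
  }
}
```

A key of one of the listed built-in types (for example String or Long) would not need a custom class like this.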
Constructor and Description |
---|
BatchJoiner() |
Modifier and Type | Method and Description |
---|---|
void | configurePipeline(MultiInputPipelineConfigurer multiInputPipelineConfigurer) Configure the pipeline. |
void | destroy() Destroy the Batch Joiner. |
void | initialize(BatchJoinerRuntimeContext context) Initialize the Batch Joiner. |
void | prepareRun(BatchJoinerContext context) Prepare a pipeline run. |
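Methods inherited from class MultiInputBatchConfigurable: onRunFinish
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Joiner: getJoinConfig, getJoinKeys, joinOn, merge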
public static final String PLUGIN_TYPE
public void configurePipeline(MultiInputPipelineConfigurer multiInputPipelineConfigurer)
Specified by: configurePipeline in interface MultiInputPipelineConfigurable
Overrides: configurePipeline in class MultiInputBatchConfigurable<BatchJoinerContext>
Parameters: multiInputPipelineConfigurer - the configurer used to add required datasets and streams
public void prepareRun(BatchJoinerContext context) throws Exception
Specified by: prepareRun in interface SubmitterLifecycle<BatchJoinerContext>
Overrides: prepareRun in class MultiInputBatchConfigurable<BatchJoinerContext>
Parameters: context - batch execution context
Throws: Exception
public void initialize(BatchJoinerRuntimeContext context) throws Exception
Initialize the Batch Joiner. This method is invoked before any calls to Joiner.joinOn(String, Object) and Joiner.merge(Object, Iterable) are made.
Specified by: initialize in interface StageLifecycle<BatchJoinerRuntimeContext>
Parameters: context - runtime context for the joiner, which exposes the input schemas and the output schema
Throws: Exception - if there is any error during initialization
public void destroy()
Specified by: destroy in interface Destroyable
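The call order implied by the method descriptions can be sketched as follows, assuming initialize runs before any joinOn or merge call and destroy runs once the work is finished. `ToyJoiner` and `LifecycleSketch` are hypothetical stand-ins written only for this illustration. They do not extend BatchJoiner, their signatures are simplified to plain strings, and the real pipeline engine is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical driver; a toy replacement for whatever actually runs the joiner.
class LifecycleSketch {

  // Drives one toy join and returns the order in which the joiner's methods were called.
  static List<String> run() {
    ToyJoiner joiner = new ToyJoiner();
    joiner.initialize();
    try {
      String key = joiner.joinOn("users", "42,alice");
      joiner.merge(key, Arrays.asList("42,alice", "42,order-7"));
    } finally {
      // Assumption: destroy is called even if joining fails.
      joiner.destroy();
    }
    return joiner.calls;
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}

// Hypothetical stand-in whose method names mirror initialize, joinOn, merge, and destroy.
class ToyJoiner {
  final List<String> calls = new ArrayList<>();
  private boolean initialized;

  void initialize() {
    initialized = true;
    calls.add("initialize");
  }

  // Extracts the join key: here, the text before the first comma of the record.
  String joinOn(String stageName, String record) {
    requireInitialized();
    calls.add("joinOn");
    return record.split(",")[0];
  }

  // Combines all records that share a key into a single output string.
  String merge(String key, Iterable<String> records) {
    requireInitialized();
    calls.add("merge");
    return key + ":" + String.join("|", records);
  }

  void destroy() {
    initialized = false;
    calls.add("destroy");
  }

  private void requireInitialized() {
    if (!initialized) {
      throw new IllegalStateException("initialize() must run first");
    }
  }
}
```

In this sketch, calling joinOn or merge on a `ToyJoiner` that has not been initialized throws an IllegalStateException, which makes the ordering assumption explicit.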
Copyright © 2021 Cask Data, Inc. Licensed under the Apache License, Version 2.0.