The activity that copies data from one data node to another.
CSV data format
Custom data format
Activity to recursively delete files in an S3 path.
A precondition to check that data exists in a DynamoDB table.
DynamoDB data format
DynamoDB Export data format
A precondition to check that the DynamoDB table exists.
EC2 resource
Checks whether a data node object exists.
Google Storage Download activity
Google Storage Upload activity
Shell command activity
Defines a MapReduce activity
Launch a MapReduce cluster
A MapReduce step that runs on a MapReduce cluster.
The base trait of krux data pipeline objects.
A condition that must be met before the object can run.
A condition that must be met before the object can run. The activity cannot run until all its conditions are met.
Redshift copy activity
The abstracted RedshiftDataNode
Redshift database trait; to use it, extend it with an object.
Redshift unload activity
RegEx data format
Runtime references of runnable objects
Defines data from S3
Defines data from an S3 directory
Checks whether a key exists in an Amazon S3 data node.
A precondition to check that the Amazon S3 objects with the given prefix (represented as a URI) are present.
Cron-like schedule that runs at a defined period.
Cron-like schedule that runs at a defined period.
If the given start time is in the past, Data Pipeline will backfill from that start time.
Shell command activity
A Unix/Linux shell command that can be run as a precondition.
Defines a Spark activity
Launch a Spark cluster
A Spark step that runs on a Spark cluster.
Note that the AWS Data Pipeline SqlDataNode does not require a JdbcDatabase parameter, but instead requires the username, password, etc. to be specified within the object itself. We require a JdbcDatabase object for consistency with other database data node objects.
TSV data format
The activity that copies data from one data node to another.
It seems that both the input and output formats need to be CsvDataFormat for this copy to work properly, and the format needs to be a specific variant of CSV. For more information, see:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-copyactivity.html
In our experience, TsvDataFormat is very hard to work with for both import and export, especially for tasks involving RedshiftCopyActivity. A general rule of thumb: always use the default CsvDataFormat for tasks that both export to S3 and copy to Redshift.
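For illustration, a minimal AWS Data Pipeline definition for such a copy might look like the sketch below, following the CopyActivity documentation linked above. The ids, bucket name, and file paths are placeholders; the key point is that both S3 data nodes reference the same default `CSV` data format.

```json
{
  "objects": [
    {
      "id": "CsvFormat",
      "type": "CSV"
    },
    {
      "id": "S3Input",
      "type": "S3DataNode",
      "filePath": "s3://example-bucket/input/data.csv",
      "dataFormat": { "ref": "CsvFormat" }
    },
    {
      "id": "S3Output",
      "type": "S3DataNode",
      "filePath": "s3://example-bucket/output/data.csv",
      "dataFormat": { "ref": "CsvFormat" }
    },
    {
      "id": "CopyData",
      "type": "CopyActivity",
      "input": { "ref": "S3Input" },
      "output": { "ref": "S3Output" },
      "runsOn": { "ref": "Ec2Instance" }
    },
    {
      "id": "Ec2Instance",
      "type": "Ec2Resource",
      "instanceType": "m1.small"
    }
  ]
}
```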