com.krux.hyperion.objects.aws

package aws

Type Members

  1. trait AdpActivity extends AdpDataPipelineAbstractObject with AdpDataPipelineObject

    AWS Data Pipeline activity objects.

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-activities.html

  2. trait AdpDataFormat extends AdpDataPipelineAbstractObject with AdpDataPipelineObject

    Defines AWS Data Pipeline Data Formats

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-dataformats.html

  3. trait AdpDataNode extends AdpDataPipelineAbstractObject with AdpDataPipelineObject

    AWS Data Pipeline DataNode objects

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-datanodes.html

  4. abstract class AdpDataPipelineAbstractObject extends AnyRef

  5. trait AdpDataPipelineDefaultObject extends AdpDataPipelineAbstractObject

    Each data pipeline can have a default object

  6. trait AdpDataPipelineObject extends AdpDataPipelineAbstractObject

    The base class of all AWS Data Pipeline objects.

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-pipeline-objects.html

  7. trait AdpDatabase extends AdpDataPipelineAbstractObject with AdpDataPipelineObject

    AWS Data Pipeline database objects.

    Ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-databases.html

  8. case class AdpEc2Resource(id: String, name: Option[String], terminateAfter: String, role: Option[String], resourceRole: Option[String], imageId: Option[String], instanceType: Option[String], region: Option[String], securityGroups: Option[Seq[String]], securityGroupIds: Option[Seq[String]], associatePublicIpAddress: Option[String], keyPair: Option[String]) extends AdpDataPipelineAbstractObject with AdpResource with Product with Serializable

    An EC2 instance that will perform the work defined by a pipeline activity.

    role

    The IAM role to use to create the EC2 instance.

    resourceRole

    The IAM role to use to control the resources that the EC2 instance can access.

    imageId

    The AMI version to use for the EC2 instances. For more information, see Amazon Machine Images (AMIs).

    instanceType

    The type of EC2 instance to use for the resource pool. The default value is m1.small. The values currently supported are: c1.medium, c1.xlarge, c3.2xlarge, c3.4xlarge, c3.8xlarge, c3.large, c3.xlarge, cc1.4xlarge, cc2.8xlarge, cg1.4xlarge, cr1.8xlarge, g2.2xlarge, hi1.4xlarge, hs1.8xlarge, i2.2xlarge, i2.4xlarge, i2.8xlarge, i2.xlarge, m1.large, m1.medium, m1.small, m1.xlarge, m2.2xlarge, m2.4xlarge, m2.xlarge, m3.2xlarge, m3.xlarge, t1.micro.

    region

    A region code to specify that the resource should run in a different region. For more information, see Using a Pipeline with Resources in Multiple Regions.

    securityGroups

    The names of one or more security groups to use for the instances in the resource pool. By default, Amazon EC2 uses the default security group.

    securityGroupIds

    The IDs of one or more security groups to use for the instances in the resource pool. By default, Amazon EC2 uses the default security group.

    associatePublicIpAddress

    Indicates whether to assign a public IP address to an instance. (An instance in a VPC can't access Amazon S3 unless it has a public IP address or a network address translation (NAT) instance with proper routing configuration.) If the instance is in EC2-Classic or a default VPC, the default value is true. Otherwise, the default value is false.
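
    A minimal construction sketch (all ids and field values below are illustrative, with most optional fields left unset):

    import com.krux.hyperion.objects.aws._

    // Illustrative placeholder values only.
    val ec2 = AdpEc2Resource(
      id = "MyEc2Resource",
      name = Some("MyEc2Resource"),
      terminateAfter = "8 hours",
      role = None,
      resourceRole = None,
      imageId = None,
      instanceType = Some("m1.small"),
      region = None,
      securityGroups = None,
      securityGroupIds = None,
      associatePublicIpAddress = Some("false"),
      keyPair = None
    )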

  9. case class AdpEmrActivity(id: String, name: Option[String], input: Option[AdpRef[AdpDataNode]], output: Option[AdpRef[AdpDataNode]], preStepCommand: Option[Seq[String]], postStepCommand: Option[Seq[String]], runsOn: AdpRef[AdpEmrCluster], step: Seq[String], dependsOn: Option[Seq[AdpRef[AdpActivity]]]) extends AdpDataPipelineAbstractObject with AdpActivity with Product with Serializable

    Runs an Amazon EMR cluster.

    AWS Data Pipeline uses a different format for steps than Amazon EMR, for example AWS Data Pipeline uses comma-separated arguments after the JAR name in the EmrActivity step field.

    input

    The input data source.

    output

    The location for the output

    preStepCommand

    Shell scripts to be run before any steps are run. To specify multiple scripts, up to 255, add multiple preStepCommand fields.

    postStepCommand

    Shell scripts to be run after all steps are finished. To specify multiple scripts, up to 255, add multiple postStepCommand fields.

    runsOn

    The Amazon EMR cluster on which to run this activity.

    step

    One or more steps for the cluster to run. To specify multiple steps, up to 255, add multiple step fields. Use comma-separated arguments after the JAR name; for example, "s3://example-bucket/MyWork.jar,arg1,arg2,arg3".
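
    A minimal sketch of the activity; the cluster id referenced through AdpRef is hypothetical, and the step string follows the comma-separated format noted above:

    import com.krux.hyperion.objects.aws._

    // Comma-separated arguments follow the JAR name in each step string.
    val emrActivity = AdpEmrActivity(
      id = "MyEmrActivity",
      name = Some("MyEmrActivity"),
      input = None,
      output = None,
      preStepCommand = None,
      postStepCommand = None,
      runsOn = AdpRef[AdpEmrCluster]("MyEmrCluster"),
      step = Seq("s3://example-bucket/MyWork.jar,arg1,arg2,arg3"),
      dependsOn = None
    )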

  10. case class AdpEmrCluster(id: String, name: Option[String], bootstrapAction: Seq[String], amiVersion: Option[String], masterInstanceType: Option[String], coreInstanceType: Option[String], coreInstanceCount: Option[String], taskInstanceType: Option[String], taskInstanceCount: Option[String], terminateAfter: String, keyPair: Option[String]) extends AdpDataPipelineAbstractObject with AdpResource with Product with Serializable

    Represents the configuration of an Amazon EMR cluster. This object is used by EmrActivity to launch a cluster.
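
    A sketch of a cluster definition that an EmrActivity could reference through runsOn; the AMI version, instance types, and counts are illustrative:

    import com.krux.hyperion.objects.aws._

    // Illustrative placeholder values only.
    val emrCluster = AdpEmrCluster(
      id = "MyEmrCluster",
      name = Some("MyEmrCluster"),
      bootstrapAction = Seq(),
      amiVersion = Some("3.3"),
      masterInstanceType = Some("m1.small"),
      coreInstanceType = Some("m1.small"),
      coreInstanceCount = Some("2"),
      taskInstanceType = None,
      taskInstanceCount = None,
      terminateAfter = "8 hours",
      keyPair = None
    )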

  11. case class AdpRedshiftCopyActivity(id: String, name: Option[String], input: AdpRef[AdpDataNode], insertMode: String, output: AdpRef[AdpDataNode], runsOn: AdpRef[AdpEc2Resource], transformSql: Option[String], commandOptions: Option[Seq[String]], queue: Option[String], dependsOn: Option[Seq[AdpRef[AdpActivity]]]) extends AdpDataPipelineAbstractObject with AdpActivity with Product with Serializable

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-redshiftcopyactivity.html

    id

    Required for AdpDataPipelineObject

    name

    Required for AdpDataPipelineObject

    input

    The input data node. The data source can be Amazon S3, DynamoDB, or Amazon Redshift.

    insertMode

    Determines what AWS Data Pipeline does with pre-existing data in the target table that overlaps with rows in the data to be loaded. Valid values are KEEP_EXISTING, OVERWRITE_EXISTING, and TRUNCATE.

    output

    The output data node. The output location can be Amazon S3 or Amazon Redshift.

    runsOn

    Required for AdpActivity

    transformSql

    The SQL SELECT expression used to transform the input data.

    commandOptions

    Takes COPY parameters to pass to the Amazon Redshift data node.

    queue

    Corresponds to the query_group setting in Amazon Redshift, which allows you to assign and prioritize concurrent activities based on their placement in queues. Amazon Redshift limits the number of simultaneous connections to 15.

    dependsOn

    Required for AdpActivity
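
    A sketch wiring the fields together; the referenced data nodes, EC2 resource, and COPY option are hypothetical:

    import com.krux.hyperion.objects.aws._

    // Ids below are placeholders for existing pipeline objects.
    val copyActivity = AdpRedshiftCopyActivity(
      id = "MyRedshiftCopy",
      name = Some("MyRedshiftCopy"),
      input = AdpRef[AdpDataNode]("MyS3Input"),
      insertMode = "KEEP_EXISTING",
      output = AdpRef[AdpDataNode]("MyRedshiftTable"),
      runsOn = AdpRef[AdpEc2Resource]("MyEc2Resource"),
      transformSql = None,
      commandOptions = Some(Seq("GZIP")),
      queue = None,
      dependsOn = None
    )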

  12. case class AdpRedshiftDataNode(id: String, name: Option[String], createTableSql: Option[String], database: AdpRef[AdpRedshiftDatabase], schemaName: Option[String], tableName: String, primaryKeys: Option[Seq[String]]) extends AdpDataPipelineAbstractObject with AdpDataNode with Product with Serializable

    Defines a data node using Amazon Redshift.

    primaryKeys

    If you do not specify primaryKeys for a destination table in RedShiftCopyActivity, you can specify a list of columns using primaryKeys which will act as a mergeKey. However, if you have an existing primaryKey defined in a Redshift table, this setting overrides the existing key.
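
    A sketch of a table-backed data node; the database reference, schema, table, and key names are illustrative:

    import com.krux.hyperion.objects.aws._

    val redshiftTable = AdpRedshiftDataNode(
      id = "MyRedshiftTable",
      name = Some("MyRedshiftTable"),
      createTableSql = None,
      database = AdpRef[AdpRedshiftDatabase]("MyRedshiftDatabase"),
      schemaName = Some("public"),
      tableName = "events",
      primaryKeys = Some(Seq("event_id"))
    )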

  13. case class AdpRedshiftDatabase(id: String, name: Option[String], clusterId: String, connectionString: Option[String], databaseName: Option[String], jdbcProperties: Option[String], password: String, username: String) extends AdpDataPipelineAbstractObject with AdpDatabase with Product with Serializable

    Defines an Amazon Redshift database.

    clusterId

    The identifier provided by the user when the Amazon Redshift cluster was created. For example, if the endpoint for your Amazon Redshift cluster is mydb.example.us-east-1.redshift.amazonaws.com, the correct clusterId value is mydb. In the Amazon Redshift console, this value is "Cluster Name".

    connectionString

    The JDBC endpoint for connecting to an Amazon Redshift instance owned by an account different than the pipeline.
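
    A sketch reusing the example cluster from the clusterId note above; the credentials and database name are placeholders:

    import com.krux.hyperion.objects.aws._

    // "mydb" matches the example endpoint mydb.example.us-east-1.redshift.amazonaws.com.
    val redshiftDb = AdpRedshiftDatabase(
      id = "MyRedshiftDatabase",
      name = Some("MyRedshiftDatabase"),
      clusterId = "mydb",
      connectionString = None,
      databaseName = Some("analytics"),
      jdbcProperties = None,
      password = "********",
      username = "dbuser"
    )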

  14. case class AdpRef[+T <: AdpDataPipelineObject](objId: String) extends Product with Serializable

    A reference to an existing AWS Data Pipeline object.

    more details: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-pipeline-expressions.html
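
    A reference is built from the id of the target object; for example (the id is illustrative):

    import com.krux.hyperion.objects.aws._

    // Points at an AdpEc2Resource whose id is "MyEc2Resource".
    val resourceRef: AdpRef[AdpEc2Resource] = AdpRef[AdpEc2Resource]("MyEc2Resource")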

  15. trait AdpResource extends AdpDataPipelineAbstractObject with AdpDataPipelineObject

    Defines the AWS Data Pipeline Resources

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-resources.html

  16. trait AdpS3DataNode extends AdpDataPipelineAbstractObject with AdpDataNode

    Defines a data node using Amazon S3.

  17. case class AdpS3DirectoryDataNode(id: String, name: Option[String], compression: Option[String], dataFormat: Option[AdpRef[AdpDataFormat]], directoryPath: String, manifestFilePath: Option[String]) extends AdpDataPipelineAbstractObject with AdpS3DataNode with Product with Serializable

    You must provide either a filePath or directoryPath value.

  18. case class AdpS3FileDataNode(id: String, name: Option[String], compression: Option[String], dataFormat: Option[AdpRef[AdpDataFormat]], filePath: String, manifestFilePath: Option[String]) extends AdpDataPipelineAbstractObject with AdpS3DataNode with Product with Serializable

    You must provide either a filePath or directoryPath value.
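
    Sketches of both variants; the bucket paths and the data format reference are illustrative:

    import com.krux.hyperion.objects.aws._

    // Directory-based node: points at an S3 prefix.
    val s3Dir = AdpS3DirectoryDataNode(
      id = "MyS3InputDir",
      name = Some("MyS3InputDir"),
      compression = None,
      dataFormat = Some(AdpRef[AdpDataFormat]("MyTsvFormat")),
      directoryPath = "s3://example-bucket/input/",
      manifestFilePath = None
    )

    // File-based node: points at a single S3 object.
    val s3File = AdpS3FileDataNode(
      id = "MyS3OutputFile",
      name = Some("MyS3OutputFile"),
      compression = None,
      dataFormat = None,
      filePath = "s3://example-bucket/output/report.tsv",
      manifestFilePath = None
    )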

  19. trait AdpSchedule extends AdpDataPipelineAbstractObject with AdpDataPipelineObject

    Defines the timing of a scheduled event, such as when an activity runs.

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-schedule.html

  20. case class AdpShellCommandActivity(id: String, name: Option[String], command: Option[String], scriptUri: Option[String], scriptArgument: Option[Seq[String]], input: Option[AdpRef[AdpDataNode]], output: Option[AdpRef[AdpDataNode]], stage: String, stdout: Option[String], stderr: Option[String], dependsOn: Option[Seq[AdpRef[AdpActivity]]], runsOn: AdpRef[AdpEc2Resource]) extends AdpDataPipelineAbstractObject with AdpActivity with Product with Serializable

    Runs a command on an EC2 node. You specify the input S3 location, output S3 location and the script/command.

    command

    The command to run. This value and any associated parameters must function in the environment from which you are running the Task Runner.

    scriptUri

    An Amazon S3 URI path for a file to download and run as a shell command. Specify only one of scriptUri or command. scriptUri cannot use parameters; use command instead.

    scriptArgument

    A list of arguments to pass to the shell script.

    input

    The input data source.

    output

    The location for the output.

    stage

    Determines whether staging is enabled, which allows your shell commands to have access to the staged-data variables, such as ${INPUT1_STAGING_DIR} and ${OUTPUT1_STAGING_DIR}.

    stdout

    The Amazon S3 path that receives redirected output from the command. If you use the runsOn field, this must be an Amazon S3 path because of the transitory nature of the resource running your activity. However if you specify the workerGroup field, a local file path is permitted.

    stderr

    The path that receives redirected system error messages from the command. If you use the runsOn field, this must be an Amazon S3 path because of the transitory nature of the resource running your activity. However if you specify the workerGroup field, a local file path is permitted.
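
    A sketch running an inline command; the command, stage value, S3 log paths, and EC2 resource id are illustrative:

    import com.krux.hyperion.objects.aws._

    // runsOn is used, so stdout and stderr must be S3 paths.
    val shellActivity = AdpShellCommandActivity(
      id = "MyShellCommand",
      name = Some("MyShellCommand"),
      command = Some("echo hello"),
      scriptUri = None,
      scriptArgument = None,
      input = None,
      output = None,
      stage = "false",
      stdout = Some("s3://example-bucket/logs/stdout.txt"),
      stderr = Some("s3://example-bucket/logs/stderr.txt"),
      dependsOn = None,
      runsOn = AdpRef[AdpEc2Resource]("MyEc2Resource")
    )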

  21. case class AdpSqlActivity(id: String, name: Option[String], database: AdpRef[AdpDatabase], script: String, scriptArgument: Option[Seq[String]], queue: Option[String], dependsOn: Option[Seq[AdpRef[AdpActivity]]], runsOn: AdpRef[AdpEc2Resource]) extends AdpDataPipelineAbstractObject with AdpActivity with Product with Serializable

    Runs a SQL query on a database. You specify the input table where the SQL query is run and the output table where the results are stored. If the output table doesn't exist, this operation creates a new table with that name.

    script

    The SQL script to run. For example:

    insert into output select * from input where lastModified in range (?, ?)

    The script is not evaluated as an expression; in that situation, scriptArgument values are useful.

    Note that scriptUri is deliberately omitted from this implementation, as there does not seem to be a use case for it at present.

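    A sketch using the parameterized script above; the database and EC2 resource references and the argument values are hypothetical:

    import com.krux.hyperion.objects.aws._

    // The two scriptArgument values fill the (?, ?) placeholders in the script.
    val sqlActivity = AdpSqlActivity(
      id = "MySqlActivity",
      name = Some("MySqlActivity"),
      database = AdpRef[AdpDatabase]("MyRedshiftDatabase"),
      script = "insert into output select * from input where lastModified in range (?, ?)",
      scriptArgument = Some(Seq("2014-01-01T00:00:00", "2014-01-02T00:00:00")),
      queue = None,
      dependsOn = None,
      runsOn = AdpRef[AdpEc2Resource]("MyEc2Resource")
    )
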
  22. case class AdpStartAtSchedule(id: String, name: Option[String], period: String, startAt: String, occurrences: Option[String]) extends AdpDataPipelineAbstractObject with AdpSchedule with Product with Serializable

    startAt

    The date and time at which to start the scheduled pipeline runs. Valid value is FIRST_ACTIVATION_DATE_TIME. FIRST_ACTIVATION_DATE_TIME is assumed to be the current date and time.

  23. case class AdpStartDateTimeSchedule(id: String, name: Option[String], period: String, startDateTime: github.nscala_time.time.Imports.DateTime, occurrences: Option[String]) extends AdpDataPipelineAbstractObject with AdpSchedule with Product with Serializable

    startDateTime

    The date and time to start the scheduled runs. You must use either startDateTime or startAt but not both.
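
    Sketches of both schedule variants; this assumes the DateTime alias comes from the standard nscala-time Imports, and all period and start values are illustrative:

    import com.krux.hyperion.objects.aws._
    import com.github.nscala_time.time.Imports.DateTime

    // Starts at pipeline activation and repeats daily.
    val everyDayFromActivation = AdpStartAtSchedule(
      id = "DailySchedule",
      name = Some("DailySchedule"),
      period = "1 day",
      startAt = "FIRST_ACTIVATION_DATE_TIME",
      occurrences = None
    )

    // Starts at a fixed date and time and runs 24 hourly occurrences.
    val everyHourFromDate = AdpStartDateTimeSchedule(
      id = "HourlySchedule",
      name = Some("HourlySchedule"),
      period = "1 hour",
      startDateTime = new DateTime("2014-01-01T00:00:00"),
      occurrences = Some("24")
    )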

  24. case class AdpTsvDataFormat(id: String, name: Option[String], column: Option[Seq[String]], escapeChar: Option[String]) extends AdpDataPipelineAbstractObject with AdpDataFormat with Product with Serializable

    A tab-delimited data format where the column separator is a tab character and the record separator is a newline character.

    column

    The structure of the data file. Use column names and data types separated by a space. For example:

    [ "Name STRING", "Score INT", "DateOfBirth TIMESTAMP" ]

    You can omit the data type when using STRING, which is the default. Valid data types: TINYINT, SMALLINT, INT, BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, TIMESTAMP

    escapeChar

    A character, for example "\", that instructs the parser to ignore the next character.
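
    A sketch using the column syntax shown above; the id is illustrative:

    import com.krux.hyperion.objects.aws._

    // "\\" is a single backslash escape character.
    val tsvFormat = AdpTsvDataFormat(
      id = "MyTsvFormat",
      name = Some("MyTsvFormat"),
      column = Some(Seq("Name STRING", "Score INT", "DateOfBirth TIMESTAMP")),
      escapeChar = Some("\\")
    )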

Value Members

  1. object AdpJsonSerializer

    Serializes an AWS Data Pipeline object to JSON.

  2. object AdpPipelineSerializer

  3. object AdpRef extends Serializable
