AWS Data Pipeline Pipeline

This page shows how to write Terraform and CloudFormation configurations for the AWS Data Pipeline Pipeline resource, and how to write them securely.

aws_datapipeline_pipeline (Terraform)

The Pipeline in AWS Data Pipeline can be configured in Terraform with the resource name aws_datapipeline_pipeline. The following sections describe 4 examples of how to use the resource and its parameters.

Example Usage from GitHub

sample-data-pipeline.tf#L1
resource "aws_datapipeline_pipeline" "default" {
  name = "tf-pipeline-default"
}

main.tf#L17
resource "aws_datapipeline_pipeline" "default" {
  name        = var.name
  description = var.description
}

main.tf#L1
resource "aws_datapipeline_pipeline" "default" {
  name = var.name
}

main.tf#L7
resource "aws_datapipeline_pipeline" "this" {
  description = var.description
  name        = var.name
  tags        = var.tags
}

Review your Terraform file for AWS best practices

Shisho Cloud, our free checker that verifies your Terraform configuration against AWS best practices, is available (beta).

Parameters

  • description optional - string
  • id optional computed - string
  • name required - string
  • tags optional - map from string to string
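
Putting the parameters together, a minimal complete configuration might look like the sketch below. The resource label, pipeline name, and tag values are placeholders rather than values taken from the examples above.

resource "aws_datapipeline_pipeline" "example" {
  # name is the only required argument
  name = "example-pipeline"

  # description and tags are optional
  description = "Pipeline managed by Terraform"
  tags = {
    Environment = "dev"
  }
}

Once applied, the computed id attribute exposes the identifier that AWS Data Pipeline assigns to the pipeline.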

Explanation in Terraform Registry

Provides a Data Pipeline resource.

AWS::DataPipeline::Pipeline (CloudFormation)

The Pipeline in DataPipeline can be configured in CloudFormation with the resource name AWS::DataPipeline::Pipeline. The following sections describe 8 examples of how to use the resource and its parameters.

Example Usage from GitHub

datapipeline.yml#L63
    Type: AWS::DataPipeline::Pipeline
    Properties:
      Activate: Boolean
      Description: "S3BucketSync - Sync S3 Buckets between accounts"
      Name: "S3BucketSync"
      ParameterObjects:
datapipeline-template.yml#L35
    Type: AWS::DataPipeline::Pipeline
    Properties:
      Name: "flow-tooling"
      Description: "Pipeline to flow tooling"
      Activate: !Ref ActivateDataPipeline
      ParameterObjects:
pipeline.yml#L34
    Type: AWS::DataPipeline::Pipeline
    Properties:
      Name: Ec2ExecuteCmds
      Description: Execute commands on EC2
      Activate: true
      ParameterObjects:
account.yml#L29
    Type: "AWS::DataPipeline::Pipeline"
    Properties:
      # Name will be EDWTransformLoadPipelineDev/qe/production
      Name: !Join ["", ["BulkLoadDataPipeline", !Ref Environment]]
      Description: Pipeline to orchestrate the transformation
      Activate: false
data-pipeline-sample.yml#L1
Type: AWS::DataPipeline::Pipeline
Properties:
  Activate: Boolean
  Description: String
  Name: String
  ParameterObjects:
DataPipeline-multiple-StringValue.json#L5
      "Type": "AWS::DataPipeline::Pipeline",
      "Properties": {
        "Name": "DynamoDBInputS3OutputHive",
        "Description": "Pipeline to backup DynamoDB data to S3",
        "Activate": "true",
        "PipelineObjects": [
DataPipeline.json#L3
  "resourceType" : "AWS::DataPipeline::Pipeline",
  "properties" : [ {
    "propertyName" : "Activate",
    "propertyType" : "Boolean",
    "required" : false
  }, {
scoring_api_data_pipeline.json#L2
  "Type" : "AWS::DataPipeline::Pipeline",
  "Properties" : {
    "Activate" : true,
    "Description" : "Test pipeline for scoring api.",
    "Name" : "scoring-pipeline-mau-summary",
    "ParameterObjects" : [ Parameter object, ... ],

Parameters

  • Activate optional - boolean
  • Description optional - string
  • Name required - string
  • ParameterObjects optional - list of parameter objects
  • ParameterValues optional - list of parameter values
  • PipelineObjects optional - list of pipeline objects
  • PipelineTags optional - list of pipeline tags
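
Putting the scalar properties together, a skeletal template might look like the sketch below; the logical ID, pipeline name, and tag values are placeholders. A PipelineObjects sketch follows the validation list further down.

Resources:
  ExamplePipeline:
    Type: AWS::DataPipeline::Pipeline
    Properties:
      # Name is the only required property
      Name: example-pipeline
      Description: Example pipeline
      # Activate controls whether the pipeline is validated and started
      Activate: false
      PipelineTags:
        - Key: Environment
          Value: dev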

Explanation in CloudFormation Registry

The AWS::DataPipeline::Pipeline resource specifies a data pipeline that you can use to automate the movement and transformation of data. In each pipeline, you define pipeline objects, such as activities, schedules, data nodes, and resources. For information about pipeline objects and components that you can use, see Pipeline Object Reference in the AWS Data Pipeline Developer Guide.

The AWS::DataPipeline::Pipeline resource adds tasks, schedules, and preconditions to the specified pipeline. You can use PutPipelineDefinition to populate a new pipeline. PutPipelineDefinition also validates the configuration as it adds it to the pipeline. Changes to the pipeline are saved unless one of the following validation errors exists in the pipeline:

  • An object is missing a name or identifier field.
  • A string or reference field is empty.
  • The number of objects in the pipeline exceeds the allowed maximum number of objects.
  • The pipeline is in a FINISHED state.

Pipeline object definitions are passed to the PutPipelineDefinition action and returned by the GetPipelineDefinition action.
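
For orientation, each entry in PipelineObjects is an Id/Name pair plus a list of Key/StringValue (or Key/RefValue) fields. The following is a minimal sketch, assuming a placeholder log bucket and the default AWS Data Pipeline IAM role names; it would sit under the Properties of the resource shown earlier.

PipelineObjects:
  - Id: Default
    Name: Default
    Fields:
      - Key: type
        StringValue: Default
      - Key: scheduleType
        StringValue: cron
      # RefValue references another pipeline object by its Id
      - Key: schedule
        RefValue: DefaultSchedule
      # Placeholder bucket and default role names; replace with your own
      - Key: pipelineLogUri
        StringValue: s3://example-log-bucket/logs/
      - Key: role
        StringValue: DataPipelineDefaultRole
      - Key: resourceRole
        StringValue: DataPipelineDefaultResourceRole
  - Id: DefaultSchedule
    Name: Every 1 day
    Fields:
      - Key: type
        StringValue: Schedule
      - Key: period
        StringValue: 1 Day
      - Key: startAt
        StringValue: FIRST_ACTIVATION_DATE_TIME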

Frequently asked questions

What is AWS Data Pipeline Pipeline?

AWS Data Pipeline Pipeline is a resource for Data Pipeline of Amazon Web Services (AWS). Settings can be written in Terraform and CloudFormation.

Where can I find the example code for the AWS Data Pipeline Pipeline?

For Terraform, the ShabariRepo/iac-automation, ys588281/terraform and gauravgitdir/Jack source code examples are useful. See the Terraform Example section for further details.

For CloudFormation, the igorlg/aws-datapipeline-sync-s3, flow-lab/aws-cloudformation and jniedrauer/cloudformation source code examples are useful. See the CloudFormation Example section for further details.