Google Dataflow Job

This page shows how to write Terraform for a Dataflow Job and how to configure it securely.

Terraform Example (google_dataflow_job)

Creates a job on Dataflow, which is an implementation of Apache Beam running on Google Compute Engine. For more information see the official documentation for Beam and Dataflow.
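
For orientation, a minimal configuration needs only the three required arguments: name, template_gcs_path, and temp_gcs_location. The sketch below uses placeholder bucket and template paths; substitute resources you own.

resource "google_dataflow_job" "example" {
  name              = "example-dataflow-job"                        # unique job name
  template_gcs_path = "gs://example-bucket/templates/template_file" # placeholder template path
  temp_gcs_location = "gs://example-bucket/tmp"                     # placeholder temp location

  # Optional key/value parameters consumed by the template (names depend on the template).
  parameters = {
    foo = "bar"
  }
}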

Parameters

  • additional_experiments optional - set / string
    • List of experiments that should be used by the job. An example value is ["enable_stackdriver_agent_metrics"].

  • enable_streaming_engine optional - bool
    • Indicates if the job should use the streaming engine feature.

  • id optional, computed - string

  • ip_configuration optional - string
    • The configuration for VM IPs. Options are "WORKER_IP_PUBLIC" or "WORKER_IP_PRIVATE" (see the hardened example after this list).

  • job_id computed - string
    • The unique ID of this job.

  • kms_key_name optional - string
    • The name for the Cloud KMS key for the job. Key format is: projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

  • labels optional - map / string
    • User labels to be specified for the job. Keys and values should follow the restrictions specified in the labeling restrictions page. NOTE: Google-provided Dataflow templates often provide default labels that begin with goog-dataflow-provided. Unless explicitly set in config, these labels will be ignored to prevent diffs on re-apply.

  • machine_type optional - string
    • The machine type to use for the job.

  • max_workers optional - number
    • The number of workers permitted to work on the job. More workers may improve processing speed at additional cost.

  • name required - string
    • A unique name for the resource, required by Dataflow.

  • network optional - string
    • The network to which VMs will be assigned. If it is not provided, "default" will be used.

  • on_delete optional - string
    • One of "drain" or "cancel". Specifies behavior of deletion during terraform destroy.

  • parameters optional - map / string
    • Key/Value pairs to be passed to the Dataflow job (as used in the template).

  • project optional, computed - string
    • The project in which the resource belongs.

  • region optional - string
    • The region in which the created job should run.

  • service_account_email optional - string
    • The Service Account email used to create the job.

  • state computed - string
    • The current state of the resource, selected from the JobState enum.

  • subnetwork optional - string
    • The subnetwork to which VMs will be assigned. Should be of the form "regions/REGION/subnetworks/SUBNETWORK".

  • temp_gcs_location required - string
    • A writeable location on Google Cloud Storage for the Dataflow job to dump its temporary data.

  • template_gcs_path required - string
    • The Google Cloud Storage path to the Dataflow job template.

  • transform_name_mapping optional - map / string
    • Only applicable when updating a pipeline. Map of transform name prefixes of the job to be replaced with the corresponding name prefixes of the new job.

  • type computed - string
    • The type of this job, selected from the JobType enum.

  • zone optional - string
    • The zone in which the created job should run. If it is not provided, the provider zone is used.
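
Bringing several of these options together, the sketch below keeps worker VMs on private IPs, pins them to a specific subnetwork and service account, encrypts job data with a customer-managed KMS key, and drains the job on destroy. Every project, bucket, subnetwork, service account, and key name is a placeholder assumption, not a value from this page.

resource "google_dataflow_job" "hardened_example" {
  name              = "example-hardened-job"                        # placeholder
  template_gcs_path = "gs://example-bucket/templates/template_file" # placeholder template
  temp_gcs_location = "gs://example-bucket/tmp"                     # placeholder bucket

  region       = "us-central1"
  machine_type = "n1-standard-2"
  max_workers  = 10

  # Keep worker VMs off public IPs and on a dedicated subnetwork and service account.
  ip_configuration      = "WORKER_IP_PRIVATE"
  subnetwork            = "regions/us-central1/subnetworks/example-subnet"              # placeholder
  service_account_email = "dataflow-runner@example-project.iam.gserviceaccount.com"     # placeholder

  # Customer-managed encryption key, in the documented key format.
  kms_key_name = "projects/example-project/locations/us-central1/keyRings/example-ring/cryptoKeys/example-key"

  labels = {
    team = "data-eng"
  }

  on_delete = "drain" # drain instead of cancel on terraform destroy
}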

Example Usage (from GitHub)

AtsushiKitano/assets
resource "google_dataflow_job" "main" {
  for_each = { for v in local._job_conf : v.name => v }

  name              = each.value.name
  template_gcs_path = each.value.template_gcs_path
  temp_gcs_location = each.value.temp_gcs_location
  # ... (remaining arguments omitted in this excerpt)
}

abhidatametica/ibc-ibx-pilot
resource "google_dataflow_job" "big_data_job" {
  name              = var.data_flow_name
  template_gcs_path = var.template_gcs_path
  temp_gcs_location = var.temp_gcs_location
  project           = var.project_name
  network           = var.network_name
  # ... (remaining arguments omitted in this excerpt)
}

marcelopicarelli/google-datalake
resource "google_dataflow_job" "dataflow_job" {
  region      = var.region
  zone        = var.zone
  name        = var.name
  on_delete   = var.on_delete
  max_workers = var.max_workers
  # ... (remaining arguments omitted in this excerpt)
}
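
Beyond these excerpts, the transform_name_mapping and enable_streaming_engine parameters described above matter mainly for streaming pipelines that are updated in place. The sketch below is illustrative only; the template, topic, table, and transform names are hypothetical.

resource "google_dataflow_job" "streaming_example" {
  name              = "example-streaming-pipeline"                       # placeholder
  template_gcs_path = "gs://example-bucket/templates/streaming_template" # placeholder template
  temp_gcs_location = "gs://example-bucket/tmp"                          # placeholder bucket

  enable_streaming_engine = true
  max_workers             = 5

  # Used only when updating a running pipeline: maps old transform name
  # prefixes to the corresponding prefixes in the replacement job.
  transform_name_mapping = {
    "ReadEvents" = "ReadEventsV2"
  }

  parameters = {
    inputTopic      = "projects/example-project/topics/events" # hypothetical template parameter
    outputTableSpec = "example-project:analytics.events"       # hypothetical template parameter
  }

  on_delete = "drain"
}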

Frequently asked questions

What is Google Dataflow Job?

Google Dataflow Job is a resource for Dataflow on Google Cloud Platform. Its settings can be written in Terraform.

Where can I find the example code for the Google Dataflow Job?

For Terraform, the AtsushiKitano/assets, abhidatametica/ibc-ibx-pilot, and marcelopicarelli/google-datalake source code examples are useful. See the Example Usage section above for further details.