Google Dataflow Job

This page shows how to write Terraform for a Dataflow Job and how to configure it securely.

Terraform Example (google_dataflow_job)

Creates a job on Dataflow, which is an implementation of Apache Beam running on Google Compute Engine. For more information see the official documentation for Beam and Dataflow.
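
For orientation, a minimal configuration needs only the three required arguments: name, template_gcs_path, and temp_gcs_location. The sketch below uses placeholder bucket and template paths; substitute resources you own.

resource "google_dataflow_job" "example" {
  name              = "example-dataflow-job"                        # unique job name
  template_gcs_path = "gs://example-bucket/templates/template_file" # placeholder template path
  temp_gcs_location = "gs://example-bucket/tmp"                     # placeholder temp location

  # Optional key/value parameters consumed by the template (names depend on the template).
  parameters = {
    foo = "bar"
  }
}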

Parameters

  • additional_experiments optional - set / string
    • List of experiments that should be used by the job. An example value is ["enable_stackdriver_agent_metrics"].

  • enable_streaming_engine optional - bool
    • Indicates if the job should use the streaming engine feature.

  • id optional, computed - string

  • ip_configuration optional - string
    • The configuration for VM IPs. Options are "WORKER_IP_PUBLIC" or "WORKER_IP_PRIVATE" (see the hardened example after this list).

  • job_id computed - string
    • The unique ID of this job.

  • kms_key_name optional - string
    • The name for the Cloud KMS key for the job. Key format is: projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

  • labels optional - map / string
    • User labels to be specified for the job. Keys and values should follow the restrictions specified in the labeling restrictions page. NOTE: Google-provided Dataflow templates often provide default labels that begin with goog-dataflow-provided. Unless explicitly set in config, these labels will be ignored to prevent diffs on re-apply.

  • machine_type optional - string
    • The machine type to use for the job.

  • max_workers optional - number
    • The number of workers permitted to work on the job. More workers may improve processing speed at additional cost.

  • name required - string
    • A unique name for the resource, required by Dataflow.

  • network optional - string
    • The network to which VMs will be assigned. If it is not provided, "default" will be used.

  • on_delete optional - string
    • One of "drain" or "cancel". Specifies behavior of deletion during terraform destroy.

  • parameters optional - map / string
    • Key/Value pairs to be passed to the Dataflow job (as used in the template).

  • project optional, computed - string
    • The project in which the resource belongs.

  • region optional - string
    • The region in which the created job should run.

  • service_account_email optional - string
    • The Service Account email used to create the job.

  • state computed - string
    • The current state of the resource, selected from the JobState enum.

  • subnetwork optional - string
    • The subnetwork to which VMs will be assigned. Should be of the form "regions/REGION/subnetworks/SUBNETWORK".

  • temp_gcs_location required - string
    • A writeable location on Google Cloud Storage for the Dataflow job to dump its temporary data.

  • template_gcs_path required - string
    • The Google Cloud Storage path to the Dataflow job template.

  • transform_name_mapping optional - map / string
    • Only applicable when updating a pipeline. Map of transform name prefixes of the job to be replaced with the corresponding name prefixes of the new job.

  • type computed - string
    • The type of this job, selected from the JobType enum.

  • zone optional - string
    • The zone in which the created job should run. If it is not provided, the provider zone is used.
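
Bringing several of these options together, the sketch below keeps worker VMs on private IPs, pins them to a specific subnetwork and service account, encrypts job data with a customer-managed KMS key, and drains the job on destroy. Every project, bucket, subnetwork, service account, and key name is a placeholder assumption, not a value from this page.

resource "google_dataflow_job" "hardened_example" {
  name              = "example-hardened-job"                        # placeholder
  template_gcs_path = "gs://example-bucket/templates/template_file" # placeholder template
  temp_gcs_location = "gs://example-bucket/tmp"                     # placeholder bucket

  region       = "us-central1"
  machine_type = "n1-standard-2"
  max_workers  = 10

  # Keep worker VMs off public IPs and on a dedicated subnetwork and service account.
  ip_configuration      = "WORKER_IP_PRIVATE"
  subnetwork            = "regions/us-central1/subnetworks/example-subnet"              # placeholder
  service_account_email = "dataflow-runner@example-project.iam.gserviceaccount.com"     # placeholder

  # Customer-managed encryption key, in the documented key format.
  kms_key_name = "projects/example-project/locations/us-central1/keyRings/example-ring/cryptoKeys/example-key"

  labels = {
    team = "data-eng"
  }

  on_delete = "drain" # drain instead of cancel on terraform destroy
}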

Example Usage (from GitHub)

AtsushiKitano/assets
resource "google_dataflow_job" "main" {
  for_each = { for v in local._job_conf : v.name => v }

  name              = each.value.name
  template_gcs_path = each.value.template_gcs_path
  temp_gcs_location = each.value.temp_gcs_location
  # ... (remaining arguments omitted in this excerpt)
}

abhidatametica/ibc-ibx-pilot
resource "google_dataflow_job" "big_data_job" {
  name              = var.data_flow_name
  template_gcs_path = var.template_gcs_path
  temp_gcs_location = var.temp_gcs_location
  project           = var.project_name
  network           = var.network_name
  # ... (remaining arguments omitted in this excerpt)
}

marcelopicarelli/google-datalake
resource "google_dataflow_job" "dataflow_job" {
  region      = var.region
  zone        = var.zone
  name        = var.name
  on_delete   = var.on_delete
  max_workers = var.max_workers
  # ... (remaining arguments omitted in this excerpt)
}
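
Beyond these excerpts, the transform_name_mapping and enable_streaming_engine parameters described above matter mainly for streaming pipelines that are updated in place. The sketch below is illustrative only; the template, topic, table, and transform names are hypothetical.

resource "google_dataflow_job" "streaming_example" {
  name              = "example-streaming-pipeline"                       # placeholder
  template_gcs_path = "gs://example-bucket/templates/streaming_template" # placeholder template
  temp_gcs_location = "gs://example-bucket/tmp"                          # placeholder bucket

  enable_streaming_engine = true
  max_workers             = 5

  # Used only when updating a running pipeline: maps old transform name
  # prefixes to the corresponding prefixes in the replacement job.
  transform_name_mapping = {
    "ReadEvents" = "ReadEventsV2"
  }

  parameters = {
    inputTopic      = "projects/example-project/topics/events" # hypothetical template parameter
    outputTableSpec = "example-project:analytics.events"       # hypothetical template parameter
  }

  on_delete = "drain"
}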

Frequently asked questions

What is Google Dataflow Job?

Google Dataflow Job is a resource for Dataflow on Google Cloud Platform. Its settings can be written in Terraform.

Where can I find the example code for the Google Dataflow Job?

For Terraform, the AtsushiKitano/assets, abhidatametica/ibc-ibx-pilot, and marcelopicarelli/google-datalake source code examples are useful. See the Example Usage section above for further details.