Google Dataflow Job
This page shows how to write Terraform for a Dataflow Job and how to configure it securely.
google_dataflow_job (Terraform)
A Dataflow Job can be configured in Terraform with the resource name google_dataflow_job. The following sections give three examples of how to use the resource, followed by a description of its parameters.
Example Usage from GitHub
resource "google_dataflow_job" "main" {
  for_each = { for v in local._job_conf : v.name => v }

  name              = each.value.name
  template_gcs_path = each.value.template_gcs_path
  temp_gcs_location = each.value.temp_gcs_location
}

resource "google_dataflow_job" "big_data_job" {
  name              = var.data_flow_name
  template_gcs_path = var.template_gcs_path
  temp_gcs_location = var.temp_gcs_location
  project           = var.project_name
  network           = var.network_name
}

resource "google_dataflow_job" "dataflow_job" {
  region      = var.region
  zone        = var.zone
  name        = var.name
  on_delete   = var.on_delete
  max_workers = var.max_workers
  # Remaining arguments, including the required template_gcs_path and
  # temp_gcs_location, are not shown in this excerpt.
}
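The for_each pattern in the first example depends on a local value that the excerpt does not show. The following is a minimal sketch of how such a list could be defined and consumed. The local name job_conf, the job names, the gs:// paths, and the parameter keys are all placeholders chosen for illustration, not values from the source repositories.

```hcl
locals {
  # Hypothetical job list. Names, bucket paths, and parameter keys are
  # placeholders; real parameter keys depend on the template being launched.
  job_conf = [
    {
      name              = "example-job-a"
      template_gcs_path = "gs://example-bucket/templates/template_a"
      temp_gcs_location = "gs://example-bucket/tmp/a"
      parameters        = { inputFile = "gs://example-bucket/input/a.txt" }
    },
    {
      name              = "example-job-b"
      template_gcs_path = "gs://example-bucket/templates/template_b"
      temp_gcs_location = "gs://example-bucket/tmp/b"
      parameters        = { inputFile = "gs://example-bucket/input/b.txt" }
    },
  ]
}

resource "google_dataflow_job" "example" {
  # Key each instance by job name so that adding or removing a list entry
  # only affects that one job.
  for_each = { for job in local.job_conf : job.name => job }

  name              = each.value.name
  template_gcs_path = each.value.template_gcs_path
  temp_gcs_location = each.value.temp_gcs_location
  parameters        = each.value.parameters
}
```

Keying the map by name rather than by list index means that reordering the list does not force Terraform to recreate jobs.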
Parameters
-
additional_experiments
optional - set of string
List of experiments that should be used by the job. An example value is ["enable_stackdriver_agent_metrics"].
-
enable_streaming_engine
optional - bool
Indicates if the job should use the streaming engine feature.
-
id
optional computed - string
-
ip_configuration
optional - string
The configuration for VM IPs. Options are "WORKER_IP_PUBLIC" or "WORKER_IP_PRIVATE".
-
job_id
optional computed - string
The unique ID of this job.
-
kms_key_name
optional - string
The name for the Cloud KMS key for the job. Key format is: projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
-
labels
optional - map from string to string
User labels to be specified for the job. Keys and values should follow the restrictions specified in the labeling restrictions page. NOTE: Google-provided Dataflow templates often provide default labels that begin with goog-dataflow-provided. Unless explicitly set in config, these labels will be ignored to prevent diffs on re-apply.
-
machine_type
optional - string
The machine type to use for the job.
-
max_workers
optional - number
The number of workers permitted to work on the job. More workers may improve processing speed at additional cost.
-
name
required - string
A unique name for the resource, required by Dataflow.
-
network
optional - string
The network to which VMs will be assigned. If it is not provided, "default" will be used.
-
on_delete
optional - string
One of "drain" or "cancel". Specifies behavior of deletion during terraform destroy.
-
parameters
optional - map from string to string
Key/Value pairs to be passed to the Dataflow job (as used in the template).
-
project
optional computed - string
The project in which the resource belongs.
-
region
optional - string
The region in which the created job should run.
-
service_account_email
optional - string
The Service Account email used to create the job.
-
state
optional computed - string
The current state of the resource, selected from the JobState enum.
-
subnetwork
optional - string
The subnetwork to which VMs will be assigned. Should be of the form "regions/REGION/subnetworks/SUBNETWORK".
-
temp_gcs_location
required - string
A writeable location on Google Cloud Storage for the Dataflow job to dump its temporary data.
-
template_gcs_path
required - string
The Google Cloud Storage path to the Dataflow job template.
-
transform_name_mapping
optional - map from string to string
Only applicable when updating a pipeline. Map of transform name prefixes of the job to be replaced with the corresponding name prefixes of the new job.
-
type
optional computed - string
The type of this job, selected from the JobType enum.
-
zone
optional - string
The zone in which the created job should run. If it is not provided, the provider zone is used.
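Several of the parameters above relate to securing a job: ip_configuration, subnetwork, kms_key_name, and service_account_email. The following sketch shows one way they might be combined. Every project, bucket, key, subnetwork, and service account name in it is a placeholder, and the right values depend on your environment.

```hcl
# Sketch only: all names and paths below are placeholders, not values
# taken from the source.
resource "google_dataflow_job" "private_job" {
  name              = "example-private-job"
  template_gcs_path = "gs://example-bucket/templates/example_template"
  temp_gcs_location = "gs://example-bucket/tmp"
  region            = "us-central1"

  # Keep worker VMs off public IP addresses.
  ip_configuration = "WORKER_IP_PRIVATE"
  subnetwork       = "regions/us-central1/subnetworks/example-subnet"

  # Customer-managed Cloud KMS key, in the format given in the parameter list.
  kms_key_name = "projects/example-project/locations/us-central1/keyRings/example-ring/cryptoKeys/example-key"

  # Run the job with a dedicated service account.
  service_account_email = "dataflow-worker@example-project.iam.gserviceaccount.com"

  labels = {
    team = "data-platform"
  }
}
```

With WORKER_IP_PRIVATE, workers generally need Private Google Access enabled on the subnetwork in order to reach Google APIs.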
Explanation in Terraform Registry
Creates a job on Dataflow, which is an implementation of Apache Beam running on Google Compute Engine. For more information see the official documentation for Beam and Dataflow.
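For a long-running streaming job, the on_delete, enable_streaming_engine, and max_workers parameters described above become relevant. The following is a rough sketch under assumed values. The bucket paths, machine type, and the parameter key inputTopic are placeholders, and the keys a real job accepts depend on its template.

```hcl
# Sketch only: names, paths, and parameter keys are placeholders.
resource "google_dataflow_job" "streaming_job" {
  name              = "example-streaming-job"
  template_gcs_path = "gs://example-bucket/templates/example_streaming_template"
  temp_gcs_location = "gs://example-bucket/tmp"
  region            = "us-central1"

  enable_streaming_engine = true
  machine_type            = "n1-standard-2"
  max_workers             = 5

  # On terraform destroy, drain the job instead of cancelling it.
  on_delete = "drain"

  parameters = {
    inputTopic = "projects/example-project/topics/example-topic"
  }
}
```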
Frequently asked questions
What is Google Dataflow Job?
Google Dataflow Job is a resource for Dataflow on Google Cloud Platform. Its settings can be written in Terraform.
Where can I find the example code for the Google Dataflow Job?
For Terraform, the source code in AtsushiKitano/assets, abhidatametica/ibc-ibx-pilot and marcelopicarelli/google-datalake provides useful examples. See the Example Usage from GitHub section for further details.