Google Dataflow Job
This page shows how to write Terraform for a Dataflow Job and how to configure it securely.
google_dataflow_job (Terraform)
A Dataflow Job can be configured in Terraform with the resource name google_dataflow_job. The following sections describe three examples of how to use the resource and its parameters.
Example Usage from GitHub
resource "google_dataflow_job" "main" {
for_each = { for v in local._job_conf : v.name => v }
name = each.value.name
template_gcs_path = each.value.template_gcs_path
temp_gcs_location = each.value.temp_gcs_location
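The local._job_conf collection referenced by the for_each above is not included in the snippet. A minimal sketch of what it might look like follows; the job name, template path, and bucket are placeholder assumptions, not values from the original repository:

locals {
  # Hypothetical job definitions consumed by the for_each above;
  # each object supplies the three arguments the resource reads.
  _job_conf = [
    {
      name              = "wordcount-job"
      template_gcs_path = "gs://dataflow-templates/latest/Word_Count"
      temp_gcs_location = "gs://my-tmp-bucket/dataflow/tmp"
    },
  ]
}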
resource "google_dataflow_job" "big_data_job" {
name = var.data_flow_name
template_gcs_path = var.template_gcs_path
temp_gcs_location = var.temp_gcs_location
project = var.project_name
network = var.network_name
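The variables referenced in this snippet are declared elsewhere in the source repository. A sketch of plausible declarations, with types assumed:

# Hypothetical declarations for the variables used above.
variable "data_flow_name" {
  type = string
}

variable "template_gcs_path" {
  type = string # e.g. a gs:// path to a Dataflow template
}

variable "temp_gcs_location" {
  type = string # a writable gs:// path for temporary files
}

variable "project_name" {
  type = string
}

variable "network_name" {
  type = string
}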
resource "google_dataflow_job" "dataflow_job" {
region = var.region
zone = var.zone
name = var.name
on_delete = var.on_delete
max_workers = var.max_workers
Parameters
- additional_experiments (optional, set of string)
  List of experiments that should be used by the job. An example value is ["enable_stackdriver_agent_metrics"].
- enable_streaming_engine (optional, bool)
  Indicates if the job should use the streaming engine feature.
- id (optional, computed, string)
- ip_configuration (optional, string)
  The configuration for VM IPs. Options are "WORKER_IP_PUBLIC" or "WORKER_IP_PRIVATE". (A hardened example combining this with the other security-related parameters follows this list.)
- job_id (optional, computed, string)
  The unique ID of this job.
- kms_key_name (optional, string)
  The name of the Cloud KMS key for the job. The key format is projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY.
- labels (optional, map from string to string)
  User labels to be specified for the job. Keys and values should follow the restrictions specified in the labeling restrictions page. NOTE: Google-provided Dataflow templates often provide default labels that begin with goog-dataflow-provided. Unless explicitly set in config, these labels will be ignored to prevent diffs on re-apply.
- machine_type (optional, string)
  The machine type to use for the job.
- max_workers (optional, number)
  The number of workers permitted to work on the job. More workers may improve processing speed at additional cost.
- name (required, string)
  A unique name for the resource, required by Dataflow.
- network (optional, string)
  The network to which VMs will be assigned. If it is not provided, "default" will be used.
- on_delete (optional, string)
  One of "drain" or "cancel". Specifies the behavior applied to the job during terraform destroy.
- parameters (optional, map from string to string)
  Key/value pairs to be passed to the Dataflow job (as used in the template).
- project (optional, computed, string)
  The project in which the resource belongs.
- region (optional, string)
  The region in which the created job should run.
- service_account_email (optional, string)
  The service account email used to create the job.
- state (optional, computed, string)
  The current state of the resource, selected from the JobState enum.
- subnetwork (optional, string)
  The subnetwork to which VMs will be assigned. Should be of the form "regions/REGION/subnetworks/SUBNETWORK".
- temp_gcs_location (required, string)
  A writable location on Google Cloud Storage for the Dataflow job to dump its temporary data.
- template_gcs_path (required, string)
  The Google Cloud Storage path to the Dataflow job template.
- transform_name_mapping (optional, map from string to string)
  Only applicable when updating a pipeline. Map of transform name prefixes of the job to be replaced with the corresponding name prefixes of the new job.
- type (optional, computed, string)
  The type of this job, selected from the JobType enum.
- zone (optional, string)
  The zone in which the created job should run. If it is not provided, the provider zone is used.
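To illustrate the "write it securely" goal from the introduction, the following sketch combines the security-related parameters described above: private worker IPs, a customer-managed KMS key, a dedicated service account, and an explicit subnetwork. All names, paths, and values are placeholder assumptions, not a definitive configuration:

resource "google_dataflow_job" "hardened_job" {
  name              = "hardened-dataflow-job"                     # assumed name
  template_gcs_path = "gs://dataflow-templates/latest/Word_Count" # assumed template
  temp_gcs_location = "gs://my-tmp-bucket/dataflow/tmp"           # assumed bucket

  # Keep worker VMs off the public internet.
  ip_configuration = "WORKER_IP_PRIVATE"
  subnetwork       = "regions/us-central1/subnetworks/my-subnet" # assumed subnetwork

  # Encrypt job state with a customer-managed key (format from kms_key_name above).
  kms_key_name = "projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key"

  # Run workers as a dedicated, least-privilege service account.
  service_account_email = "dataflow-worker@my-project.iam.gserviceaccount.com"

  # Drain in-flight work instead of cancelling on terraform destroy.
  on_delete = "drain"

  labels = {
    environment = "prod"
  }
}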
Explanation in Terraform Registry
Creates a job on Dataflow, which is an implementation of Apache Beam running on Google Compute Engine. For more information see the official documentation for Beam and Dataflow.
Frequently asked questions
What is Google Dataflow Job?
Google Dataflow Job is a resource for Dataflow of Google Cloud Platform. Settings can be written in Terraform.
Where can I find the example code for the Google Dataflow Job?
For Terraform, the AtsushiKitano/assets, abhidatametica/ibc-ibx-pilot, and marcelopicarelli/google-datalake source code examples are useful. See the Example Usage from GitHub section above for further details.