AWS Glue Job
This page shows how to write Terraform and CloudFormation configurations for an AWS Glue Job, and how to write them securely.
aws_glue_job (Terraform)
The Job in AWS Glue can be configured in Terraform with the resource name aws_glue_job. The following sections describe 2 examples of how to use the resource and its parameters.
Example Usage from GitHub
resource "aws_glue_job" "bidb-cdc-data-load-glue-job" {
  name        = "bidb-cdc-data-load-glue-job"
  role_arn    = var.glue_s3_redshift_access_role
  connections = ["bidb-dev-datawarehouse-core-redshift-glue-connection"]

  default_arguments = {
    "--Secret" = var.redshift_credentials_parameter_store
    # ... (excerpt truncated)
  }
  # ... (excerpt truncated)
}

resource "aws_glue_job" "tf-gluejob-scheduled-1" {
  name     = "tf-gluejob-scheduled-1"
  role_arn = "arn:aws:iam::152944667076:role/glue_helper"

  command {
    script_location = "s3://mvil-glue/MySQLBYOD.py"
    # ... (excerpt truncated)
  }
  # ... (excerpt truncated)
}
Parameters
- arn: optional, computed; string
- connections: optional; list of string
- default_arguments: optional; map from string to string
- description: optional; string
- glue_version: optional, computed; string
- id: optional, computed; string
- max_capacity: optional, computed; number
- max_retries: optional; number
- name: required; string
- non_overridable_arguments: optional; map from string to string
- number_of_workers: optional; number
- role_arn: required; string
- security_configuration: optional; string
- tags: optional; map from string to string
- timeout: optional; number
- worker_type: optional; string
- command: list block
  - name: optional; string
  - python_version: optional, computed; string
  - script_location: required; string
- execution_property: list block
  - max_concurrent_runs: optional; number
- notification_property: list block
  - notify_delay_after: optional; number
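The parameters above can be combined into one job definition. The following is a minimal sketch, not a definitive configuration: the variable names, resource names, and values (glue_role_arn, script_s3_uri, kms_key_arn, "example-glue-job", the worker settings) are hypothetical and do not come from the examples on this page. It assumes the AWS provider is configured elsewhere, and it attaches an aws_glue_security_configuration so that logs, job bookmarks, and S3 output are encrypted.

```hcl
# Hypothetical inputs; none of these names appear in the examples above.
variable "glue_role_arn" {
  type        = string
  description = "ARN of an IAM role that AWS Glue can assume"
}

variable "script_s3_uri" {
  type        = string
  description = "S3 URI of the job script"
}

variable "kms_key_arn" {
  type        = string
  description = "ARN of the KMS key used for encryption"
}

# Encryption settings referenced by the job's security_configuration argument.
resource "aws_glue_security_configuration" "example" {
  name = "example-glue-security-configuration"

  encryption_configuration {
    cloudwatch_encryption {
      cloudwatch_encryption_mode = "SSE-KMS"
      kms_key_arn                = var.kms_key_arn
    }

    job_bookmarks_encryption {
      job_bookmarks_encryption_mode = "CSE-KMS"
      kms_key_arn                   = var.kms_key_arn
    }

    s3_encryption {
      s3_encryption_mode = "SSE-KMS"
      kms_key_arn        = var.kms_key_arn
    }
  }
}

resource "aws_glue_job" "example" {
  # Required arguments
  name     = "example-glue-job"
  role_arn = var.glue_role_arn

  # Optional sizing and retry settings (illustrative values)
  glue_version      = "4.0"
  worker_type       = "G.1X"
  number_of_workers = 2
  max_retries       = 1
  timeout           = 60 # minutes

  security_configuration = aws_glue_security_configuration.example.name

  # script_location is the only required argument inside command.
  command {
    name            = "glueetl"
    python_version  = "3"
    script_location = var.script_s3_uri
  }

  execution_property {
    max_concurrent_runs = 1
  }

  notification_property {
    notify_delay_after = 10 # minutes
  }
}
```

worker_type and number_of_workers are used here instead of max_capacity; the provider treats the two sizing styles as alternatives, so only one should be set on a given job.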
Explanation in Terraform Registry
Provides a Glue Job resource. Note: Glue functionality, such as monitoring and logging of jobs, is typically managed with the default_arguments argument. See the Special Parameters Used by AWS Glue topic in the Glue developer guide for additional information.
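As an illustration of managing monitoring and logging through default_arguments, the sketch below enables job metrics and continuous CloudWatch logging. The resource names, the log group name, and the two variables are hypothetical; the argument keys are Glue special parameters, and the values shown are one plausible setup rather than a recommended one. The AWS provider is assumed to be configured elsewhere.

```hcl
# Hypothetical inputs; not taken from the examples on this page.
variable "glue_role_arn" {
  type        = string
  description = "ARN of an IAM role that AWS Glue can assume"
}

variable "script_s3_uri" {
  type        = string
  description = "S3 URI of the job script"
}

# Log group that receives the job's continuous logs.
resource "aws_cloudwatch_log_group" "glue_job" {
  name              = "example-glue-job-logs"
  retention_in_days = 14
}

resource "aws_glue_job" "logged" {
  name     = "example-logged-glue-job"
  role_arn = var.glue_role_arn

  command {
    script_location = var.script_s3_uri
  }

  # Keys beginning with "--" are passed to Glue as job arguments.
  default_arguments = {
    "--continuous-log-logGroup"          = aws_cloudwatch_log_group.glue_job.name
    "--enable-continuous-cloudwatch-log" = "true"
    "--enable-continuous-log-filter"     = "true"
    "--enable-metrics"                   = ""
  }
}
```

Arguments that a job run must not be able to override can be placed in non_overridable_arguments instead, which takes the same map-of-strings shape.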
AWS::Glue::Job (CloudFormation)
The Job in Glue can be configured in CloudFormation with the resource name AWS::Glue::Job. The following sections describe 10 examples of how to use the resource and its parameters.
Example Usage from GitHub
Type: AWS::Glue::Job
Properties:
  Description: This job creates both Hudi and Glueparquet tables
  Command:
    Name: glueetl
    ScriptLocation: "s3://mqtran/artifacts/Glue/HudiJarGlueJob.py"

Type: "AWS::Glue::Job"
Properties:
  Name: "tickit_public_category_refine"
  Description: ""
  Role: !Sub "arn:aws:iam::${AWS::AccountId}:role/service-role/AWSGlueServiceRole-demo"
  ExecutionProperty:

Type: "AWS::Glue::Job"
Properties:
  Name: !Sub "business_aggregate_daily-${AWS::Region}"
  Role: !GetAtt IAMRole.Arn
  ExecutionProperty:
    MaxConcurrentRuns: 1

Type: "AWS::Glue::Job"
Properties:
  Name: !Sub "business_aggregate_daily-${AWS::Region}"
  Role: !GetAtt IAMRole.Arn
  ExecutionProperty:
    MaxConcurrentRuns: 1

Type: "AWS::Glue::Job"
Properties:
  Command:
    NotScriptLocation: ./packages/gluejob/dummy.py
  Role: !GetAtt GlueRole.Arn
DummyGlueJob2:
"Type": "AWS::Glue::Job",
"Properties": {
  "Command": {
    "Name": "pythonshell",
    "PythonVersion": "3",
    "ScriptLocation": {

"Type": "AWS::Glue::Job",
"Properties": {
  "Command": {
    "Name": "pythonshell",
    "PythonVersion": "3",
    "ScriptLocation": {

"Type": "AWS::Glue::Job",
"Properties": {
  "Command": {
    "Name": "glueetl",
    "ScriptLocation": {
      "Fn::Sub": [

"Type": "AWS::Glue::Job",
"Properties": {
  "Role": {
    "Ref": "GlueIAMRole"
  },
  "DefaultArguments": {

"Type": "AWS::Glue::Job",
"Properties": {
  "Role": {
    "Fn::ImportValue": {
      "Fn::Sub": "${InfrastructureID}-GlueRoleARN"
    }
Parameters
- Connections: optional; ConnectionsList
- MaxRetries: optional; Double
- Description: optional; String
- Timeout: optional; Integer
- AllocatedCapacity: optional; Double
- Name: optional; String
- Role: required; String
- DefaultArguments: optional; Json
- NotificationProperty: optional; NotificationProperty
- WorkerType: optional; String
- LogUri: optional; String
- Command: required; JobCommand
- GlueVersion: optional; String
- ExecutionProperty: optional; ExecutionProperty
- SecurityConfiguration: optional; String
- NumberOfWorkers: optional; Integer
- Tags: optional; Json
- MaxCapacity: optional; Double
Explanation in CloudFormation Registry
The AWS::Glue::Job resource specifies an AWS Glue job in the data catalog. For more information, see Adding Jobs in AWS Glue and Job Structure in the AWS Glue Developer Guide.
Frequently asked questions
What is AWS Glue Job?
AWS Glue Job is a resource of AWS Glue, a service of Amazon Web Services. Its settings can be written in Terraform and CloudFormation.
Where can I find the example code for the AWS Glue Job?
For Terraform, the devopsbynaresh/datalake-alsac and m-voels/tftest source code examples are useful. See the Terraform Example section for further details.
For CloudFormation, the mq-tran/hudi-glue, garystafford/tickit-data-lake-demo and aws-samples/aws-utility-meter-data-analytics-platform-cn source code examples are useful. See the CloudFormation Example section for further details.