AWS Glue Job
This page shows how to write Terraform and CloudFormation configurations for an AWS Glue Job, and how to write them securely.
aws_glue_job (Terraform)
The Job in AWS Glue can be configured in Terraform with the resource name aws_glue_job. The following sections describe 2 examples of how to use the resource and its parameters.
Example Usage from GitHub
resource "aws_glue_job" "bidb-cdc-data-load-glue-job" {
  name        = "bidb-cdc-data-load-glue-job"
  role_arn    = var.glue_s3_redshift_access_role
  connections = ["bidb-dev-datawarehouse-core-redshift-glue-connection"]
  default_arguments = {
    "--Secret" = var.redshift_credentials_parameter_store

resource "aws_glue_job" "tf-gluejob-scheduled-1" {
  name     = "tf-gluejob-scheduled-1"
  role_arn = "arn:aws:iam::152944667076:role/glue_helper"
  command {
    script_location = "s3://mvil-glue/MySQLBYOD.py"
Parameters
arn - optional, computed - string
connections - optional - list of string
default_arguments - optional - map from string to string
description - optional - string
glue_version - optional, computed - string
id - optional, computed - string
max_capacity - optional, computed - number
max_retries - optional - number
name - required - string
non_overridable_arguments - optional - map from string to string
number_of_workers - optional - number
role_arn - required - string
security_configuration - optional - string
tags - optional - map from string to string
timeout - optional - number
worker_type - optional - string
command - list block
  name - optional - string
  python_version - optional, computed - string
  script_location - required - string
execution_property - list block
  max_concurrent_runs - optional - number
notification_property - list block
  notify_delay_after - optional - number
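To make the parameter list above concrete, here is a minimal sketch of a complete aws_glue_job definition that sets the required arguments (name, role_arn, and command.script_location) together with a few optional ones. The resource names, the S3 bucket and script path, and the two input variables are hypothetical placeholders, not values taken from the GitHub examples above. The sketch also assumes that attaching an aws_glue_security_configuration with KMS encryption is an acceptable way to address the "securely" part for your environment.

```hcl
# Hypothetical inputs: an IAM role Glue can assume and an existing KMS key.
variable "glue_role_arn" {
  type        = string
  description = "ARN of the IAM role the Glue job runs as (assumed to exist)."
}

variable "glue_kms_key_arn" {
  type        = string
  description = "ARN of the KMS key used to encrypt job data (assumed to exist)."
}

# Encrypts S3 output, CloudWatch logs, and job bookmarks with the KMS key.
resource "aws_glue_security_configuration" "example" {
  name = "example-glue-security-config"

  encryption_configuration {
    cloudwatch_encryption {
      cloudwatch_encryption_mode = "SSE-KMS"
      kms_key_arn                = var.glue_kms_key_arn
    }

    job_bookmarks_encryption {
      job_bookmarks_encryption_mode = "CSE-KMS"
      kms_key_arn                   = var.glue_kms_key_arn
    }

    s3_encryption {
      s3_encryption_mode = "SSE-KMS"
      kms_key_arn        = var.glue_kms_key_arn
    }
  }
}

resource "aws_glue_job" "example" {
  name                   = "example-etl-job" # required
  role_arn               = var.glue_role_arn # required
  glue_version           = "3.0"
  worker_type            = "G.1X"
  number_of_workers      = 2
  max_retries            = 1
  timeout                = 60 # minutes
  security_configuration = aws_glue_security_configuration.example.name

  command {
    name            = "glueetl"
    python_version  = "3"
    script_location = "s3://example-bucket/scripts/example_job.py" # required
  }

  execution_property {
    max_concurrent_runs = 1
  }

  notification_property {
    notify_delay_after = 10 # minutes
  }
}
```

The sketch sets worker_type and number_of_workers and leaves max_capacity unset, because the provider treats max_capacity as an alternative to that pair rather than something to combine with it.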
Explanation in Terraform Registry
Provides a Glue Job resource. Note: Glue functionality, such as monitoring and logging of jobs, is typically managed with the default_arguments argument. See the Special Parameters Used by AWS Glue topic in the Glue developer guide for additional information.
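As an illustration of managing logging and monitoring through default_arguments, the following sketch enables continuous CloudWatch logging and job metrics using Glue's special parameters. The log group name, job name, script path, and the role variable are hypothetical; the argument keys follow the special parameters described in the Glue developer guide, and you should confirm them against the Glue version you run.

```hcl
# Hypothetical input: IAM role for the job (assumed to exist).
variable "logging_job_role_arn" {
  type = string
}

# Log group that receives the job's continuous logs.
resource "aws_cloudwatch_log_group" "glue_job" {
  name              = "/example/glue/jobs"
  retention_in_days = 14
}

resource "aws_glue_job" "with_logging" {
  name     = "example-job-with-logging"
  role_arn = var.logging_job_role_arn

  command {
    script_location = "s3://example-bucket/scripts/example_job.py"
  }

  # Special parameters that Glue itself interprets; values are always strings.
  default_arguments = {
    "--continuous-log-logGroup"          = aws_cloudwatch_log_group.glue_job.name
    "--enable-continuous-cloudwatch-log" = "true"
    "--enable-continuous-log-filter"     = "true"
    "--enable-metrics"                   = ""
  }
}
```

Arguments that individual job runs must not be able to override can be placed in non_overridable_arguments instead.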
AWS::Glue::Job (CloudFormation)
The Job in Glue can be configured in CloudFormation with the resource name AWS::Glue::Job. The following sections describe 10 examples of how to use the resource and its parameters.
Example Usage from GitHub
Type: AWS::Glue::Job
Properties:
  Description: This job creates both Hudi and Glueparquet tables
  Command:
    Name: glueetl
    ScriptLocation: "s3://mqtran/artifacts/Glue/HudiJarGlueJob.py"

Type: "AWS::Glue::Job"
Properties:
  Name: "tickit_public_category_refine"
  Description: ""
  Role: !Sub "arn:aws:iam::${AWS::AccountId}:role/service-role/AWSGlueServiceRole-demo"
  ExecutionProperty:

Type: "AWS::Glue::Job"
Properties:
  Name: !Sub "business_aggregate_daily-${AWS::Region}"
  Role: !GetAtt IAMRole.Arn
  ExecutionProperty:
    MaxConcurrentRuns: 1

Type: "AWS::Glue::Job"
Properties:
  Name: !Sub "business_aggregate_daily-${AWS::Region}"
  Role: !GetAtt IAMRole.Arn
  ExecutionProperty:
    MaxConcurrentRuns: 1

  Type: "AWS::Glue::Job"
  Properties:
    Command:
      NotScriptLocation: ./packages/gluejob/dummy.py
    Role: !GetAtt GlueRole.Arn
DummyGlueJob2:

"Type": "AWS::Glue::Job",
"Properties": {
  "Command": {
    "Name": "pythonshell",
    "PythonVersion": "3",
    "ScriptLocation": {

"Type": "AWS::Glue::Job",
"Properties": {
  "Command": {
    "Name": "pythonshell",
    "PythonVersion": "3",
    "ScriptLocation": {

"Type": "AWS::Glue::Job",
"Properties": {
  "Command": {
    "Name": "glueetl",
    "ScriptLocation": {
      "Fn::Sub": [

"Type": "AWS::Glue::Job",
"Properties": {
  "Role": {
    "Ref": "GlueIAMRole"
  },
  "DefaultArguments": {

"Type": "AWS::Glue::Job",
"Properties": {
  "Role": {
    "Fn::ImportValue": {
      "Fn::Sub": "${InfrastructureID}-GlueRoleARN"
    }
Parameters
Connections - optional - ConnectionsList
MaxRetries - optional - Double
Description - optional - String
Timeout - optional - Integer
AllocatedCapacity - optional - Double
Name - optional - String
Role - required - String
DefaultArguments - optional - Json
NotificationProperty - optional - NotificationProperty
WorkerType - optional - String
LogUri - optional - String
Command - required - JobCommand
GlueVersion - optional - String
ExecutionProperty - optional - ExecutionProperty
SecurityConfiguration - optional - String
NumberOfWorkers - optional - Integer
Tags - optional - Json
MaxCapacity - optional - Double
Explanation in CloudFormation Registry
The AWS::Glue::Job resource specifies an AWS Glue job in the data catalog. For more information, see Adding Jobs in AWS Glue and Job Structure in the AWS Glue Developer Guide.
Frequently asked questions
What is AWS Glue Job?
AWS Glue Job is a resource of the Glue service in Amazon Web Services. Its settings can be written in Terraform and CloudFormation.
Where can I find the example code for the AWS Glue Job?
For Terraform, the devopsbynaresh/datalake-alsac and m-voels/tftest source code examples are useful. See the Terraform Example section for further details.
For CloudFormation, the mq-tran/hudi-glue, garystafford/tickit-data-lake-demo and aws-samples/aws-utility-meter-data-analytics-platform-cn source code examples are useful. See the CloudFormation Example section for further details.