Azure Synapse Spark Pool

This page shows how to write Terraform and Azure Resource Manager templates for a Synapse Spark Pool, and how to configure them securely.

azurerm_synapse_spark_pool (Terraform)

The Spark Pool in Synapse can be configured in Terraform with the resource name azurerm_synapse_spark_pool. The following sections describe 7 examples of how to use the resource and its parameters.

Example Usage from GitHub

synapse_spark_pool.tf#L1
resource "azurerm_synapse_spark_pool" "synapseSparkPool001" {
  name                 = "SparkPool001"
  synapse_workspace_id = azurerm_synapse_workspace.synapseProduct001.id
  node_size_family     = "MemoryOptimized"
  node_size            = "Small"

database_pools.tf#L17
resource "azurerm_synapse_spark_pool" "this" {
  for_each             = local.spark
  name                 = each.value.name
  synapse_workspace_id = azurerm_synapse_workspace.ws.id

  node_size_family     = var.node_size_family
synapse_spark_pool_test.tf#L46
resource "azurerm_synapse_spark_pool" "default" {
  name                 = "example"
  synapse_workspace_id = azurerm_synapse_workspace.example.id
  node_size_family     = "MemoryOptimized"
  node_size            = "Small"

spark_pool.tf#L11
resource "azurerm_synapse_spark_pool" "spark_pool" {
  name                 = azurecaf_name.sparkpool.result
  synapse_workspace_id = var.synapse_workspace_id
  node_size_family     = var.settings.node_size_family
  node_size            = var.settings.node_size

ApacheSparkPool.tf#L3
resource "azurerm_synapse_spark_pool" "spark_pool" {
  depends_on = [
    azurerm_synapse_workspace.synapse_workspace
  ]
  name                 = var.spark_pool_name
  synapse_workspace_id = azurerm_synapse_workspace.synapse_workspace.id
main.tf#L42
resource "azurerm_synapse_spark_pool" "coresynsqlsparkpools" {
  for_each             = var.coreSynSqlSparkPools
  name                 = each.value["sprkPoolName"]
  synapse_workspace_id = each.value["sprkSynWrkSpcId"]
  node_size_family     = each.value["sprkPoolNodeSizeFamily"]
  node_size            = each.value["sprkPoolNodeSize"]
spark_pool.tf#L14
resource "azurerm_synapse_spark_pool" "spark_pool" {
  name                 = azurecaf_name.sparkpool.result
  synapse_workspace_id = var.synapse_workspace_id
  node_size_family     = var.settings.node_size_family
  node_size            = var.settings.node_size
  node_count           = try(var.settings.node_count, null)

Review your Terraform file for Azure best practices

Shisho Cloud, our free checker that makes sure your Terraform configuration follows best practices, is available (beta).

Parameters

Explanation in Terraform Registry

Manages a Synapse Spark Pool.
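
For reference, a minimal self-contained configuration combining the arguments seen in the excerpts above might look like the following sketch. The workspace reference, pool name, and sizing values are placeholders rather than values taken from any of the repositories above; verify the argument names against the azurerm provider version you use.

# Sketch only: the "example" names and sizing values are placeholders.
resource "azurerm_synapse_spark_pool" "example" {
  name                 = "examplesparkpool"
  synapse_workspace_id = azurerm_synapse_workspace.example.id
  node_size_family     = "MemoryOptimized"
  node_size            = "Small"
  spark_version        = "3.4" # required by recent provider versions; pick a runtime your region supports

  # Scale between 3 and 10 nodes instead of pinning a fixed node_count.
  auto_scale {
    min_node_count = 3
    max_node_count = 10
  }

  # Pause the pool after 15 minutes of inactivity to save cost.
  auto_pause {
    delay_in_minutes = 15
  }

  tags = {
    environment = "example"
  }
}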

Tips: Best Practices for Other Azure Synapse Resources

In addition to azurerm_synapse_workspace, Azure Synapse has other resources that should be configured for security reasons. Please check the following examples of those resources and precautions.

azurerm_synapse_workspace

Ensure that the managed virtual network is enabled

It is better to enable the managed virtual network, which is disabled by default.
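
A minimal sketch of a workspace with the managed virtual network turned on is shown below. The resource group, Data Lake filesystem reference, and credentials are placeholders, and the remaining required arguments follow the azurerm provider documentation for your provider version.

# Sketch only: names, the filesystem reference, and credentials are placeholders.
resource "azurerm_synapse_workspace" "example" {
  name                                 = "example-synapse-ws"
  resource_group_name                  = azurerm_resource_group.example.name
  location                             = azurerm_resource_group.example.location
  storage_data_lake_gen2_filesystem_id = azurerm_storage_data_lake_gen2_filesystem.example.id
  sql_administrator_login              = "sqladminuser"
  sql_administrator_login_password     = var.sql_admin_password

  # Disabled by default; enable it so that workspace traffic stays inside a managed VNet.
  managed_virtual_network_enabled = true

  identity {
    type = "SystemAssigned"
  }
}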

Review your Azure Synapse settings

In addition to the above, there are other security points you should be aware of. Make sure that your .tf files are protected with Shisho Cloud.

Microsoft.Synapse/workspaces/bigDataPools (Azure Resource Manager)

The workspaces/bigDataPools resource in Microsoft.Synapse can be configured in Azure Resource Manager with the resource type Microsoft.Synapse/workspaces/bigDataPools. The following sections describe how to use the resource and its parameters.

Example Usage from GitHub

ARMTemplateForWorkspace.json#L177
            "type": "Microsoft.Synapse/workspaces/bigDataPools",
            "apiVersion": "2019-06-01-preview",
            "properties": {},
            "dependsOn": []
        },
        {
ARMTemplateForWorkspace.json#L177
            "type": "Microsoft.Synapse/workspaces/bigDataPools",
            "apiVersion": "2019-06-01-preview",
            "properties": {
                "autoPause": {
                    "enabled": true,
                    "delayInMinutes": 15
ListBigDataPoolsInWorkspace.json#L14
            "type": "Microsoft.Synapse/workspaces/bigDataPools",
            "location": "West US 2",
            "name": "ExamplePool",
            "tags": {},
            "properties": {
              "provisioningState": "Succeeded",
DeleteBigDataPool.json#L14
        "type": "Microsoft.Synapse/workspaces/bigDataPools",
        "location": "West US 2",
        "name": "ExamplePool",
        "tags": {},
        "properties": {
          "provisioningState": "Deleting",

Parameters

  • apiVersion required - string
  • location required - string

    The geo-location where the resource lives

  • name required - string

    Big Data pool name

  • properties required
      • autoPause optional
          • delayInMinutes optional - integer

            Number of minutes of idle time before the Big Data pool is automatically paused.

          • enabled optional - boolean

            Whether auto-pausing is enabled for the Big Data pool.

      • autoScale optional
          • enabled optional - boolean

            Whether automatic scaling is enabled for the Big Data pool.

          • maxNodeCount optional - integer

            The maximum number of nodes the Big Data pool can support.

          • minNodeCount optional - integer

            The minimum number of nodes the Big Data pool can support.

      • cacheSize optional - integer

        The cache size

      • creationDate optional - string

        The time when the Big Data pool was created.

      • customLibraries optional array
          • containerName optional - string

            Storage blob container name.

          • name optional - string

            Name of the library.

          • path optional - string

            Storage blob path of library.

          • type optional - string

            Type of the library.

      • defaultSparkLogFolder optional - string

        The default folder where Spark logs will be written.

      • dynamicExecutorAllocation optional
          • enabled optional - boolean

            Indicates whether Dynamic Executor Allocation is enabled or not.

      • isComputeIsolationEnabled optional - boolean

        Whether compute isolation is required or not.

      • libraryRequirements optional
          • content optional - string

            The library requirements.

          • filename optional - string

            The filename of the library requirements file.

      • nodeCount optional - integer

        The number of nodes in the Big Data pool.

      • nodeSize optional - string

        The level of compute power that each node in the Big Data pool has.

      • nodeSizeFamily optional - string

        The kind of nodes that the Big Data pool provides.

      • provisioningState optional - string

        The state of the Big Data pool.

      • sessionLevelPackagesEnabled optional - boolean

        Whether session-level packages are enabled.

      • sparkConfigProperties optional
          • content optional - string

            The Spark configuration properties.

          • filename optional - string

            The filename of the Spark configuration properties file.

      • sparkEventsFolder optional - string

        The Spark events folder

      • sparkVersion optional - string

        The Apache Spark version.

  • tags optional - string

    Resource tags.

  • type required - string
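
To see how these properties fit together, here is a hedged sketch of a bigDataPools resource definition. The workspace and pool names, the apiVersion, and the sizing values are illustrative placeholders; check the API version and property values against the Microsoft.Synapse schema you target.

{
  "type": "Microsoft.Synapse/workspaces/bigDataPools",
  "apiVersion": "2021-06-01",
  "name": "exampleworkspace/ExamplePool",
  "location": "West US 2",
  "comments": "Sketch only: names, apiVersion, and sizing values are placeholders.",
  "tags": {},
  "properties": {
    "nodeSizeFamily": "MemoryOptimized",
    "nodeSize": "Small",
    "sparkVersion": "3.4",
    "autoScale": {
      "enabled": true,
      "minNodeCount": 3,
      "maxNodeCount": 10
    },
    "autoPause": {
      "enabled": true,
      "delayInMinutes": 15
    }
  },
  "dependsOn": [
    "[resourceId('Microsoft.Synapse/workspaces', 'exampleworkspace')]"
  ]
}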

Frequently asked questions

What is Azure Synapse Spark Pool?

Azure Synapse Spark Pool is a resource for Synapse of Microsoft Azure. Settings can be written in Terraform.

Where can I find the example code for the Azure Synapse Spark Pool?

For Terraform, the tschwarz01/tf-caf-data-landing-zone, PacktPublishing/Azure-Data-Architect-Handbook and infracost/infracost source code examples are useful. See the Terraform Example section for further details.

For Azure Resource Manager, the nisinha/cicd, praveenmathamsetty/bigdata and debhol/azuredocs source code examples are useful. See the Azure Resource Manager Example section for further details.