Google BigQuery Dataset

This page shows how to write Terraform for a BigQuery dataset and how to configure it securely.

google_bigquery_dataset (Terraform)

The Dataset in BigQuery can be configured in Terraform with the resource name google_bigquery_dataset. The following sections describe 5 examples of how to use the resource and its parameters.

Example Usage from GitHub

main.tf#L1
resource "google_bigquery_dataset" "dataset1" {
  dataset_id                  = var.dataset1_id # "example_dataset1"
  friendly_name               = var.dataset1_friendly_name # "dataset1"
  description                 = var.dataset1_desc # "This is a dataset1 description"
  location                    = var.dataset1_location # "EU"
  default_table_expiration_ms = 3600000
  # ... (remaining arguments omitted in this excerpt)
}
bigquery.tf#L1
resource "google_bigquery_dataset" "events_dataset" {
  dataset_id                  = "slack_events"
  friendly_name               = "slack events"
  description                 = "A dataset containing all slack events from Data Minded"
  location                    = "EU"
}
bigquery_dataset_test.tf#L6
resource "google_bigquery_dataset" "usage" {
  dataset_id  = "example_dataset"
  description = "This is a test description"
}

resource "google_bigquery_dataset" "non_usage" {
  # ... (remaining arguments omitted in this excerpt)
}
bq.tf#L3
resource "google_bigquery_dataset" "big_data" {
  dataset_id  = "market_dataset"
  description = "This is a test description"
  location    = var.region

  labels = {
    # ... (remaining entries omitted in this excerpt)
  }
}
bigquery.tf#L1
resource "google_bigquery_dataset" "dataset" {
  dataset_id                  = "example_dataset"
  friendly_name               = "test"
  description                 = "This is a test description"
  location                    = "US"
  default_table_expiration_ms = 3600000
  # ... (remaining arguments omitted in this excerpt)
}

Security Best Practices for google_bigquery_dataset

There is 1 setting in google_bigquery_dataset that needs attention for security reasons. The following section gives an overview.

Ensure your BigQuery dataset blocks unwanted access

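Block access from users outside your organization. In particular, avoid granting access through the access block to the special group allAuthenticatedUsers, which covers all authenticated BigQuery users.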
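
As a minimal sketch of what such a restriction might look like, the following configuration grants access only to one named user and to accounts signed in with one domain, and leaves out the allAuthenticatedUsers special group. The resource name, dataset ID, email address, and domain are placeholders chosen for illustration, not values from a real project.

```hcl
# Sketch only: all names below are hypothetical placeholders.
resource "google_bigquery_dataset" "restricted" {
  dataset_id = "example_dataset"
  location   = "EU"

  # Give ownership to a specific user rather than a broad special group.
  access {
    role          = "OWNER"
    user_by_email = "fred@example.com"
  }

  # Allow read access only to users signed in with the organization's domain.
  access {
    role   = "READER"
    domain = "example.com"
  }

  # Avoid an access block like the following, which opens the dataset
  # to every authenticated BigQuery user:
  #   access {
  #     role          = "READER"
  #     special_group = "allAuthenticatedUsers"
  #   }
}
```

Because the access blocks listed here define who can reach the dataset, it is worth keeping at least one OWNER entry so the dataset stays manageable.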

Parameters

  • creation_time optional computed - number

The time when this dataset was created, in milliseconds since the epoch.

  • dataset_id required - string

A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

  • default_partition_expiration_ms optional - number

The default partition expiration for all partitioned tables in the dataset, in milliseconds. Once this property is set, all newly-created partitioned tables in the dataset will have an 'expirationMs' property in the 'timePartitioning' settings set to this value, and changing the value will only affect new tables, not existing ones. The storage in a partition will have an expiration time of its partition time plus this value. Setting this property overrides the use of 'defaultTableExpirationMs' for partitioned tables: only one of 'defaultTableExpirationMs' and 'defaultPartitionExpirationMs' will be used for any new partitioned table. If you provide an explicit 'timePartitioning.expirationMs' when creating or updating a partitioned table, that value takes precedence over the default partition expiration time indicated by this property.

  • default_table_expiration_ms optional - number

The default lifetime of all tables in the dataset, in milliseconds. The minimum value is 3600000 milliseconds (one hour). Once this property is set, all newly-created tables in the dataset will have an 'expirationTime' property set to the creation time plus the value in this property, and changing the value will only affect new tables, not existing ones. When the 'expirationTime' for a given table is reached, that table will be deleted automatically. If a table's 'expirationTime' is modified or removed before the table expires, or if you provide an explicit 'expirationTime' when creating a table, that value takes precedence over the default expiration time indicated by this property.

  • description optional - string

A user-friendly description of the dataset.

  • etag optional computed - string

A hash of the resource.

  • friendly_name optional - string

A descriptive name for the dataset.

  • id optional computed - string
  • labels optional - map from string to string

The labels associated with this dataset. You can use these to organize and group your datasets

  • last_modified_time optional computed - number

The date when this dataset or any of its tables was last modified, in milliseconds since the epoch.

  • location optional - string

The geographic location where the dataset should reside. See official docs. There are two types of locations, regional or multi-regional. A regional location is a specific geographic place, such as Tokyo, and a multi-regional location is a large geographic area, such as the United States, that contains at least two geographic places. The default value is multi-regional location 'US'. Changing this forces a new resource to be created.

  • project optional computed - string
  • self_link optional computed - string
  • access set block

    • domain optional - string

    A domain to grant access to. Any users signed in with the domain specified will be granted the specified access.

    • group_by_email optional - string

    An email address of a Google Group to grant access to.

    • role optional - string

    Describes the rights granted to the user specified by the other member of the access object. Basic, predefined, and custom roles are supported. Predefined roles that have equivalent basic roles are swapped by the API to their basic counterparts. See official docs.

    • special_group optional - string

    A special group to grant access to. Possible values include: 'projectOwners': Owners of the enclosing project. 'projectReaders': Readers of the enclosing project. 'projectWriters': Writers of the enclosing project. 'allAuthenticatedUsers': All authenticated BigQuery users.

    • user_by_email optional - string

    An email address of a user to grant access to. For example: fred@example.com

    • view list block

      • dataset_id required - string

      The ID of the dataset containing this table.

      • project_id required - string

      The ID of the project containing this table.

      • table_id required - string

      The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

  • default_encryption_configuration list block

    • kms_key_name required - string

    Describes the Cloud KMS encryption key that will be used to protect the destination BigQuery table. The BigQuery Service Account associated with your project requires access to this encryption key.

  • timeouts single block
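
The parameters above could be combined roughly as follows. This is a sketch under assumptions: the resource name, dataset IDs, label values, project ID, view name, and Cloud KMS key path are hypothetical placeholders, and the 30-day partition expiration and the timeout value are arbitrary choices made for illustration.

```hcl
# Sketch only: project, key, view, and label names are hypothetical.
resource "google_bigquery_dataset" "analytics" {
  dataset_id    = "analytics_dataset" # letters, numbers, and underscores only
  friendly_name = "Analytics"
  description   = "Dataset with default expirations and a customer-managed key"
  location      = "EU" # changing this forces a new resource

  # New tables expire one hour after creation (the documented minimum).
  default_table_expiration_ms = 3600000

  # Partitions of new partitioned tables expire 30 days after their
  # partition time: 30 * 24 * 60 * 60 * 1000 ms.
  default_partition_expiration_ms = 2592000000

  labels = {
    env  = "dev"
    team = "data"
  }

  # Keep the project owners as dataset owners.
  access {
    role          = "OWNER"
    special_group = "projectOwners"
  }

  # Authorize a view that lives in another dataset to read from this one.
  access {
    view {
      project_id = "my-project"
      dataset_id = "reporting_dataset"
      table_id   = "shared_view"
    }
  }

  # The BigQuery service account needs access to this key.
  default_encryption_configuration {
    kms_key_name = "projects/my-project/locations/europe/keyRings/my-ring/cryptoKeys/my-key"
  }

  timeouts {
    create = "10m"
  }
}
```

Both expiration defaults affect only tables created after the value is set, so existing tables keep whatever expiration they already have.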

Explanation in Terraform Registry

Datasets allow you to organize and control access to your tables. To get more information about Dataset, see:

  • API documentation
  • How-to Guides
    • Datasets Intro

Warning: You must specify the role field using the legacy format OWNER instead of roles/bigquery.dataOwner. The API accepts both formats, but it always returns the legacy format, which causes Terraform to show a permanent diff on each plan and apply operation.
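
To illustrate the warning, a sketch of access blocks written with legacy role names might look like this. The resource name, dataset ID, and email addresses are placeholders, and the comments name the predefined role that each legacy name stands in for.

```hcl
# Sketch only: identities below are hypothetical placeholders.
resource "google_bigquery_dataset" "with_legacy_roles" {
  dataset_id = "example_dataset"

  access {
    role          = "OWNER" # instead of "roles/bigquery.dataOwner"
    user_by_email = "fred@example.com"
  }

  access {
    role           = "WRITER" # instead of "roles/bigquery.dataEditor"
    group_by_email = "engineers@example.com"
  }

  access {
    role           = "READER" # instead of "roles/bigquery.dataViewer"
    group_by_email = "analysts@example.com"
  }
}
```

Writing the roles this way keeps the configuration identical to what the API returns, so repeated plan runs should not report a change for these blocks.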

Frequently asked questions

What is Google BigQuery Dataset?

Google BigQuery Dataset is a BigQuery resource on Google Cloud Platform. Its settings can be written in Terraform.

Where can I find the example code for the Google BigQuery Dataset?

For Terraform, the xgenOsama/gcp-terraform-modules, datamindedbe/blog-slack-bigquery-export, and infracost/infracost source code examples are useful. See the Example Usage from GitHub section for further details.
