Google BigQuery Dataset
This page shows how to write Terraform for a BigQuery Dataset, and how to write it securely.
google_bigquery_dataset (Terraform)
The Dataset in BigQuery can be configured in Terraform with the resource name google_bigquery_dataset. The following sections describe 5 examples of how to use the resource and its parameters.
Example Usage from GitHub
resource "google_bigquery_dataset" "dataset1" {
  dataset_id                  = var.dataset1_id            # "example_dataset1"
  friendly_name               = var.dataset1_friendly_name # "dataset1"
  description                 = var.dataset1_desc          # "This is a dataset1 description"
  location                    = var.dataset1_location      # "EU"
  default_table_expiration_ms = 3600000
}
resource "google_bigquery_dataset" "events_dataset" {
  dataset_id    = "slack_events"
  friendly_name = "slack events"
  description   = "A dataset containing all slack events from Data Minded"
  location      = "EU"
}
resource "google_bigquery_dataset" "usage" {
  dataset_id  = "example_dataset"
  description = "This is a test description"
}
resource "google_bigquery_dataset" "big_data" {
  dataset_id  = "market_dataset"
  description = "This is a test description"
  location    = var.region

  labels = {
    # (label values truncated in the original source)
  }
}
resource "google_bigquery_dataset" "dataset" {
  dataset_id                  = "example_dataset"
  friendly_name               = "test"
  description                 = "This is a test description"
  location                    = "US"
  default_table_expiration_ms = 3600000
}
Security Best Practices for google_bigquery_dataset
There is 1 setting in google_bigquery_dataset that should be taken care of for security reasons. The following section explains an overview and example code.
Ensure your BigQuery dataset blocks unwanted access
It is better to block unwanted access from users outside the organization; in particular, avoid broad access entries such as special_group = "allAuthenticatedUsers" or a domain-wide grant.
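As a sketch (the dataset ID, group, and user addresses below are hypothetical), access can be restricted to named principals inside the organization instead of broad grants:

```hcl
# Hypothetical example: grant access only to named principals, instead of
# broad entries such as special_group = "allAuthenticatedUsers" or a
# domain-wide grant.
resource "google_bigquery_dataset" "private" {
  dataset_id = "private_dataset" # hypothetical ID
  location   = "EU"

  access {
    role           = "OWNER"
    group_by_email = "data-team@example.com" # hypothetical group
  }

  access {
    role          = "READER"
    user_by_email = "analyst@example.com" # hypothetical user
  }
}
```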
Parameters
-
creation_time
optional computed - number
The time when this dataset was created, in milliseconds since the epoch.
-
dataset_id
required - string
A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
-
default_partition_expiration_ms
optional - number
The default partition expiration for all partitioned tables in the dataset, in milliseconds. Once this property is set, all newly-created partitioned tables in the dataset will have an 'expirationMs' property in the 'timePartitioning' settings set to this value, and changing the value will only affect new tables, not existing ones. The storage in a partition will have an expiration time of its partition time plus this value. Setting this property overrides the use of 'defaultTableExpirationMs' for partitioned tables: only one of 'defaultTableExpirationMs' and 'defaultPartitionExpirationMs' will be used for any new partitioned table. If you provide an explicit 'timePartitioning.expirationMs' when creating or updating a partitioned table, that value takes precedence over the default partition expiration time indicated by this property.
-
default_table_expiration_ms
optional - number
The default lifetime of all tables in the dataset, in milliseconds. The minimum value is 3600000 milliseconds (one hour). Once this property is set, all newly-created tables in the dataset will have an 'expirationTime' property set to the creation time plus the value in this property, and changing the value will only affect new tables, not existing ones. When the 'expirationTime' for a given table is reached, that table will be deleted automatically. If a table's 'expirationTime' is modified or removed before the table expires, or if you provide an explicit 'expirationTime' when creating a table, that value takes precedence over the default expiration time indicated by this property.
-
delete_contents_on_destroy
optional - bool
If set to true, delete all the tables in the dataset when destroying the resource; otherwise, destroying the resource will fail if tables are present.
-
description
optional - string
A user-friendly description of the dataset
-
etag
optional computed - string
A hash of the resource.
-
friendly_name
optional - string
A descriptive name for the dataset
-
labels
optional - map of string
The labels associated with this dataset. You can use these to organize and group your datasets
-
last_modified_time
optional computed - number
The date when this dataset or any of its tables was last modified, in milliseconds since the epoch.
-
location
optional - string
The geographic location where the dataset should reside. See official docs. There are two types of locations, regional or multi-regional. A regional location is a specific geographic place, such as Tokyo, and a multi-regional location is a large geographic area, such as the United States, that contains at least two geographic places. The default value is multi-regional location 'US'. Changing this forces a new resource to be created.
-
project
optional computed - string
-
self_link
optional computed - string
-
access
set block
-
domain
optional - string
A domain to grant access to. Any users signed in with the domain specified will be granted the specified access
-
group_by_email
optional - string
An email address of a Google Group to grant access to.
-
role
optional - string
Describes the rights granted to the user specified by the other member of the access object. Basic, predefined, and custom roles are supported. Predefined roles that have equivalent basic roles are swapped by the API to their basic counterparts. See official docs.
-
special_group
optional - string
A special group to grant access to. Possible values include: 'projectOwners': Owners of the enclosing project. 'projectReaders': Readers of the enclosing project. 'projectWriters': Writers of the enclosing project. 'allAuthenticatedUsers': All authenticated BigQuery users.
-
user_by_email
optional - string
An email address of a user to grant access to. For example: fred@example.com
-
view
list block
-
dataset_id
required - string
The ID of the dataset containing this table.
-
project_id
required - string
The ID of the project containing this table.
-
table_id
required - string
The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
-
default_encryption_configuration
list block
-
kms_key_name
required - string
Describes the Cloud KMS encryption key that will be used to protect destination BigQuery table. The BigQuery Service Account associated with your project requires access to this encryption key.
-
timeouts
single block
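The expiration, deletion, encryption, and access parameters above can be sketched together in a single resource. The KMS key name, group address, and view IDs below are hypothetical:

```hcl
resource "google_bigquery_dataset" "example" {
  dataset_id                      = "example_dataset"
  location                        = "EU"
  default_table_expiration_ms     = 3600000  # new tables expire one hour after creation
  default_partition_expiration_ms = 86400000 # partitions expire one day after their partition time
  delete_contents_on_destroy      = false    # destroy fails if tables are still present

  # Hypothetical Cloud KMS key protecting new tables in this dataset.
  default_encryption_configuration {
    kms_key_name = "projects/my-project/locations/europe/keyRings/my-ring/cryptoKeys/my-key"
  }

  # Grant a hypothetical group ownership of the dataset.
  access {
    role           = "OWNER"
    group_by_email = "data-team@example.com"
  }

  # Authorize a hypothetical view from another dataset (no role is set for views).
  access {
    view {
      project_id = "my-project"
      dataset_id = "reporting_dataset"
      table_id   = "daily_view"
    }
  }
}
```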
Explanation in Terraform Registry
Datasets allow you to organize and control access to your tables. To get more information about Dataset, see:
- API documentation
- How-to Guides
- Datasets Intro
Warning: You must specify the role field using the legacy format OWNER instead of roles/bigquery.dataOwner. The API accepts both formats, but it will always return the legacy format, which results in Terraform showing a permanent diff on each plan and apply operation.
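For example (the dataset ID and user address here are hypothetical), the access role should use the legacy name:

```hcl
resource "google_bigquery_dataset" "legacy_role" {
  dataset_id = "example_dataset" # hypothetical ID

  access {
    # Use the legacy role name...
    role          = "OWNER"
    # ...not "roles/bigquery.dataOwner", which would show a permanent diff.
    user_by_email = "owner@example.com" # hypothetical user
  }
}
```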
Frequently asked questions
What is Google BigQuery Dataset?
Google BigQuery Dataset is a resource for BigQuery of Google Cloud Platform. Settings can be written in Terraform.
Where can I find the example code for the Google BigQuery Dataset?
For Terraform, the xgenOsama/gcp-terraform-modules, datamindedbe/blog-slack-bigquery-export and infracost/infracost source code examples are useful. See the Terraform Example section for further details.