Google BigQuery Dataset
This page shows how to write Terraform for a BigQuery Dataset, and how to write it securely.
google_bigquery_dataset (Terraform)
The Dataset in BigQuery can be configured in Terraform with the resource name google_bigquery_dataset. The following sections describe 5 examples of how to use the resource and its parameters.
Example Usage from GitHub
resource "google_bigquery_dataset" "dataset1" {
  dataset_id                  = var.dataset1_id            # "example_dataset1"
  friendly_name               = var.dataset1_friendly_name # "dataset1"
  description                 = var.dataset1_desc          # "This is a dataset1 description"
  location                    = var.dataset1_location      # "EU"
  default_table_expiration_ms = 3600000
}
resource "google_bigquery_dataset" "events_dataset" {
  dataset_id    = "slack_events"
  friendly_name = "slack events"
  description   = "A dataset containing all slack events from Data Minded"
  location      = "EU"
}
resource "google_bigquery_dataset" "usage" {
  dataset_id  = "example_dataset"
  description = "This is a test description"
}
resource "google_bigquery_dataset" "big_data" {
  dataset_id  = "market_dataset"
  description = "This is a test description"
  location    = var.region

  labels = {
    # (label values truncated in the original source)
  }
}
resource "google_bigquery_dataset" "dataset" {
  dataset_id                  = "example_dataset"
  friendly_name               = "test"
  description                 = "This is a test description"
  location                    = "US"
  default_table_expiration_ms = 3600000
}
Security Best Practices for google_bigquery_dataset
There is 1 setting in google_bigquery_dataset that should be taken care of for security reasons. The following section explains an overview and example code.
Ensure your BigQuery dataset blocks unwanted access
It is better to block unwanted access from users outside the organization; in particular, avoid broad access entries such as special_group = "allAuthenticatedUsers" or a domain-wide grant.
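As a sketch (the dataset ID, group, and user addresses below are hypothetical), access can be restricted to named principals inside the organization instead of broad grants:

```hcl
# Hypothetical example: grant access only to named principals, instead of
# broad entries such as special_group = "allAuthenticatedUsers" or a
# domain-wide grant.
resource "google_bigquery_dataset" "private" {
  dataset_id = "private_dataset" # hypothetical ID
  location   = "EU"

  access {
    role           = "OWNER"
    group_by_email = "data-team@example.com" # hypothetical group
  }

  access {
    role          = "READER"
    user_by_email = "analyst@example.com" # hypothetical user
  }
}
```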
Parameters
-
creation_time
optional computed - number
The time when this dataset was created, in milliseconds since the epoch.
-
dataset_id
required - string
A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
-
default_partition_expiration_ms
optional - number
The default partition expiration for all partitioned tables in the dataset, in milliseconds. Once this property is set, all newly-created partitioned tables in the dataset will have an 'expirationMs' property in the 'timePartitioning' settings set to this value, and changing the value will only affect new tables, not existing ones. The storage in a partition will have an expiration time of its partition time plus this value. Setting this property overrides the use of 'defaultTableExpirationMs' for partitioned tables: only one of 'defaultTableExpirationMs' and 'defaultPartitionExpirationMs' will be used for any new partitioned table. If you provide an explicit 'timePartitioning.expirationMs' when creating or updating a partitioned table, that value takes precedence over the default partition expiration time indicated by this property.
-
default_table_expiration_ms
optional - number
The default lifetime of all tables in the dataset, in milliseconds. The minimum value is 3600000 milliseconds (one hour). Once this property is set, all newly-created tables in the dataset will have an 'expirationTime' property set to the creation time plus the value in this property, and changing the value will only affect new tables, not existing ones. When the 'expirationTime' for a given table is reached, that table will be deleted automatically. If a table's 'expirationTime' is modified or removed before the table expires, or if you provide an explicit 'expirationTime' when creating a table, that value takes precedence over the default expiration time indicated by this property.
-
delete_contents_on_destroy
optional - bool
If set to true, delete all the tables in the dataset when destroying the resource; otherwise, destroying the resource will fail if tables are present.
-
description
optional - string
A user-friendly description of the dataset
-
etag
optional computed - string
A hash of the resource.
-
friendly_name
optional - string
A descriptive name for the dataset
-
labels
optional - map of string
The labels associated with this dataset. You can use these to organize and group your datasets
-
last_modified_time
optional computed - number
The date when this dataset or any of its tables was last modified, in milliseconds since the epoch.
-
location
optional - string
The geographic location where the dataset should reside. See official docs. There are two types of locations, regional or multi-regional. A regional location is a specific geographic place, such as Tokyo, and a multi-regional location is a large geographic area, such as the United States, that contains at least two geographic places. The default value is multi-regional location 'US'. Changing this forces a new resource to be created.
-
project
optional computed - string
-
self_link
optional computed - string
-
access
set block
-
domain
optional - string
A domain to grant access to. Any users signed in with the domain specified will be granted the specified access
-
group_by_email
optional - string
An email address of a Google Group to grant access to.
-
role
optional - string
Describes the rights granted to the user specified by the other member of the access object. Basic, predefined, and custom roles are supported. Predefined roles that have equivalent basic roles are swapped by the API to their basic counterparts. See official docs.
-
special_group
optional - string
A special group to grant access to. Possible values include: 'projectOwners': Owners of the enclosing project. 'projectReaders': Readers of the enclosing project. 'projectWriters': Writers of the enclosing project. 'allAuthenticatedUsers': All authenticated BigQuery users.
-
user_by_email
optional - string
An email address of a user to grant access to. For example: fred@example.com
-
view
list block
-
dataset_id
required - string
The ID of the dataset containing this table.
-
project_id
required - string
The ID of the project containing this table.
-
table_id
required - string
The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
-
default_encryption_configuration
list block
-
kms_key_name
required - string
Describes the Cloud KMS encryption key that will be used to protect destination BigQuery table. The BigQuery Service Account associated with your project requires access to this encryption key.
-
timeouts
single block
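The expiration, deletion, encryption, and access parameters above can be sketched together in a single resource. The KMS key name, group address, and view IDs below are hypothetical:

```hcl
resource "google_bigquery_dataset" "example" {
  dataset_id                      = "example_dataset"
  location                        = "EU"
  default_table_expiration_ms     = 3600000  # new tables expire one hour after creation
  default_partition_expiration_ms = 86400000 # partitions expire one day after their partition time
  delete_contents_on_destroy      = false    # destroy fails if tables are still present

  # Hypothetical Cloud KMS key protecting new tables in this dataset.
  default_encryption_configuration {
    kms_key_name = "projects/my-project/locations/europe/keyRings/my-ring/cryptoKeys/my-key"
  }

  # Grant a hypothetical group ownership of the dataset.
  access {
    role           = "OWNER"
    group_by_email = "data-team@example.com"
  }

  # Authorize a hypothetical view from another dataset (no role is set for views).
  access {
    view {
      project_id = "my-project"
      dataset_id = "reporting_dataset"
      table_id   = "daily_view"
    }
  }
}
```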
Explanation in Terraform Registry
Datasets allow you to organize and control access to your tables. To get more information about Dataset, see:
- API documentation
- How-to Guides
- Datasets Intro
Warning: You must specify the role field using the legacy format OWNER instead of roles/bigquery.dataOwner. The API accepts both formats, but it will always return the legacy format, which results in Terraform showing a permanent diff on each plan and apply operation.
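For example (the dataset ID and user address here are hypothetical), the access role should use the legacy name:

```hcl
resource "google_bigquery_dataset" "legacy_role" {
  dataset_id = "example_dataset" # hypothetical ID

  access {
    # Use the legacy role name...
    role          = "OWNER"
    # ...not "roles/bigquery.dataOwner", which would show a permanent diff.
    user_by_email = "owner@example.com" # hypothetical user
  }
}
```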
Frequently asked questions
What is Google BigQuery Dataset?
Google BigQuery Dataset is a resource for BigQuery of Google Cloud Platform. Settings can be written in Terraform.
Where can I find the example code for the Google BigQuery Dataset?
For Terraform, the xgenOsama/gcp-terraform-modules, datamindedbe/blog-slack-bigquery-export and infracost/infracost source code examples are useful. See the Terraform Example section for further details.