BigQuery
Connecting PACE to a BigQuery instance
PACE translates Data Policies into authorized views, which requires certain privileges. For Principals, we offer two options: the first (and easiest) uses a user group mapping table; the second leverages the Google Groups, Roles and Permissions from IAM. Follow the sections below to connect PACE to a BigQuery instance.
In a standard GCP deployment, PACE's integration will look like this:
PACE creates authorized views so that users of these views do not need read access to the underlying data, which would defeat the purpose of controlling access and policies through PACE Data Policies.
We advise creating a new, dedicated service account to connect PACE to your BigQuery environment. Also create a JSON key for this account and store it securely.
The easiest way to configure the required privileges is to grant the PACE service account the `DataOwner` role for all source and target datasets, as well as the `JobUser` role, but a more fine-grained setup is possible:
- Grant the service account the `JobUser` role on the BigQuery project you want the PACE queries to run in (i.e. the create view queries/jobs).
- Grant the permissions required for creating authorized views (see also https://cloud.google.com/bigquery/docs/authorized-views):
  - `bigquery.tables.create` on the target datasets.
  - `bigquery.tables.getData` on the source datasets.
  - `bigquery.datasets.get` and `bigquery.datasets.update` on the source datasets.
To keep track of the group Principals a user belongs to, create a table (or view) with the following string fields: `userEmail` and `userGroup`.
A user may appear in multiple rows, one per group. Changes to this table will immediately affect the query results on PACE authorized views for the modified users.
The table must be located in the same region where the PACE views are created.
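To illustrate the shape of the mapping table, here is a minimal Python sketch that models it in memory. The `userEmail` and `userGroup` field names come from the description above; the sample rows and the `groups_for` helper are hypothetical and only mirror the kind of per-user lookup the table supports. It is not how PACE itself queries the table.

```python
from collections import defaultdict

# Hypothetical sample rows. The real data lives in a BigQuery table or view
# with the two string fields userEmail and userGroup.
USER_GROUP_ROWS = [
    {"userEmail": "alice@example.com", "userGroup": "marketing"},
    {"userEmail": "alice@example.com", "userGroup": "fraud-detection"},
    {"userEmail": "bob@example.com", "userGroup": "marketing"},
]


def groups_for(rows, user_email):
    """Return the sorted list of groups the given user belongs to."""
    groups_by_user = defaultdict(set)
    for row in rows:
        # A user may appear in several rows, one per group.
        groups_by_user[row["userEmail"]].add(row["userGroup"])
    return sorted(groups_by_user.get(user_email, set()))


if __name__ == "__main__":
    print(groups_for(USER_GROUP_ROWS, "alice@example.com"))
```

A user without any row simply gets an empty list of groups.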
After following the above steps, provide the corresponding configuration to the PACE application for each BigQuery instance you want to connect with. The properties are expected to contain the following:
- `id`: an arbitrary identifier unique within your organization for the specific platform (BigQuery).
- `projectId`: the Google Cloud project id where PACE should execute its queries (may differ from source/target datasets).
- `userGroupsTable`: the full name of the table containing the user group mapping, i.e. `<project>.<dataset>.<table>`.
- `serviceAccountJsonKey`: the JSON key created for the service account to be used by PACE.
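As a rough aid for checking these properties before handing them to PACE, the following Python sketch validates a plain dictionary. The property names are the ones listed above; the `validate_bigquery_config` helper, the example values, and the simple three-part table name check are assumptions made for illustration and are not part of PACE. The check also assumes the user group mapping table is in use.

```python
import re

# Property names as listed above.
REQUIRED_PROPERTIES = ("id", "projectId", "userGroupsTable", "serviceAccountJsonKey")

# <project>.<dataset>.<table>: three non-empty, dot-separated parts.
# Simplification: project ids that themselves contain a dot are not handled.
FULL_TABLE_NAME = re.compile(r"^[^.\s]+\.[^.\s]+\.[^.\s]+$")

# Placeholder values only.
EXAMPLE_CONFIG = {
    "id": "bigquery-pp",
    "projectId": "my-google-cloud-project",
    "userGroupsTable": "my-google-cloud-project.user_groups.user_groups",
    "serviceAccountJsonKey": "<service account JSON key>",
}


def validate_bigquery_config(config):
    """Return a list of problems found in a BigQuery platform config dict."""
    problems = [
        f"missing property: {name}"
        for name in REQUIRED_PROPERTIES
        if not config.get(name)
    ]
    table = config.get("userGroupsTable", "")
    if table and not FULL_TABLE_NAME.match(table):
        problems.append("userGroupsTable must look like <project>.<dataset>.<table>")
    return problems


if __name__ == "__main__":
    print(validate_bigquery_config(EXAMPLE_CONFIG))
```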
We offer two extensions for Principal checks: `Google IAM Sync` and `Google IAM Check`. The `Sync` extension provides an automated way of keeping the user mapping table in sync, whereas the `Check` extension is a standalone check that leverages all IAM groups, roles and permissions for your Google project. Both are described in more detail below.
The extensions share many prerequisites. Follow the steps below to configure your Google project properly.
First of all, we need to "trust" the Google Auth Library. This requires a super-admin account, as you need to log in to the admin console. Click Add App and select the app based on its Client-ID. The corresponding app-id is `764086051850-6qr4p6gpi6hn506pt8ejuq83di341hur.apps.googleusercontent.com`.
Complete the wizard to make it a trusted app.
Next up, make sure that the following APIs are enabled:
- Cloud Resource Manager (`Check` extension only)
- Cloud Asset (`Check` extension only)
The BigQuery IAM Extensions make use of the Application Default Credentials for Google. You need to create OAuth credentials for a Desktop application in the Google Cloud Console:
1. Go to the APIs & Services console and make sure you select the correct project.
2. Click on Create Credentials and select OAuth client ID.
3. Select Desktop application as the application type.
4. Click on Create and download the credentials file.
The extension is managed by Terraform. To create the Terraform resources, log in locally as the super-admin account with the `--client-id-file` flag set to the OAuth credentials file and the `--scopes` flag set to the following scopes: `https://www.googleapis.com/auth/admin.directory.rolemanagement`, `https://www.googleapis.com/auth/admin.directory.rolemanagement.readonly` and `https://www.googleapis.com/auth/cloud-platform`.
After logging in, set the quota project you want to use.
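The login and quota project steps can be sketched as follows. This Python snippet only builds and prints the commands; it does not run them. It assumes the login refers to `gcloud auth application-default login`, which accepts the `--client-id-file` and `--scopes` flags, and that the quota project is set with `gcloud auth application-default set-quota-project`. The file name and project id are placeholders.

```python
import shlex

# Scopes listed in the text above.
SCOPES = [
    "https://www.googleapis.com/auth/admin.directory.rolemanagement",
    "https://www.googleapis.com/auth/admin.directory.rolemanagement.readonly",
    "https://www.googleapis.com/auth/cloud-platform",
]


def login_command(client_id_file):
    """Build the Application Default Credentials login command."""
    return [
        "gcloud", "auth", "application-default", "login",
        f"--client-id-file={client_id_file}",
        f"--scopes={','.join(SCOPES)}",
    ]


def quota_project_command(project):
    """Build the command that sets the quota project for the credentials."""
    return ["gcloud", "auth", "application-default", "set-quota-project", project]


if __name__ == "__main__":
    # Placeholder values; the commands are printed rather than executed.
    print(shlex.join(login_command("client_secret.json")))
    print(shlex.join(quota_project_command("my-google-cloud-project")))
```

Printing the commands first makes it easy to review the scopes before running the login in a terminal.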
The `Sync` extension requires the following environment variables to be set. We recommend defining them in an `.envrc` file:
- `REGION` is the region where your data lives.
- `PROJECT` is the project where your data lives.
- `ORGANIZATION_ID` is the id of your organization, for example `12345689012`.
- `CUSTOMER_ID` is the customer-id of the organization in the Google admin console.
- `SCHEDULER_REGION` is the region where the cloud function will be deployed. This can be the same as `REGION`, but Cloud Scheduler is not available in all regions, so check the Cloud Scheduler documentation to see whether your region is supported.
- `CRON_SCHEDULE` is the schedule for Cloud Scheduler in cron format. For example, `0 0 * * *` invokes the sync every day at midnight.
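Before running Terraform, it can help to confirm that all variables are present. The Python sketch below does this with the variable names exactly as listed above; the names the Terraform configuration actually reads (for instance with a prefix) may differ, and both helpers are hypothetical. The cron check is deliberately loose and only counts the five fields of a standard cron expression.

```python
import os

# Variable names as listed above for the Sync extension.
SYNC_VARIABLES = (
    "REGION",
    "PROJECT",
    "ORGANIZATION_ID",
    "CUSTOMER_ID",
    "SCHEDULER_REGION",
    "CRON_SCHEDULE",
)


def missing_variables(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


def is_five_field_cron(expression):
    """Loosely check that a cron expression has five whitespace-separated fields."""
    return len(expression.split()) == 5


if __name__ == "__main__":
    missing = missing_variables(SYNC_VARIABLES)
    if missing:
        print("Missing variables:", ", ".join(missing))
    elif not is_five_field_cron(os.environ["CRON_SCHEDULE"]):
        print("CRON_SCHEDULE does not look like a five-field cron expression")
    else:
        print("All Sync extension variables are set")
```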
Unless you have configured it differently, the PACE application uses the same configuration as for the user mapping table, with `user-groups-table: "my-google-cloud-project.user_groups.user_groups_view"`. The view that is created is an authorized view that only returns the groups of the user in session.
The `Check` extension requires only four environment variables. We recommend defining them in an `.envrc` file:
- `REGION` is the region where your data lives.
- `PROJECT` is the project where your data lives.
- `ORGANIZATION_ID` is the id of your organization, for example `12345689012`.
- `CUSTOMER_ID` is the customer-id of the organization in the Google admin console.
Compared to the user mapping table, the only difference is that you must set the `use-iam-check-extension` parameter to `true`. The `user-groups-table` can then be left empty.
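The relationship between the two settings can be summarized in a small Python sketch. The keys `use-iam-check-extension` and `user-groups-table` are taken from the text above; the `principal_source` helper and its return values are hypothetical and only illustrate which mechanism a given configuration would rely on.

```python
def principal_source(config):
    """Describe where group Principals come from for a platform config dict."""
    if config.get("use-iam-check-extension") is True:
        # With the Check extension enabled, the mapping table may be empty.
        return "iam-check-extension"
    if config.get("user-groups-table"):
        return "user-groups-table"
    raise ValueError(
        "set use-iam-check-extension to true or provide user-groups-table"
    )


if __name__ == "__main__":
    print(principal_source({"use-iam-check-extension": True, "user-groups-table": ""}))
```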
In order to leverage the three different IAM types, you need to indicate the type of principal.
| Principal | Example (in YAML) |
|---|---|
| Groups | |
| Roles | |
| Permissions | |
If you are interested in using the IAM groups, roles and permissions as Principals for your data policies, let us know!