As mentioned in the BigQuery integration docs, we offer two methods to control table access. In this tutorial we will look at an example with the BigQuery IAM Check Extension. Provided you have the extension set up correctly (it is currently not open source, but you can request access), this tutorial shows how to leverage IAM groups, roles and permissions.
Prerequisite
For this tutorial to work you need to create a service account according to .
File and directory setup
Clone the repository from GitHub. The command below clones over HTTPS; feel free to use SSH instead.
git clone https://github.com/getstrm/pace.git
Create a directory standalone with the following directory tree structure:
Now navigate to the bigquery-iam-check-extension directory inside the newly created pace folder:
cd pace/examples/bigquery-iam-check-extension
Next, let's have a look at the contents of these files.
employees.csv
The CSV file contains some mock data for this example. This tutorial requires you to create a dataset named pace in your BigQuery project, with a table named demo. Populate the table with the CSV.
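If you prefer loading the CSV from the command line, a bq load invocation along these lines should work (a sketch, assuming the pace dataset and demo table from this tutorial, and that employees.csv starts with a header row that schema auto-detection can use):
bq load --source_format=CSV --autodetect pace.demo ./employees.csv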
config/application.yaml
Fill in your BigQuery project ID and paste your service account JSON key in the corresponding places.
docker-compose.yaml
The compose file contains two services:
pace_app with all ports exposed to the host for its different interfaces (REST, gRPC, and the Spring Boot Actuator):
8080 -> Spring Boot Actuator
9090 -> Envoy JSON / gRPC Transcoding proxy
50051 -> gRPC
postgres_pace acts as the persistent layer for PACE to store its Data Policies
Available under localhost:5432 on your machine.
data-policy.yaml
This is the Data Policy we'll be creating in this tutorial. It implements the desired sensitivity measures, which we will see below.
Creating service accounts as test users
For this example, we will use a set of user groups with service accounts as test users. First, create three groups within your organization, named testgroup1, testgroup2 and testgroup3, and configure the group emails as testgroup1@your-domain.com, testgroup2@your-domain.com and testgroup3@your-domain.com, respectively.
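If you manage your Cloud Identity groups with gcloud, creating them could look roughly like this (a sketch; the domain, group names and exact flags depend on your organization setup, and you may also create the groups in the Google Admin console instead):
gcloud identity groups create testgroup1@your-domain.com --organization="your-domain.com" --display-name="testgroup1"
gcloud identity groups create testgroup2@your-domain.com --organization="your-domain.com" --display-name="testgroup2"
gcloud identity groups create testgroup3@your-domain.com --organization="your-domain.com" --display-name="testgroup3"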
Role creation
Go to the IAM page and grant the following roles to all three groups: BigQuery Data Viewer and BigQuery Job User. These two roles are required to query the created view in BigQuery.
Another option for the roles is to create a custom role with the following permissions:
bigquery.jobs.create
bigquery.routines.get
bigquery.connections.use
These are the minimal permissions required to invoke the check_principal_access routine created by the extension. More role settings are needed for the view, but more on that later.
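If you take the custom role route, such a role could be created with gcloud along these lines (the role ID and title below are just examples):
gcloud iam roles create paceCheckExtensionInvoker --project=<your-project-id> --title="PACE check extension invoker" --permissions=bigquery.jobs.create,bigquery.routines.get,bigquery.connections.use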
Service account creation
Create three service accounts named testuser1, testuser2 and testuser3 and assign them to their respective groups.
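As a sketch, a service account and its group membership could be created with gcloud as follows (replace the project ID and domain with your own, and repeat for testuser2/testgroup2 and testuser3/testgroup3):
gcloud iam service-accounts create testuser1 --display-name="testuser1"
gcloud identity groups memberships add --group-email="testgroup1@your-domain.com" --member-email="testuser1@<your-project-id>.iam.gserviceaccount.com"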
Running the BigQuery Example
If you have created the table in BigQuery with the employees data and filled in your service account JSON key and project ID in application.yaml, it is time to start the containers:
docker compose up
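Once the containers are up, you can check that pace_app is healthy through the Spring Boot Actuator port, for example with the standard health endpoint (which endpoints are enabled depends on the PACE configuration):
curl localhost:8080/actuator/health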
The Data Policy
Replace all principal domains with the domain of your principals. Let's break down the policy.
Ruleset
There is only one target in this ruleset, hence we will be creating one view.
We are creating a SQL view. Replace <--PROJECT-ID-->, <--DATABASE--> and <--VIEW--> with your project ID, target database, and target view name, respectively.
One generic filter block: testgroup1 can view all rows, testgroup2 can view only rows where age is 40, and all other principals (i.e. testgroup3) can only see people older than 50. See the sketch below this list.
In the source.ref section, change the integration_fqn to the correct reference to the source table.
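For reference, the rule set in data-policy.yaml looks roughly like the sketch below. The exact schema is defined by the file in the example directory; the field names and nesting shown here are approximate, and the placeholders and group emails must be replaced with your own values.
source:
  ref:
    integration_fqn: <--PROJECT-ID-->.<--DATASET-->.<--TABLE-->
rule_sets:
  - target:
      ref:
        integration_fqn: <--PROJECT-ID-->.<--DATABASE-->.<--VIEW-->
    filters:
      - generic_filter:
          conditions:
            - principals: [ { group: testgroup1@your-domain.com } ]
              condition: "true"
            - principals: [ { group: testgroup2@your-domain.com } ]
              condition: "age = 40"
            - principals: []
              condition: "age > 50"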
Applying the data policy and setting roles
Now, using the pace CLI, upsert and apply the data policy:
pace upsert data-policy data-policy.yaml --apply
In your BigQuery studio you should be able to see the view you just created. In order to query the view, we need to grant the BigQuery Data Viewer role on the view to the test groups. Any other role that lets the test groups query the view is also sufficient.
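One way to grant that role on the view is with the bq CLI (a sketch; substitute your own project, dataset and view names, and repeat per group):
bq add-iam-policy-binding --member="group:testgroup1@your-domain.com" --role="roles/bigquery.dataViewer" <--PROJECT-ID-->:<--DATABASE-->.<--VIEW-->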
Querying the view
Using the gcloud and bq command line interfaces, we can impersonate the service accounts and query the views. In the tabs below, you can find the different results for the different principals.
If you are logged in locally as a user that has access to the dataset, use the bq cli to query the original table.
bq query --nouse_cache --use_legacy_sql=false 'select Employee_ID, Name, Base_City, Base_Country, Department, Years_of_Experience, Salary__USD_, Age, IBAN from sales.employees order by Employee_ID limit 10;'
+-------------+--------------------+-----------+--------------+------------+---------------------+--------------+-----+--------------------+
| Employee_ID |        Name        | Base_City | Base_Country | Department | Years_of_Experience | Salary__USD_ | Age |        IBAN        |
+-------------+--------------------+-----------+--------------+------------+---------------------+--------------+-----+--------------------+
| E1056       | James Richards     | Rotterdam | Netherlands  | HR         |                  18 |        68754 |  35 | NL98VXTS9541531481 |
| E1109       | Elizabeth Santiago | Singapore | Singapore    | HR         |                  19 |        89036 |  42 | IN55SFAG6135789633 |
| E1227       | Jamie Hodges       | Aberdeen  | UK           | Logistics  |                  19 |        60426 |  48 | IN59PRFC9669490582 |
| E1322       | Keith King         | Houston   | USA          | Finance    |                   4 |        74996 |  27 | DE96HCDB4822669380 |
| E1335       | Justin Forbes      | London    | UK           | Finance    |                   6 |        60861 |  26 | IN69ZXWR9447134304 |
| E1481       | Charles Barrera    | Houston   | USA          | Finance    |                  16 |        71468 |  46 | IN72VVYY5857043010 |
| E1507       | Michael Garcia     | London    | UK           | Operations |                   9 |        79023 |  41 | GB25ZPNV4859942021 |
| E1605       | 苏丽               | London    | UK           | HR         |                  20 |        79668 |  27 | IN82BQQK5940726608 |
| E1665       | Jennifer Brooks    | London    | UK           | Marketing  |                   3 |        85358 |  28 | GB70CIZL1935002050 |
| E1677       | Wesley Monroe      | Rotterdam | Netherlands  | Operations |                  10 |        61221 |  33 | FR35POMV0305660191 |
+-------------+--------------------+-----------+--------------+------------+---------------------+--------------+-----+--------------------+
To impersonate the service account, run the following gcloud command.
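One way to do this is to set the impersonation property in your gcloud configuration (a sketch; this assumes your logged-in account has the Service Account Token Creator role on the service account, and that your bq version picks up the gcloud impersonation setting):
gcloud config set auth/impersonate_service_account testuser1@<your-project-id>.iam.gserviceaccount.com
After setting this, rerun the bq query above against the created view to see the filtered results for testgroup1, and unset the property with gcloud config unset auth/impersonate_service_account when you are done.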