Getting started
Generate dbt models for your (existing) dbt projects, secured by PACE Data Policies
If you use dbt for your data processing, you may prefer to use the PACE dbt module. Instead of storing Data Policies and applying them directly on your data platform(s), like the PACE server application does, PACE dbt adds models to your dbt project which implement the desired policies. Dbt then generates views for these PACE dbt models, as it would for any other dbt model. PACE dbt is not a continuously running server application, but a command line executable that can be integrated into your local development or CI/CD workflow. Read on to find out how to get started with PACE dbt.
Prerequisites
A dbt project with a target data processing platform (typically the output type specified in your dbt project's profiles.yml file) supported by PACE. See Processing Platform Integrations for an overview of supported platforms.
The dbt CLI (you should be able to run dbt compile and dbt run).
The PACE dbt executable JAR file (dbt.jar), which you can find under a release's Assets on the Releases page.
A (basic) understanding of PACE's Field Transforms and Filters, see Rule Set.
The PACE dbt jar requires a JVM to run. There are various ways to install a JVM on your local machine; please refer to the official Java site for installation instructions for your platform. Installing Java through a package manager works as well. A JRE is sufficient to run PACE dbt; the JDK is not needed.
Defining Data Policies
PACE dbt uses the same Rule Set yaml notation for Field Transforms as the PACE server application. Any example yaml shown elsewhere in the docs is therefore applicable to PACE dbt.
An exception is the fixed field transform. Because dbt models typically do not specify column data types explicitly, PACE dbt cannot fully infer whether the provided fixed replacement value matches the original data type. Therefore, whenever the data type is ambiguous, write the value as a SQL literal to distinguish between, for example, a numeric value (value: "42") and a string value (value: "'42'"; note the additional single quotes, which are SQL syntax).
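As a minimal sketch of the difference, the two alternatives below set a fixed numeric value and a fixed string value. The value lines come from the text above; the surrounding fixed key and the list layout are assumptions based on the Rule Set notation and should be checked against that reference.

```yaml
# Sketch only: the `fixed` wrapper key and list layout are assumptions
# based on the Rule Set notation. The two entries are alternatives.
transforms:
  # Numeric literal: ends up in the SQL as 42
  - fixed:
      value: "42"
  # String literal: the inner single quotes are SQL syntax, so this ends up as '42'
  - fixed:
      value: "'42'"
```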
Let's see how to enrich a dbt model with PACE Data Policies.
Adding Rule Sets to a model's meta section
PACE dbt looks for rule sets in a pace_rule_sets section under the meta sections of dbt models. With this approach, the eventual yaml will look quite similar to that used with the PACE server. It is also the way to go when multiple secure views are desired for a single model, each with a different rule set.
Let's start with a basic dbt schema file:
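A minimal sketch of such a schema file could look like this. The model name and the userid, email and age columns are the ones referred to in this section; the description and the rest of the layout are illustrative assumptions, not the project's actual file.

```yaml
# schema.yml (illustrative sketch, standard dbt properties syntax)
version: 2

models:
  - name: dim_transactions
    description: "Transactions, including details of the user involved"  # assumed
    columns:
      - name: userid
      - name: email
      - name: age
```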
This dim_transactions model contains sensitive data that we want to protect with a PACE Data Policy. Let's say that we want the following:
The userid should be nullified for all users, except for administrators.
The email should be masked to only show the domain, except for administrators.
Administrators can see all records, but everyone else can only see records for users with an age greater than 18.
A modified model schema.yml file that achieves this would look like (see the comments for further explanation):
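The following is a hedged sketch of what such a file might contain, not a verified listing. Only the pace_rule_sets key, its place under the model's meta section, and the presence of a target (with an integration_fqn), filters and field transforms are stated in this document. The nested key names (ref, generic_filter, conditions, name_parts, identity, nullify, regexp), the group-based principals, the regular expression and the target name are assumptions based on the Rule Set notation and should be verified there.

```yaml
# schema.yml (sketch; nested key names are assumptions, see the note above)
version: 2

models:
  - name: dim_transactions
    meta:
      pace_rule_sets:
        - target:
            ref:
              # Hypothetical fully qualified name of the secured view
              integration_fqn: "my_project.my_dataset.dim_transactions_secured"
          filters:
            - generic_filter:
                conditions:
                  # Administrators can see all records
                  - principals: [ { group: administrator } ]
                    condition: "true"
                  # Everyone else only sees records of users older than 18
                  - principals: []
                    condition: "age > 18"
          field_transforms:
            - field:
                name_parts: [ userid ]
              transforms:
                # Administrators see the original value
                - principals: [ { group: administrator } ]
                  identity: {}
                # Everyone else gets a null value
                - principals: []
                  nullify: {}
            - field:
                name_parts: [ email ]
              transforms:
                - principals: [ { group: administrator } ]
                  identity: {}
                # Everyone else only sees the domain (illustrative pattern)
                - principals: []
                  regexp:
                    regexp: "^.*(@.*)$"
                    replacement: "****$1"
    columns:
      - name: userid
      - name: email
      - name: age
```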
Specifying Field Transforms on column meta sections
Alternatively, you can specify field transforms in the meta section of the respective columns. This reduces the total amount of yaml and may provide more overview.
When specifying transforms under the meta section of at least one column, PACE dbt will ignore any field transforms under the model meta section, and only use the target and filters from the first rule set listed there (if any). Therefore, when using the column meta sections, only put field transforms there.
The following yaml would result in an identical generated PACE dbt model. As you can see, we simply lifted the transforms sections out of the field transforms sections from the previous example, and placed them in the respective columns' meta section under a pace_transforms key.
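A sketch of this column-level variant is given here, under the same caveats as any Rule Set yaml in this section: the pace_rule_sets and pace_transforms keys and their locations are taken from the text, while the nested key names, the principals notation and the target name are assumptions to be checked against the Rule Set reference.

```yaml
# schema.yml (sketch; nested key names are assumptions, see the note above)
version: 2

models:
  - name: dim_transactions
    meta:
      pace_rule_sets:
        # Only the target and filters of the first rule set are used
        - target:
            ref:
              # Hypothetical fully qualified name of the secured view
              integration_fqn: "my_project.my_dataset.dim_transactions_secured"
          filters:
            - generic_filter:
                conditions:
                  - principals: [ { group: administrator } ]
                    condition: "true"
                  - principals: []
                    condition: "age > 18"
    columns:
      - name: userid
        meta:
          pace_transforms:
            - principals: [ { group: administrator } ]
              identity: {}
            - principals: []
              nullify: {}
      - name: email
        meta:
          pace_transforms:
            - principals: [ { group: administrator } ]
              identity: {}
            - principals: []
              regexp:
                regexp: "^.*(@.*)$"
                replacement: "****$1"
      - name: age
```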
If no filters are needed, and the PACE dbt model's relation may be created in the same schema as the source model, using the default PACE dbt suffix, the yaml can be simplified by removing the model's entire meta.pace section. Note that in the following example, we also removed the explicit principals fields for the fallback cases (i.e. "everyone else").
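A simplified file along these lines might look as follows. As before, only the pace_transforms key and its location come from this document; the transform key names, the principals notation and the regular expression are assumptions based on the Rule Set notation.

```yaml
# schema.yml (sketch; transform key names are assumptions)
version: 2

models:
  - name: dim_transactions
    # No model-level meta section: no filters, default target schema and suffix
    columns:
      - name: userid
        meta:
          pace_transforms:
            - principals: [ { group: administrator } ]
              identity: {}
            # No principals listed: fallback for everyone else
            - nullify: {}
      - name: email
        meta:
          pace_transforms:
            - principals: [ { group: administrator } ]
              identity: {}
            # No principals listed: fallback for everyone else
            - regexp:
                regexp: "^.*(@.*)$"
                replacement: "****$1"
      - name: age
```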
If you wish to include a hash transform using dbt, you must define the data type of the field in the schema.yml file.
Additional configuration for BigQuery
This is best done in the model meta section of the dbt_project.yml file, such as:
Generating PACE dbt models
Now that you know how to define PACE Data Policies in your dbt model metadata, let's proceed with generating the corresponding models with PACE dbt. Here we will be using the first example policy listed above (functionally identical to the second example).
From the root directory of your dbt project, execute the following steps:
Execute a dbt compile or dbt run. This will update dbt's target/manifest.json file, which is what PACE dbt uses as input.
Execute the PACE dbt jar file. Assuming the jar file is called dbt.jar and is located in the parent directory of the dbt project, the command would be: java -jar ../dbt.jar. This should output one line per model with a PACE dbt policy configuration, similar to: Generated PACE model models/demo_transactions.sql.
Resume your dbt workflow as usual, e.g. with dbt compile or dbt run. Dbt will create the corresponding relations on your target data platform. You may want to first inspect the generated model SQL, or add some metadata for it in a schema.yml file.
To use PACE dbt in a dockerized (Python) environment, see Containerized dbt module for an example Dockerfile.
Example generated SQL
The model generated by PACE dbt will look similar to the following (this SQL query is specific to BigQuery; other target data platforms will result in slightly different statements):
Using a different target db and/or schema
As shown in the example, PACE dbt will add the required config keys if the specified target integration_fqn uses a different db and/or schema than the source model. Note that the default dbt behaviour regarding custom schemas still applies.
Explicitly enabling or disabling PACE dbt configuration
By adding a pace_enabled boolean key to a model or column meta section, transforms or filters can be explicitly enabled or disabled. When the pace_enabled key is absent, but a pace_rule_sets or pace_transforms key is present, it is implicitly considered to be true. By adding pace_enabled: false, the respective configuration will be excluded from the generated PACE dbt model. No model will be generated if all PACE configuration is disabled.
You can "force" the creation of a PACE dbt model without any field transforms or filters by just adding a pace_enabled: true key to a model's meta section. By adding it to your general model metadata in dbt_project.yml, you can do this for all your models by default (and override that again on model level):
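A sketch of this setup is shown here as two yaml documents, one per file. It relies on dbt's standard +meta config in dbt_project.yml and on a model-level meta section in a schema file; the project name my_project and the model name stg_internal_only are hypothetical.

```yaml
# dbt_project.yml: enable PACE dbt for all models by default
models:
  my_project:            # hypothetical project name
    +meta:
      pace_enabled: true
---
# schema.yml: override the project-level default for a single model
version: 2

models:
  - name: stg_internal_only   # hypothetical model that needs no PACE dbt model
    meta:
      pace_enabled: false
```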
Models that were themselves generated by PACE dbt are always ignored as input when generating models, even when pace_enabled is set to true at project level.
Deleting PACE dbt models
Completely removing the pace section(s) of a model's meta and/or column meta sections will result in no PACE dbt model being generated for the respective model upon the next run. The previously generated PACE dbt model will, however, remain in the dbt project and needs to be removed manually. As with any dbt model, the corresponding relations in your data processing platform need to be dropped manually, as dbt itself does not delete them when their model files are removed.