For this section, we assume that you have either created a connection to a Processing Platform or Data Catalog, or that you are familiar with the structure of your source data. You will, naturally, also need a running PACE instance. Below is a step-by-step walkthrough of how to create a Data Policy.
For this example we will use a very small sample data set:
email              age
alice@domain.com   16
bob@company.org    18
charlie@store.io   20
Get blueprint policy
Let the id of your connected Processing Platform be pace-platform and the source reference be pace-db.pace-schema.pace-table. Getting a blueprint policy yields a nearly empty Data Policy with populated source fields, thus defining the structure of your source data. In this example we will both use the pace CLI and call the REST API using curl. The REST request has two path variables, platform-id and table-id; the CLI makes use of gRPC. We assume here that the instance is running on localhost, that the gRPC port is 50051, and that the envoy proxy is listening on port 9090, as per the defaults. For the CLI, if you specify the processing platform and the reference to the table, the corresponding blueprint policy is returned.
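Concretely, the two requests could look as follows. This is a sketch: the CLI flag names and the REST path are assumptions, so consult the help output of pace get data-policy and the API reference of your PACE version.

```bash
# CLI (gRPC on localhost:50051); flag names are assumptions.
pace get data-policy pace-db.pace-schema.pace-table \
  --processing-platform pace-platform

# REST via the envoy proxy (localhost:9090); the path is an assumption.
curl "localhost:9090/processing-platforms/pace-platform/tables/pace-db.pace-schema.pace-table/blueprint-policy"
```

Requesting a blueprint policy then yields a nearly empty Data Policy, roughly like this (the field types are illustrative):

```yaml
metadata:
  title: pace-db.pace-schema.pace-table
platform:
  id: pace-platform
source:
  ref: pace-db.pace-schema.pace-table
  fields:
    - name_parts: [ email ]
      type: varchar
      required: true
    - name_parts: [ age ]
      type: integer
      required: true
rule_sets: []
```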
As you can see, the Rule Sets are empty and only the refs and source.fields have been populated. This dataset consists of two columns: email and age. Now that we know our data's structure, the next step is to populate the YAML with a Rule Set.
If you do not have a Processing Platform or Data Catalog connected yet, use the blueprint policy from above to define the structure of your data by hand.
Define Rule Sets
In this example we will show one Rule Set, with one Field Transform and one Filter. If you define multiple Rule Sets in your data policy, multiple views will be created. For a more extensive explanation of Rule Sets, see the in-depth documentation on Principals, Field Transforms and Filters.
Target
Let's start by defining the reference to the target table. We have chosen the target schema to be the same as the source schema. The Target should then be defined as follows:
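A sketch of the target definition inside the Rule Set; the key names (e.g. integration_fqn) and the view name pace-table_view are assumptions for this example:

```yaml
rule_sets:
  - target:
      ref:
        # Same database and schema as the source; only the table (view)
        # name differs. The "_view" suffix is a hypothetical choice.
        integration_fqn: pace-db.pace-schema.pace-table_view
```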
We define one Field Transform and add it to the Rule Set. Our transform concerns the email field; the field definition corresponds to the same field in the source fields. We define a Transform for the Marketing (MKTNG) principal and one for the Fraud and Risk (F&R) principal. For the Marketing principal, the local part of the email is replaced by **** while leaving the @ and the domain as-is, using a regular expression with a replacement. For the Fraud and Risk principal we apply an identity transform, returning the email as-is. Finally, if the viewer is not a member of either of these Principals, a fixed value **** is returned instead of the email. For more guidance and examples on how to define Transforms, see the docs.
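Put together, the Field Transform could look like this. The transform keys (regexp, identity, fixed) follow the PACE documentation, but treat the exact shape as an assumption and verify it against your version:

```yaml
rule_sets:
  - field_transforms:
      - field:
          name_parts: [ email ]
        transforms:
          # Marketing: mask the local part, keep the "@" and the domain.
          # Capture-group syntax ($1 vs \1) depends on the platform dialect.
          - principals:
              - group: MKTNG
            regexp:
              regexp: "^.*(@.*)$"
              replacement: "****$1"
          # Fraud and Risk: identity transform, email returned as-is.
          - principals:
              - group: F&R
            identity: {}
          # Everyone else: a fixed replacement value.
          - fixed:
              value: "****"
```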
To completely filter out rows, we define one Filter based on the age field. Note that the condition can contain arbitrary logic involving any number of fields. For the Fraud and Risk principal we return all rows, whereas for all other principals we exclude all children (rows with age below 18) from the target view. More details on Filters can be found here.
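A sketch of this Filter; the generic_filter and conditions keys are assumptions, and the final condition without principals acts as the fallback for everyone else:

```yaml
rule_sets:
  - filters:
      - generic_filter:
          conditions:
            # Fraud and Risk sees every row.
            - principals:
                - group: F&R
              condition: "true"
            # Everyone else: children are filtered out.
            - principals: []
              condition: "age >= 18"
```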
Assuming the policy is saved as data_policy.json or data_policy.yaml in your current working directory, we can upsert the Data Policy. The CLI accepts both YAML and JSON; curl only JSON:
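For example (the REST path in this sketch is an assumption):

```bash
# CLI: upsert the policy from a YAML (or JSON) file.
pace upsert data-policy data_policy.yaml

# REST via the envoy proxy (localhost:9090); JSON only, path assumed.
curl -X POST "localhost:9090/data-policies" \
  -H "Content-Type: application/json" \
  -d @data_policy.json
```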
By adding the --apply flag, the Data Policy is applied immediately, which also creates the corresponding SQL view. It is also possible to first upsert a policy and only apply it later, in which case the ID and platform of the upserted policy must be provided:
pace apply data-policy your-data-policy-id -p your-platform-id
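Alternatively, to upsert and apply in a single step (same assumptions as the upsert sketch above):

```bash
pace upsert data-policy data_policy.yaml --apply
```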
View data
Depending on which Principal groups you are in, the actual data you can access via the newly created view differs. Below you will find the raw data again, together with the views as seen by several Principals.
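Applying the Rule Set above to the sample data, the views would look roughly as follows (illustrative output):

Raw data (source table):

email              age
alice@domain.com   16
bob@company.org    18
charlie@store.io   20

Fraud and Risk (F&R): identity transform, no rows filtered:

email              age
alice@domain.com   16
bob@company.org    18
charlie@store.io   20

Marketing (MKTNG): local part masked, children filtered out:

email              age
****@company.org   18
****@store.io      20

All other principals: fixed value, children filtered out:

email   age
****    18
****    20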