This tutorial assumes that you have completed the quickstart section of the docs. The prerequisites are the same as those listed there.
The goal of this tutorial is to fetch a Data Policy that includes a rule set generated from tags attached to a field on the processing platform. This lets end users define transforms once and reuse them without leaving the data processing platform or data catalog.
File and directory setup
Clone the repository from GitHub if you haven't done so already. The command below uses HTTPS, but cloning over SSH works just as well.
git clone https://github.com/getstrm/pace.git
If you'd rather set up the example yourself, create a directory global-tag-transforms with the same directory tree structure as the one in the repository. Otherwise, navigate to the global-tag-transforms directory inside the pace repo:
cd pace/examples/global-tag-transforms
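To confirm you're in the right directory, a small helper can check for the files this tutorial discusses. This is a sketch: check_example_files is a hypothetical helper name, and it only looks for data.sql, config/application.yaml and global-tag-transforms.yaml. The compose file is not checked because its exact filename is not stated here.

```shell
#!/usr/bin/env bash
# check_example_files (hypothetical helper): verify that the files discussed in
# this tutorial exist in the given directory (default: the current directory).
# The compose file is not checked because its exact filename is not assumed here.
check_example_files() {
  local dir="${1:-.}" missing=0 f
  for f in data.sql config/application.yaml global-tag-transforms.yaml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run check_example_files from the example directory: it exits with status 0 when all three files are present, and otherwise lists the missing ones on stderr and exits non-zero.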
Next, let's have a look at the contents of these files.
The compose file defines the services below. It does not persist data across restarts of the services, so any changes you make to the data last only for as long as the services keep running.
pace_app, with the ports for all of its interfaces exposed to the host:
8080 -> Spring Boot Actuator.
9090 -> Envoy JSON / gRPC REST Transcoding proxy.
50051 -> gRPC.
postgres_pace is the persistence layer in which PACE stores its Data Policies.
Available at localhost:5432 on your machine.
postgres_processing_platform is the pre-populated database that serves as the processing platform in this example.
Available at localhost:5431 on your machine.
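Once the services are up, the ports listed above can be probed from the host. This is a hedged sketch: it relies on bash's /dev/tcp redirection (so run it with bash), the helper names pace_port, port_is_open and check_pace_ports are hypothetical, and an open port only shows that something is listening, not that the service behind it is healthy.

```shell
#!/usr/bin/env bash
# pace_port (hypothetical helper): map a service interface to the host port
# exposed by the compose setup described above.
pace_port() {
  case "$1" in
    actuator) echo 8080 ;;                       # Spring Boot Actuator
    envoy) echo 9090 ;;                          # Envoy JSON / gRPC REST transcoding proxy
    grpc) echo 50051 ;;                          # gRPC
    postgres_pace) echo 5432 ;;                  # PACE's own database
    postgres_processing_platform) echo 5431 ;;   # pre-populated demo database
    *) return 1 ;;                               # unknown interface name
  esac
}

# port_is_open: succeed if a TCP connection to 127.0.0.1:<port> can be opened.
# Bash-only: uses the /dev/tcp pseudo-device inside a subshell, errors discarded.
port_is_open() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# check_pace_ports: print "open" or "closed" for every exposed port.
check_pace_ports() {
  local name port
  for name in actuator envoy grpc postgres_pace postgres_processing_platform; do
    port="$(pace_port "$name")"
    if port_is_open "$port"; then
      echo "$name ($port): open"
    else
      echo "$name ($port): closed"
    fi
  done
}
```

With the containers running, check_pace_ports should report all five ports as open. For the Actuator port specifically, Spring Boot's health endpoint (curl http://localhost:8080/actuator/health) is a more meaningful check, assuming the example keeps the default Actuator base path.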
data.sql
The PostgreSQL initialization script that runs when the postgres_processing_platform container starts. It uses the public schema (the default) and creates the following:
A table called public.demo (see the file for its full schema).
A comment on the email field of the public.demo table that includes the tag pace::pii-email.
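To illustrate how a tag can be picked out of a column comment, here is a small sketch that prints every pace::-prefixed token without its prefix. It is not PACE's actual parser: the helper name extract_pace_tags and the assumed tag syntax (letters, digits, underscores and hyphens after pace::) are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# extract_pace_tags (hypothetical helper): print each "pace::<tag>" token found
# in a comment string, one per line, with the "pace::" prefix removed.
# Assumed tag syntax: letters, digits, underscores and hyphens.
extract_pace_tags() {
  printf '%s\n' "$1" | grep -oE 'pace::[A-Za-z0-9_-]+' | sed 's/^pace:://'
}
```

For example, with a made-up comment, extract_pace_tags "Customer email address pace::pii-email" prints pii-email, which matches the tag name the fetched Data Policy reports for the email field later on.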
config/application.yaml
This is the Spring Boot application configuration, which specifies the PACE database connection and the Processing Platform.
Here, PACE is configured to connect to the postgres_pace host. This is the service name of the database container, which Docker resolves on the network that both the pace_app and postgres_pace containers are attached to.
Furthermore, one Processing Platform of type postgres is configured, named global_transforms-sample-connection.
global-tag-transforms.yaml
This is the Global Transform we'll be creating in this tutorial. Because it relies on tags on the processing platform that are attached to fields in the data schema, it is of type tag_transform.
A global transform can specify multiple transforms that depend on the principal of the data consumer, in the same way as a rule set. In fact, this transform is translated into a rule set of its own, as we'll see later in the tutorial.
Running the example
Make sure your current working directory is the same as the directory you've set up in the previous section. Start the containers by running:
docker compose up --pull always
You should see quite a bit of logging, ending with the banner of the PACE app as it boots. Once everything has started, connect to postgres_processing_platform to view the table and its comments. We'll use psql here.
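A minimal way to do that from your host, assuming psql is installed locally. The user, password and database name below are placeholders, not values taken from the example: replace them with the ones configured for postgres_processing_platform in the compose file. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Placeholder credentials (assumptions): override them with the values that the
# compose file configures for the postgres_processing_platform service.
# Values containing special characters would need URL-encoding.
PP_USER="${PP_USER:-postgres}"
PP_PASSWORD="${PP_PASSWORD:-postgres}"
PP_DB="${PP_DB:-postgres}"

# processing_platform_url (hypothetical helper): libpq connection URI for the
# processing platform database, which is published on host port 5431.
processing_platform_url() {
  printf 'postgresql://%s:%s@localhost:5431/%s\n' "$PP_USER" "$PP_PASSWORD" "$PP_DB"
}

# describe_demo_table (hypothetical helper): psql's \d+ lists the columns of
# public.demo together with their comments in the Description column.
describe_demo_table() {
  psql "$(processing_platform_url)" -c '\d+ public.demo'
}
```

Set PP_USER, PP_PASSWORD and PP_DB to the real values, then call describe_demo_table. The row for the email column should show the comment containing pace::pii-email.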
Fetching a blueprint Data Policy without global transforms
Before creating the global transform, let's see what is returned when we fetch the blueprint Data Policy for the table while no global transforms are defined.
This Data Policy has no rule_sets section, which is expected since no global transforms exist yet.
Note that the email field has the tag pii-email, which was extracted from the comment on that column in the PostgreSQL table.
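When comparing the policy before and after creating the global transform, it can help to check for this section programmatically. A small sketch, assuming the Data Policy is printed as YAML with rule_sets as a top-level key; has_rule_sets is a hypothetical helper that reads the policy from standard input, so you can pipe the output of the fetch command into it.

```shell
#!/usr/bin/env bash
# has_rule_sets (hypothetical helper): read a Data Policy in YAML from stdin and
# succeed only if it contains a top-level "rule_sets:" key.
# Assumes YAML output; a JSON rendering would need a different check.
has_rule_sets() {
  grep -qE '^rule_sets:'
}
```

Piping the fetched policy into has_rule_sets should fail (non-zero exit) at this point in the tutorial, and succeed once the global transform exists.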
Creating a global transform
Let's take a more detailed look at global-tag-transforms.yaml:
global-tag-transforms.yaml
ref: "pii-email"
description: "This is a global transform that nullifies all fields tagged with the value 'pii-email' for all other users, except the administrator and the fraud_and_risk members. In this example, this is a comment on the column of the 'email' field in the test dataset. Please see the 'data.sql' file for the comment on the field."
tag_transform:
  tag_content: "pii-email"
  transforms:
    # The administrator group can see all data
    - principals: [ { group: administrator } ]
      identity: { }
    # The fraud_and_risk group should see a part of the email
    - principals: [ { group: fraud_and_risk } ]
      regexp:
        regexp: "^.*(@.*)$"
        replacement: "****$1"
    # All other users should not see the email
    - principals: [ ]
      nullify: { }
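To make the three transforms concrete, the sketch below emulates them on a single email value. It is an illustration only, under two assumptions: the transforms are evaluated top to bottom and the first entry whose principals match wins, with the empty principals list acting as the fallback; and a user is described by a single group name. sed writes the capture group as \1 where the YAML uses $1, and an empty line stands in for NULL. apply_pii_email_transform is a hypothetical helper, not part of PACE.

```shell
#!/usr/bin/env bash
# apply_pii_email_transform (hypothetical helper): emulate the pii-email global
# transform for a user in group $1 and an email value $2.
# Assumes the first transform whose principals match the group is applied.
apply_pii_email_transform() {
  local group="$1" email="$2"
  case "$group" in
    administrator)
      # identity: return the value unchanged
      printf '%s\n' "$email"
      ;;
    fraud_and_risk)
      # regexp: "^.*(@.*)$" -> "****$1" (sed spells the group reference \1)
      printf '%s\n' "$email" | sed -E 's/^.*(@.*)$/****\1/'
      ;;
    *)
      # nullify: no value; an empty line stands in for NULL here
      printf '\n'
      ;;
  esac
}
```

For example, apply_pii_email_transform fraud_and_risk alice@example.com prints ****@example.com, the administrator group gets alice@example.com back unchanged, and any other group gets an empty value.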
So this transform should be included for every field in the schema that has the tag pii-email. Next, create the global transform.
To check that it has been created correctly, list the global transforms with pace list global-transforms.
Fetching a Data Policy with a rule set based on global transforms
If we fetch the Data Policy now, the global transform should appear in its rule_sets section. Run the command to get the blueprint Data Policy for our table again.
This time the Data Policy does contain a rule_sets section, populated from the global transform: the email field has the tag pii-email, and the global transform applies to every field with that tag.
Cleanup
That wraps up the global transforms example. To clean up all resources, first stop the running process with Ctrl+C, then run:
docker compose down
Any questions or comments? Please ask them on Slack.