Global Tag Transforms
Create Data Policies with blueprint transforms
This tutorial assumes that you have completed the quickstart section of the docs. The prerequisites for this tutorial are the same as mentioned there.
The goal of this tutorial is to be able to fetch a Data Policy with a ruleset included, based on tags that are attached to a field on the respective processing platform. This enables the end-user to define transforms once, and reuse them without leaving the data processing platform or data catalog.
Clone Repository
Clone the repository from GitHub, if you haven't already done so. This command assumes you're not using SSH, but feel free to do so.
git clone https://github.com/getstrm/pace.git
Now navigate to the global-tag-transforms directory inside the pace repo:
cd pace/examples/global-tag-transforms
Manual setup
Alternatively, create a directory global-tag-transforms with the following directory tree structure:
global-tag-transforms
├── docker-compose.yaml
├── data.sql
├── config
│   └── application.yaml
└── global-tag-transform.yaml
Next, let's have a look at the contents of these files.
docker-compose.yaml
The compose file is set up without any persistence of data across different startups of the services. Keep in mind that any changes to the data will only persist for as long as you keep the services running.
The compose file defines three services:
- pace_app, which exposes the following ports:
  - 8080 -> Spring Boot Actuator.
  - 9090 -> Envoy JSON / gRPC REST Transcoding proxy.
  - 50051 -> gRPC.
- postgres_pace acts as the persistent layer for PACE to store its Data Policies.
  - Available under localhost:5432 on your machine.
- postgres_processing_platform is the pre-populated database.
  - Available under localhost:5431 on your machine.
data.sql
The PostgreSQL initialization SQL script that is run on startup for the postgres_processing_platform container. The database is configured to use the public schema (the default), with the following data:
- A comment on the email field of the public.demo table that includes the tag pace::pii-email.
Since PostgreSQL has no "native" support for tags on columns, we've come up with a syntax to allow specifying tags in comments on columns.
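The tag-in-comment syntax can be illustrated with a small sketch. For illustration only, it assumes that a tag is the token following the `pace::` prefix and consists of letters, digits, underscores and hyphens. The exact grammar PACE applies is not described in this tutorial, and `extract_pace_tags` is a hypothetical helper, not part of PACE.

```python
import re

# Hypothetical pattern: "pace::" followed by a run of letters, digits, "_" or "-".
# This is an illustrative assumption, not PACE's actual parsing rule.
TAG_PATTERN = re.compile(r"pace::([A-Za-z0-9_-]+)")


def extract_pace_tags(comment):
    """Return the tags embedded in a column comment, in order of appearance."""
    if not comment:
        # Columns without a comment (NULL in PostgreSQL) carry no tags.
        return []
    return TAG_PATTERN.findall(comment)


comment = "This is a user email which should be considered as such. pace::pii-email"
print(extract_pace_tags(comment))  # ['pii-email']
print(extract_pace_tags(None))     # []
```

With this pattern, trailing punctuation such as a period after the tag is not included in the tag name.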
This is the Spring Boot application configuration (config/application.yaml), which specifies the PACE database connection and the Processing Platform.
spring:
  datasource:
    url: jdbc:postgresql://postgres_pace:5432/pace
    hikari:
      username: pace
      password: pace
      schema: public
app:
  processing-platforms:
    postgres:
      - id: "global_transforms-sample-connection"
        host-name: "postgres_processing_platform"
        port: 5432
        user-name: "global_transforms"
        password: "global_transforms"
        database: "global_transforms"
Here, PACE is configured to connect to the postgres_pace host. This name resolves to the postgres_pace container on the Docker network that both the pace_app and the postgres_pace containers are attached to. Furthermore, one Processing Platform of type postgres is configured, named global_transforms-sample-connection.
global-tag-transform.yaml
This is the Global Transform we'll be creating in this tutorial. It uses tags on the processing platform, which are attached to fields in the data schema, and is therefore of type tag_transform.
A global transform allows for specifying multiple transforms based on the principal of the data consumer, similar to how it's done when creating rule sets. In fact, this transform is translated into a rule set of its own, as we'll see later in the tutorial.
Make sure your current working directory is the same as the directory you've set up in the previous section. Start the containers by running:
docker compose up --pull always
There should be quite a bit of logging, ending in the banner of the PACE app booting. Once everything has started, try to connect to the postgres_processing_platform to view the table and its comments. We'll use psql here.
Connect to the PostgreSQL database:
psql postgresql://global_transforms:global_transforms@localhost:5431/global_transforms
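The psql URL above and the processing platform entry in config/application.yaml point at the same database from two sides. Judging from the service list and the configuration, PACE reaches it inside the Docker network as postgres_processing_platform:5432. Your machine reaches it through the published port localhost:5431.

The sketch below builds both URIs from the configured credentials. `postgres_uri` is a hypothetical helper used only for illustration.

```python
from urllib.parse import quote


def postgres_uri(user, password, host, port, database):
    """Build a libpq-style connection URI, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


creds = dict(user="global_transforms", password="global_transforms", database="global_transforms")

# From inside the Docker network (how PACE connects, per config/application.yaml).
print(postgres_uri(host="postgres_processing_platform", port=5432, **creds))
# From your machine (the psql command above).
print(postgres_uri(host="localhost", port=5431, **creds))
```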
Next, view the table and the comments.
select column_name, data_type, col_description('public.demo'::regclass, ordinal_position) comment
from information_schema.columns
where table_schema = 'public' and table_name = 'demo';
This results in the following representation of the demo table. As you can see, the pii-email tag is already set in the comment of the email field.
    column_name    |     data_type     |                                 comment
-------------------+-------------------+--------------------------------------------------------------------------
 transactionid     | integer           |
 userid            | integer           |
 name              | character varying |
 email             | character varying | This is a user email which should be considered as such. pace::pii-email
 age               | integer           |
 salary            | integer           |
 postalcode        | character varying |
 brand             | character varying |
 transactionamount | integer           |
(9 rows)
Before we create the global transform, let's see what is returned when we fetch the Data Policy for the table while no global transforms are defined.
pace get data-policy --blueprint --processing-platform global_transforms-sample-connection public.demo
Which returns the following data policy.
metadata:
  description: ""
  title: public.demo
platform:
  id: global_transforms-sample-connection
  platform_type: POSTGRES
source:
  fields:
  - name_parts:
    - transactionid
    required: true
    type: integer
  - name_parts:
    - userid
    required: true
    type: integer
  - name_parts:
    - name
    required: true
    type: varchar
  - name_parts:
    - email
    required: true
    tags:
    - pii-email
    type: varchar
  - name_parts:
    - age
    required: true
    type: integer
  - name_parts:
    - salary
    required: true
    type: integer
  - name_parts:
    - postalcode
    required: true
    type: varchar
  - name_parts:
    - brand
    required: true
    type: varchar
  - name_parts:
    - transactionamount
    required: true
    type: integer
  ref: public.demo
In this data policy, no rule_sets section is present, which is correct, since there are no global transforms yet. Note that the email field has a tag, pii-email. This has been extracted from the comment on the column in the PostgreSQL table.
Let's have a more detailed look at the global-tag-transform.yaml file:
global-tag-transform.yaml
ref: "pii-email"
description: "This is a global transform that nullifies all fields tagged with the value 'pii-email' for all other users, except the administrator and the fraud_and_risk members. In this example, this is a comment on the column of the 'email' field in the test dataset. Please see the 'data.sql' file for the comment on the field."
tag_transform:
  tag_content: "pii-email"
  transforms:
    # The administrator group can see all data
    - principals: [ { group: administrator } ]
      identity: { }
    # The fraud_and_risk group should see a part of the email
    - principals: [ { group: fraud_and_risk } ]
      regexp:
        regexp: "^.*(@.*)$"
        replacement: "****$1"
    # All other users should not see the email
    - principals: [ ]
      nullify: { }
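To make the three transforms above concrete, here is a sketch of how a single email value would be masked for different users. It rests on three assumptions, which illustrate the idea and are not a statement of PACE's internals:
- The transforms are tried in order.
- The first entry whose principals include one of the user's groups is applied.
- An entry with an empty principals list applies to everyone else.

Python's `re.sub` uses `\1` for the captured group where the YAML replacement uses `$1`. `select_transform` and `mask_email` are hypothetical names.

```python
import re

# The tutorial's transforms, in order. An empty principal set means "anyone else".
TRANSFORMS = [
    ({"administrator"}, ("identity", None)),
    ({"fraud_and_risk"}, ("regexp", (r"^.*(@.*)$", r"****\1"))),
    (set(), ("nullify", None)),
]


def select_transform(user_groups):
    """Return the first transform whose principals match one of the user's groups."""
    for principals, transform in TRANSFORMS:
        if not principals or principals & set(user_groups):
            return transform
    return None  # unreachable here, because the last entry matches everyone


def mask_email(value, user_groups):
    kind, args = select_transform(user_groups)
    if kind == "identity":
        return value
    if kind == "regexp":
        pattern, replacement = args
        # Values that do not match the pattern are returned unchanged.
        return re.sub(pattern, replacement, value)
    return None  # nullify


email = "alice@example.com"
print(mask_email(email, ["administrator"]))   # alice@example.com
print(mask_email(email, ["fraud_and_risk"]))  # ****@example.com
print(mask_email(email, ["marketing"]))       # None
```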
So any field in the schema with the tag pii-email should get this transform. Next, create the global transform:
pace upsert global-transform global-tag-transform.yaml --apply
Feel free to list the global transforms to check whether it has been created correctly (pace list global-transforms).
When we fetch the Data Policy now, the global transform should be added to the rule_sets section of the data policy. Run the command to get a blueprint data policy for our table again:
pace get data-policy --blueprint --processing-platform global_transforms-sample-connection public.demo
Which returns the following data policy.
metadata:
  description: ""
  title: public.demo
platform:
  id: global_transforms-sample-connection
  platform_type: POSTGRES
rule_sets:
- field_transforms:
  - field:
      name_parts:
      - email
      required: true
      tags:
      - pii-email
      type: varchar
    transforms:
    - identity: {}
      principals:
      - group: administrator
    - principals:
      - group: fraud_and_risk
      regexp:
        regexp: ^.*(@.*)$
        replacement: '****$1'
    - nullify: {}
  target:
    fullname: public.demo_pace_view
source:
  fields:
  - name_parts:
    - transactionid
    required: true
    type: integer
  - name_parts:
    - userid
    required: true
    type: integer
  - name_parts:
    - name
    required: true
    type: varchar
  - name_parts:
    - email
    required: true
    tags:
    - pii-email
    type: varchar
  - name_parts:
    - age
    required: true
    type: integer
  - name_parts:
    - salary
    required: true
    type: integer
  - name_parts:
    - postalcode
    required: true
    type: varchar
  - name_parts:
    - brand
    required: true
    type: varchar
  - name_parts:
    - transactionamount
    required: true
    type: integer
  ref: public.demo
In this data policy, a rule_sets section is present. It has been populated with the global transform because the email field has the tag pii-email, and the global transform applies to every field with that specific tag.
That wraps up the global transforms example. To clean up all resources, stop the currently running process with ctrl+C and then run the following command:
docker compose down