This document helps you to set up a standalone example for PACE. This includes:
pace_app: PACE application running as a container image
postgres_pace: PostgreSQL database for PACE to persist Data Policies
postgres_processing_platform: Another PostgreSQL database for PACE to connect to as a Processing Platform
This database is pre-populated with data to give a good overview of what PACE is able to do for you or your organization.
The goal of this tutorial is to run the standalone example, create a Data Policy based on the data schema of the sample data table, and view the data as the different users that we will define.
The source of the standalone example can be found in the GitHub repository.
Make sure you have everything set up according to the installation steps.
Prerequisites
Before you get started, make sure you've installed the following tools:
Now navigate to the standalone directory inside the newly created pace folder:
cd pace/examples/standalone
Next, let's have a look at the contents of these files.
The compose file is set up without any persistence of data across restarts of the services. Keep in mind that changes to the data are kept only for as long as you keep the services running.
docker-compose.yaml
The compose file contains three services, matching the introduction section of this document:
pace_app, with all ports exposed to the host for the different interfaces (REST, gRPC, and the Spring Boot Actuator):
8080 -> Spring Boot Actuator
9090 -> Envoy JSON / gRPC Transcoding proxy
50051 -> gRPC
postgres_pace acts as the persistence layer for PACE to store its Data Policies
Available under localhost:5432 on your machine.
postgres_processing_platform is the pre-populated database
Available under localhost:5431 on your machine.
data.sql
The PostgreSQL initialization SQL script that is run on startup for the postgres_processing_platform container. The database is configured to use the public schema (the default), with the following data:
A table called public.demo. For the data schema, please see the file contents.
Several users:
standalone (password = standalone) - this is the user we're using to connect PACE to PostgreSQL as a Processing Platform
mark (password = mark)
far (password = far)
other (password = other)
Several groups (and their users):
administrator: standalone
marketing: mark
fraud_and_risk: far
The idea here is that standalone is the super user that should be able to see all data in its raw form, and that users mark and far should only see the view that is created when creating a Data Policy in PACE.
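The users and groups listed above can be modelled in a few lines of Python. This is only an illustrative sketch of the group membership described for data.sql, not code taken from PACE; the helper name groups_of is hypothetical. It makes explicit that the user other belongs to no group, which is why that user later falls under the "any other users" rules.

```python
# Group membership as listed for data.sql (group -> members).
GROUPS = {
    "administrator": {"standalone"},
    "marketing": {"mark"},
    "fraud_and_risk": {"far"},
}

# All users created by the initialization script.
USERS = ["standalone", "mark", "far", "other"]


def groups_of(user):
    """Return the sorted list of groups the given user belongs to."""
    return sorted(group for group, members in GROUPS.items() if user in members)


if __name__ == "__main__":
    for user in USERS:
        print(user, groups_of(user))
```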
data-policy.yaml
We'll look at the Data Policy contents at a later stage in this quickstart. Feel free to take a peek already; the policy contains:
Source ref
The reference to the Processing Platform (both the type and id). Note: the id is a self-assigned identifier that can be found in config/application.yaml. For this quickstart, the id is standalone-sample-connection. The ref also specifies the fully qualified name (integration_fqn) of the source table to which this policy applies.
Data schema
Shape of the data, all fields and their data type.
Rule set
Rules defining what the view that will be created by PACE will look like, which data will be included and how the data will be presented to different users.
config/application.yaml
This is the Spring Boot application configuration, which allows for configuring the PACE database, and for configuring Data Catalog and Processing Platform connections.
Here, PACE is configured to connect to the postgres_pace host, which is the name of the Docker network that both the pace_app and the postgres_pace containers are configured with.
Furthermore, one Processing Platform of type postgres is configured, named standalone-sample-connection.
Running the standalone example
Make sure your current working directory is the same as the directory you've set up in the previous section. Start the containers by running:
docker compose up
There should be quite a bit of logging, ending in the banner of the PACE app booting. Once everything has started, try to connect to the postgres_processing_platform to view the pre-populated data. We'll use psql here.
Tutorial video
In this video version, our engineer Ivan walks you through the steps below. Feel free to watch it and follow along:
standalone=# select * from public.demo limit 3;
 transactionid | userid |           email           | age |  brand  | transactionamount
---------------+--------+---------------------------+-----+---------+-------------------
     861200791 | 533445 | jeffreypowell@hotmail.com |  33 | Lenovo  |               123
     733970993 | 468355 | forbeserik@gmail.com      |  16 | MacBook |                46
     494723158 | 553892 | wboone@gmail.com          |  64 | Lenovo  |                73
(3 rows)
This is the raw data, which we're able to see because we're connected to the PostgreSQL database as the standalone user.
As a different user
When connecting to the database as either mark or far, we won't be able to see the raw data, as the grants have not been configured that way. Either exit the current psql session by pressing Ctrl+D or open a new terminal window.
By default, the CLI connects to localhost:50051, as it uses the gRPC interface of PACE. Let's see whether we can list the available groups.
Tip: set up the CLI and try out the autocomplete on arguments and flags, e.g. --processing-platform <tab> to see the available options for your PACE deployment.
Run the following commands in a new terminal window or tab.
The only thing missing here is a rule_sets section, which defines how the PostgreSQL view that PACE creates should behave.
Create a ruleset
It's possible to define multiple rulesets for a single source table, where each ruleset results in a separate view. For this quickstart, we'll stick with a single ruleset. We won't discuss every filter or transform, but we'll discuss a few.
Filters
For the filters, the goals are:
Users in the administrator group should always see the complete data
Users in the fraud_and_risk group should always see the complete data
Any other users should only see data where the age field has a value greater than 8.
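To make the filter goals above concrete, here is a minimal Python sketch of the intended row visibility. It is a model of the behaviour only: PACE expresses the filter in data-policy.yaml and enforces it in the generated PostgreSQL view, not in Python. The function name visible_rows and the sample rows are hypothetical.

```python
# Groups whose members always see the complete data, per the goals above.
UNFILTERED_GROUPS = {"administrator", "fraud_and_risk"}

# Everyone else only sees rows where age is strictly greater than this value.
MIN_AGE_EXCLUSIVE = 8


def visible_rows(rows, user_groups):
    """Return the rows a user with the given groups would be able to see."""
    if UNFILTERED_GROUPS & set(user_groups):
        return list(rows)
    return [row for row in rows if row["age"] > MIN_AGE_EXCLUSIVE]


if __name__ == "__main__":
    # Hypothetical rows for illustration; not taken from the sample data set.
    rows = [
        {"userid": 1, "age": 33},
        {"userid": 2, "age": 8},
        {"userid": 3, "age": 5},
    ]
    print(len(visible_rows(rows, ["administrator"])))   # all rows
    print(len(visible_rows(rows, ["marketing"])))       # only age > 8
    print(len(visible_rows(rows, [])))                  # only age > 8
```

Note that a row with age exactly 8 is hidden from other users, since the condition is "greater than 8".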
In the final data-policy.yaml, quite a few field transforms are defined. We'll discuss the transforms for the field email, as that's the most complex one. The goals for the email field are:
Users in the administrator group should see the raw value
Users in the marketing group should only see the domain of the email; everything before the @ should be redacted.
Users in the fraud_and_risk group should see the raw value
Any other users should see the entire email as a redacted value
Ordering of transforms is important! The first match determines the behavior of the transform for the respective user. Imagine a user being both in the marketing and fraud_and_risk group. Even though the user is in both, the marketing transform has a higher precedence, hence that will be used for presenting the email field value.
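The first-match behaviour can be sketched in Python as an ordered list of (groups, transform) pairs. This is a simplified model of the email goals listed above, not PACE's implementation: the names present_email and redact_local_part are hypothetical, and the "****" placeholder is an assumed redaction format chosen for illustration.

```python
def redact_local_part(email):
    """Mask everything before the @ and keep the domain."""
    _local, sep, domain = email.partition("@")
    return "****" + sep + domain


# Ordered transforms for the email field. The first entry whose groups
# overlap with the user's groups wins; None marks the fallback for
# any other users.
EMAIL_TRANSFORMS = [
    ({"administrator"}, lambda email: email),
    ({"marketing"}, redact_local_part),
    ({"fraud_and_risk"}, lambda email: email),
    (None, lambda email: "****"),
]


def present_email(email, user_groups):
    """Apply the first matching transform for a user with the given groups."""
    groups = set(user_groups)
    for principals, transform in EMAIL_TRANSFORMS:
        if principals is None or principals & groups:
            return transform(email)
    raise ValueError("no transform matched and no fallback was defined")


if __name__ == "__main__":
    email = "wboone@gmail.com"
    print(present_email(email, ["fraud_and_risk"]))
    # A user in both groups gets the marketing transform, as it comes first.
    print(present_email(email, ["marketing", "fraud_and_risk"]))
    print(present_email(email, []))
```

Swapping the marketing and fraud_and_risk entries in the list would change what a user in both groups sees, which is why the order matters.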
Final data-policy.yaml
For the final Data Policy, please have a look at the file here. This is the file that is used when creating the Data Policy.
Creating the Data Policy
CLI
Make sure your current working directory is the same as where the data-policy.yaml file resides. Next, run:
pace upsert data-policy data-policy.yaml --apply
This will create a view with the name public.demo_view in the postgres_processing_platform.
Explore the data
Click through the tabs below to see the data as the different users created in this quickstart.