Quickstart

Set up the standalone example

This document helps you to set up a standalone example for PACE. This includes:

  • pace_app: PACE application running as a container image

  • postgres_pace: PostgreSQL database for PACE to persist Data Policies

  • postgres_processing_platform: Another PostgreSQL database for PACE to connect to as a Processing Platform. This database is pre-populated with sample data to show what PACE can do for you or your organization.

The goal of this tutorial is to run the standalone example, create a Data Policy based on the data schema of the sample data table, and view the data as the different users that we will define.

The source of the standalone example can be found in the GitHub repository.

Make sure you have everything set up according to the installation steps.

Prerequisites

Before you get started, make sure you've installed the following tools:

File and directory setup

Clone the repository from GitHub. This command assumes you're not using SSH, but feel free to do so.

git clone https://github.com/getstrm/pace.git

Now navigate to the standalone directory inside the newly created pace folder:

cd pace/examples/standalone

Next, let's have a look at the contents of these files.

The compose file is set up without any persistence of data across different startups of the services. Keep in mind that any changes to the data will be persisted for as long as you keep the services running.

docker-compose.yaml

The compose file contains three services, matching the introduction section of this document:

  • pace_app, with the ports of its different interfaces exposed to the host:

    • 8080 -> Spring Boot Actuator

    • 9090 -> Envoy JSON / gRPC Transcoding proxy

    • 50051 -> gRPC

  • postgres_pace acts as the persistent layer for PACE to store its Data Policies

    • Available under localhost:5432 on your machine.

  • postgres_processing_platform is the pre-populated database

    • Available under localhost:5431 on your machine.

data.sql

This is the PostgreSQL initialization script that runs on startup of the postgres_processing_platform container. The database is configured to use the public schema (the default), with the following data:

  • A table called public.demo. For the data schema, see the file contents.

  • Several users:

    • standalone (password = standalone) - this is the user we're using to connect PACE to PostgreSQL as a Processing Platform

    • mark (password = mark)

    • far (password = far)

    • other (password = other)

  • Several groups (and their users):

    • administrator: standalone

    • marketing: mark

    • fraud_and_risk: far

The idea here is that standalone is the super user that should be able to see all data in its raw form, and that users mark and far should only see the view that is created when creating a Data Policy in PACE.

data-policy.yaml

We'll look at the Data Policy contents at a later stage in this quickstart. Feel free to already take a peek, the policy contains:

  1. Source ref: The reference to the Processing Platform (both the type and id). Note: the id is a self-assigned identifier that can be found in config/application.yaml. For this quickstart, the id is standalone-sample-connection. The ref also specifies the fully qualified name (integration_fqn) of the source table to which this policy applies.

  2. Data schema: The shape of the data, that is, all fields and their data types.

  3. Rule set: Rules defining what the view created by PACE will look like: which data will be included and how the data will be presented to different users.

config/application.yaml

This is the Spring Boot application configuration, which allows for configuring the PACE database, and for configuring Data Catalog and Processing Platform connections.

spring:
  datasource:
    url: jdbc:postgresql://postgres_pace:5432/pace
    hikari:
      username: pace
      password: pace
      schema: public

app:
  processing-platforms:
    postgres:
      - id: "standalone-sample-connection"
        host-name: "postgres_processing_platform"
        port: 5432
        user-name: "standalone"
        password: "standalone"
        database: "standalone"

Here, PACE is configured to connect to the postgres_pace host, which is the name of the Docker network that both the pace_app and the postgres_pace containers are configured with.

Furthermore, one Processing Platform of type postgres is configured, named standalone-sample-connection.
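
The host names in this configuration only resolve inside the Docker network. From your own machine, the same processing platform database is reached through the port published by the compose file (localhost:5431, as listed earlier). The following minimal Python sketch contrasts the two views of that one database. The helper name postgres_url is hypothetical and exists only for this illustration; the values come from config/application.yaml and the port list above.

```python
from urllib.parse import quote


def postgres_url(user, password, host, port, database):
    """Build a postgresql:// connection URI from its parts (illustrative helper)."""
    return f"postgresql://{quote(user)}:{quote(password)}@{host}:{port}/{database}"


# As seen by pace_app, inside the Docker network (values from config/application.yaml).
inside = postgres_url(
    "standalone", "standalone", "postgres_processing_platform", 5432, "standalone"
)

# As seen from your machine, through the port published by the compose file.
outside = postgres_url("standalone", "standalone", "localhost", 5431, "standalone")

print(inside)
print(outside)
```

The second URI is the one used with psql later in this quickstart.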

Running the standalone example

Make sure your current working directory is the same as the directory you've set up in the previous section. Start the containers by running:

docker compose up

There should be quite a bit of logging, ending in the banner of the PACE app booting. Once everything has started, try to connect to the postgres_processing_platform to view the pre-populated data. We'll use psql here.
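
If you want to confirm that the services are reachable before connecting, a plain TCP check is usually enough. This is a minimal Python sketch and not part of the PACE repository: the helper is_port_open and the PORTS mapping are illustrative names, and the port numbers are the ones listed for the compose file above.

```python
import socket

# Host ports described for the standalone compose file.
PORTS = {
    8080: "pace_app (Spring Boot Actuator)",
    9090: "pace_app (Envoy JSON / gRPC transcoding proxy)",
    50051: "pace_app (gRPC)",
    5432: "postgres_pace",
    5431: "postgres_processing_platform",
}


def is_port_open(port, host="localhost", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port, name in PORTS.items():
        state = "open" if is_port_open(port) else "closed"
        print(f"{port:>5}  {state:<6}  {name}")
```

An open port only shows that something is listening; it does not prove that the application behind it has finished starting.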

Tutorial video

In this video version, our engineer Ivan walks you through the steps below. Feel free to watch it and follow along:

Viewing the sample data

As standalone

In a new terminal window run:

psql postgresql://standalone:standalone@localhost:5431/standalone

List the available tables.

standalone=# \dt
            List of relations
 Schema | Name | Type  |      Owner
--------+------+-------+-----------------
 public | demo | table | standalone
(1 row)

Next, query some of the sample data.

standalone=# select * from public.demo limit 3;
 transactionid | userid |           email           | age |  brand  | transactionamount
---------------+--------+---------------------------+-----+---------+-------------------
     861200791 | 533445 | jeffreypowell@hotmail.com |  33 | Lenovo  |               123
     733970993 | 468355 | forbeserik@gmail.com      |  16 | MacBook |                46
     494723158 | 553892 | wboone@gmail.com          |  64 | Lenovo  |                73
(3 rows)

This is the raw data, which we're able to see because we're connected to the PostgreSQL database as the standalone user.

As a different user

When connecting to the database as either mark or far, we won't be able to see the raw data, because no grants on the table have been configured for these users. Either exit the current psql session by pressing ctrl+D or open a new terminal window.

psql postgresql://mark:mark@localhost:5431/standalone

The user will be able to see that the table exists, but when querying the data, the user will get:

standalone=> select * from public.demo limit 3;
ERROR:  permission denied for table demo

Using the CLI

By default, the CLI connects to localhost:50051, as it uses the gRPC interface of PACE. Let's see whether we can list the available groups.

Tip: set up the CLI and try out the autocomplete on arguments and flags, e.g. --processing-platform <tab> to see the available options for your PACE deployment.

Run the following commands in a new terminal window or tab.

pace list groups --processing-platform standalone-sample-connection

This results in the following groups, which were configured by the data.sql init script:

groups:
- administrator
- fraud_and_risk
- marketing
page_info:
  total: 3

Try to list the available tables.

pace list tables --processing-platform standalone-sample-connection \
    --database standalone --schema public

This results in the following table:

page_info:
  total: 1
tables:
- id: demo
  name: demo
  schema:
    id: public
    database:
      id: standalone

The Data Policy file

Create a blueprint policy

We start with a blueprint policy (without any rule sets) by reading the description of a table on the processing platform.

pace get data-policy demo --blueprint \
--processing-platform standalone-sample-connection \
--database standalone \
--schema public

This results in the following YAML.

metadata:
  description: ""
  title: public.demo
source:
  fields:
  - name_parts:
    - transactionid
    required: true
    type: integer
  - name_parts:
    - userid
    required: true
    type: integer
  - name_parts:
    - email
    required: true
    type: varchar
  - name_parts:
    - age
    required: true
    type: integer
  - name_parts:
    - brand
    required: true
    type: varchar
  - name_parts:
    - transactionamount
    required: true
    type: integer
  ref:
    integration_fqn: public.demo
    platform:
      id: standalone-sample-connection
      platform_type: POSTGRES

The only thing missing here is a rule_sets section, which defines how the PostgreSQL view that PACE will create should behave.

Create a ruleset

It's possible to define multiple rulesets for a single source table, where each ruleset results in a separate view. For this quickstart, we'll stick with a single ruleset. We won't discuss every filter or transform, but we'll discuss a few.

Filters

For the filters, the goals are:

  • Users in the administrator group should always see the complete data

  • Users in the fraud_and_risk group should always see the complete data

  • Any other users should only see data where the age field has a value greater than 8.

In YAML, this would look as follows:

filters:
  - generic_filter:
      conditions:
        - principals: [ { group: administrator }, { group: fraud_and_risk } ]
          condition: "true"
        - principals: [ ]
          condition: "age > 8"
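
To make the intent of these conditions concrete, here is a small Python sketch that mimics them outside of PACE. It assumes the conditions are evaluated in order, that the first entry whose principals match the user decides the outcome, and that an empty principals list acts as the fallback for everyone else. The function row_visible and the sample rows are hypothetical; PACE itself expresses this logic in the generated SQL view.

```python
# Each entry: (groups the condition applies to, predicate on a row).
# An empty set plays the role of `principals: [ ]`, i.e. the fallback.
CONDITIONS = [
    ({"administrator", "fraud_and_risk"}, lambda row: True),
    (set(), lambda row: row["age"] > 8),
]


def row_visible(row, user_groups):
    """Return True if the row passes the first condition that applies to the user."""
    for principals, predicate in CONDITIONS:
        if not principals or principals & set(user_groups):
            return predicate(row)
    return False  # no condition applied; unreachable while a fallback exists


# Hypothetical rows, only for illustration.
rows = [{"userid": 1, "age": 33}, {"userid": 2, "age": 7}]

print([r["userid"] for r in rows if row_visible(r, {"fraud_and_risk"})])  # both rows
print([r["userid"] for r in rows if row_visible(r, {"marketing"})])  # only age > 8
```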

Transforms

In the final data-policy.yaml, quite a few field transforms are defined. We'll discuss the transforms for the email field, as that's the most complex one. The goals for the email field are:

  • Users in the administrator group should see the raw value

  • Users in the marketing group should only see the domain of the email; everything before the @ should be redacted.

  • Users in the fraud_and_risk group should see the raw value

  • Any other users should see the entire email as a redacted value

In YAML, this would look as follows:

field_transforms:
  - field:
      name_parts: [ email ]
    transforms:
      - principals: [ { group: administrator } ]
        identity: { }
      - principals: [ { group: marketing } ]
        regexp:
          regexp: "^.*(@.*)$"
          replacement: "****$1"
      - principals: [ { group: fraud_and_risk } ]
        identity: { }
      - principals: [ ]
        fixed:
          value: "****"

Ordering of transforms is important! The first match determines the behavior of the transform for the respective user. Imagine a user being both in the marketing and fraud_and_risk group. Even though the user is in both, the marketing transform has a higher precedence, hence that will be used for presenting the email field value.
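
The first-match behavior can be sketched in a few lines of Python. This is an illustration of the ordering rule described above, not PACE's implementation: the function transform_email and the TRANSFORMS list are hypothetical names, and the regular expression is the one from the YAML, with the replacement written in Python's backreference syntax (\1 instead of $1).

```python
import re

# Ordered like the YAML above: (groups, transform). An empty set is the fallback.
TRANSFORMS = [
    ({"administrator"}, lambda value: value),  # identity
    ({"marketing"}, lambda value: re.sub(r"^.*(@.*)$", r"****\1", value)),  # regexp
    ({"fraud_and_risk"}, lambda value: value),  # identity
    (set(), lambda value: "****"),  # fixed value for everyone else
]


def transform_email(value, user_groups):
    """Apply the first transform whose principals match one of the user's groups."""
    for principals, transform in TRANSFORMS:
        if not principals or principals & set(user_groups):
            return transform(value)
    return value  # unreachable while a fallback entry exists


email = "wboone@gmail.com"
print(transform_email(email, {"marketing"}))  # ****@gmail.com
print(transform_email(email, {"marketing", "fraud_and_risk"}))  # ****@gmail.com
print(transform_email(email, {"fraud_and_risk"}))  # wboone@gmail.com
print(transform_email(email, set()))  # ****
```

The second call shows the case from the paragraph above: the user is in both groups, but the marketing entry comes first in the list, so the redacted form wins.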

Final data-policy.yaml

For the final Data Policy, please have a look at the file here. This is the file that is used when creating the Data Policy.

Creating the Data Policy

CLI

Make sure your current working directory is the same as where the data-policy.yaml file resides. Next, run:

pace upsert data-policy data-policy.yaml --apply

This will create a view with the name public.demo_view in the postgres_processing_platform.

Explore the data

Query the view to see the data as the different users created in this quickstart. The example below uses the standalone user.

Connect to the database as standalone.

psql postgresql://standalone:standalone@localhost:5431/standalone

Query the view:

standalone=> select * from public.demo_view limit 3;
 transactionid | userid |           email           | age |  brand  | transactionamount
---------------+--------+---------------------------+-----+---------+-------------------
     861200791 | 533445 | jeffreypowell@hotmail.com |  33 | Lenovo  |               123
     733970993 | 468355 | forbeserik@gmail.com      |  16 | MacBook |                46
     494723158 | 553892 | wboone@gmail.com          |  64 | Lenovo  |                73
(3 rows)

As you can see, this matches the data of the raw table exactly, as presented earlier in this document.

Cleanup

That wraps up the standalone example. To clean up all resources, run the following command after stopping the currently running process with ctrl+C.

docker compose down

Any questions or comments? Please ask them on Slack.
