Create a Data Policy

Complete walkthrough of creating a Data Policy

Introduction

For this section, we assume you have either created a connection to a Processing Platform or a Data Catalog, or that you are familiar with the structure of your source data. You also need a running instance of PACE. This section walks through creating a Data Policy step by step.

Please refer to the Schema, Principals and Rule Set sections for additional explanations.

Source Data

For this example we will use a very small sample data set:

email               age
alice@domain.com    16
bob@company.org     18
charlie@store.io    20

Get blueprint policy

Let the id of your connected Processing Platform be pace-platform and the source reference be pace-db.pace-schema.pace-table. Getting a blueprint policy yields a nearly empty Data Policy with populated source fields, thus defining the structure of your source data. A blueprint policy can be requested with the pace CLI, which uses gRPC, or by calling the REST API using curl. The request has two variables, platform-id and table-id. We assume here that the instance is running on localhost, the gRPC port is 50051 and the envoy proxy is listening on port 9090, as per the defaults. With the CLI, specifying the processing platform and the reference to the table returns the corresponding blueprint policy:

pace get data-policy --blueprint -p pace-platform \
    --database pace-db --schema pace-schema pace-table

The resulting blueprint policy:

data_policy:
  id: ""
  metadata:
    title: pace-db.pace-schema.pace-table
    description: ""
    version: ""
    create_time: 2023-11-03T12:00:00.000Z
    update_time: 2023-11-03T12:00:00.000Z
    tags: []
  source:
    ref: 
      integration_fqn: pace-db.pace-schema.pace-table
      platform:
        platform_type: <PLATFORM_TYPE>
        id: pace-platform
    fields:
      - name_parts:
          - email
        type: STRING
        required: false
        tags: []
      - name_parts:
          - age
        type: INTEGER
        required: false
        tags: []
    tags: []
  rule_sets: []

As you can see, the Rule Sets are empty and only source.ref and source.fields have been populated. This dataset consists of two columns, email and age. Now that we know the structure of our data, the next step is to populate the YAML with a Rule Set.

If you do not have a Processing Platform or Data Catalog connected yet, use the blueprint policy from above to define the structure of your data by hand.

Define Rule Sets

In this example we will show one Rule Set, with one Field Transform and one Filter. If you define multiple Rule Sets in your Data Policy, multiple views will be created. For a more extensive explanation of Rule Sets, see the in-depth documentation on Principals, Field Transforms and Filters.

Target

Let's start by defining the reference to the target table. We have chosen the target schema to be the same as the source schema. The Target is then defined as:

rule_sets:
  - target:
      ref:
        integration_fqn: "pace-db.pace-schema.pace-view"

Field Transform

We define one Field Transform and add it to the Rule Set. The transform concerns the email field, and its field definition corresponds to the same field in the source fields. We define one Transform for the Marketing (MKTNG) principal and one for the Fraud and Risk (F&R) principal. For the Marketing principal, the local-part of the email is replaced by **** while the @ and the domain are left as-is, using a regular expression with replacement. For the Fraud and Risk principal we apply an identity transform, returning the email as-is. Finally, if the viewer is not a member of either of these Principals, a fixed value **** is returned instead of the email. For more guidance and examples on how to define Transforms, see the docs.

- field:
    name_parts: 
      - email
  transforms:
    - principals: 
        - group: "MKTNG"
      regexp:
        regexp: "^.*(@.*)$"
        replacement: "****$1"
    - principals:
      - group: "F&R"
      identity: {}
    - principals: []
      fixed:
        value: "****"
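To get a feel for what the regular expression transform does to the sample data, here is a minimal Python sketch. It is not part of PACE; the function name mask_local_part is made up for illustration, and it assumes the policy's $1 back-reference behaves like Python's \1.

```python
import re

# Pattern and replacement taken from the Field Transform above.
# The policy writes the back-reference as $1; Python's re module spells it \1.
PATTERN = re.compile(r"^.*(@.*)$")
REPLACEMENT = r"****\1"


def mask_local_part(email: str) -> str:
    """Replace everything before the last @ with ****, keeping @ and the domain."""
    return PATTERN.sub(REPLACEMENT, email)


if __name__ == "__main__":
    for email in ["alice@domain.com", "bob@company.org", "charlie@store.io"]:
        print(email, "->", mask_local_part(email))
```

Because the leading .* is greedy, the captured group starts at the last @ in the value. A value without any @ does not match the pattern and is returned unchanged in this sketch.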

Filter

To filter out rows completely, we define one Filter based on the age field. Note that the condition can contain arbitrary logic with any number of fields. For the Fraud and Risk principal we return all rows, whereas for all other principals we keep only rows where age is greater than 18, which excludes all children from the target view. More details can be found in the Filters documentation.

filters:
  - generic_filter:
      conditions:
      - principals:
        - group: "F&R"
        condition: "true"
      - principals: []
        condition: "age > 18"
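The effect of the filter on the sample data can be sketched in a few lines of Python. This is an illustration only, not how PACE evaluates filters; it assumes that conditions are checked in the order they are listed, that the first entry whose principals match the viewer applies, and that an empty principals list acts as the fallback. The names CONDITIONS and visible_rows are hypothetical.

```python
# Sample rows from the Source Data section.
ROWS = [
    {"email": "alice@domain.com", "age": 16},
    {"email": "bob@company.org", "age": 18},
    {"email": "charlie@store.io", "age": 20},
]

# Conditions in policy order as (groups, predicate) pairs.
# An empty group list is treated as the fallback for all other viewers (assumption).
CONDITIONS = [
    (["F&R"], lambda row: True),          # condition: "true"
    ([], lambda row: row["age"] > 18),    # condition: "age > 18"
]


def visible_rows(viewer_groups):
    """Return the rows that pass the first condition matching the viewer's groups."""
    for groups, predicate in CONDITIONS:
        if not groups or set(groups) & set(viewer_groups):
            return [row for row in ROWS if predicate(row)]
    return []


if __name__ == "__main__":
    print("F&R:  ", [row["email"] for row in visible_rows(["F&R"])])
    print("other:", [row["email"] for row in visible_rows([])])
```

Note that the comparison is strict, so a row with age exactly 18 is filtered out for everyone outside Fraud and Risk.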

Rule Set

Putting it all together in one Rule Set:

rule_sets:
  - target:
      ref:
        integration_fqn: "pace-db.pace-schema.pace-view"
    field_transforms:
      - field:
          name_parts: 
            - email
          type: "string"
          required: true
          tags: []
        transforms:
          - principals:
            - group: "MKTNG"
            regexp:
              regexp: "^.*(@.*)$"
              replacement: "****$1"
          - principals:
            - group: "F&R"
            identity: {}
          - principals: []
            fixed:
              value: "****"
    filters:
      - generic_filter:
          conditions:
            - principals:
                - group: "F&R"
              condition: "true"
            - principals: []
              condition: "age > 18"

Upsert the Data Policy

Below you will find the resulting Data Policy.

data_policy.yaml
metadata:
  description: ""
  version: 1
  title: public.demo
source:
  fields:
    - name_parts:
        - transactionid
      required: true
      type: integer
    - name_parts:
        - userid
      required: true
      type: integer
    - name_parts:
        - email
      required: true
      type: varchar
    - name_parts:
        - age
      required: true
      type: integer
    - name_parts:
        - brand
      required: true
      type: varchar
    - name_parts:
        - transactionamount
      required: true
      type: integer
  ref:
    integration_fqn: public.demo
    platform:
      id: standalone-sample-connection
      platform_type: POSTGRES
rule_sets:
  - target:
      ref:
        integration_fqn: "pace-db.pace-schema.pace-view"
    field_transforms:
      - field:
          name_parts: 
            - email
          type: "string"
          required: true
          tags: []
        transforms:
          - principals:
            - group: "MKTNG"
            regexp:
              regexp: "^.*(@.*)$"
              replacement: "****$1"
          - principals:
            - group: "F&R"
            identity: {}
          - principals: []
            fixed:
              value: "****"
    filters:
      - generic_filter:
          conditions:
            - principals:
                - group: "F&R"
              condition: "true"
            - principals: []
              condition: "age > 18"

Assuming you have saved the policy as data_policy.json or data_policy.yaml in your current working directory, you can upsert the Data Policy. The CLI accepts both YAML and JSON; curl accepts only JSON:

pace upsert data-policy ./data_policy.yaml --apply

Adding the --apply flag applies the Data Policy immediately, and with it the corresponding SQL view. It is also possible to upsert first and apply later, in which case the ID and platform of the upserted policy must be provided:

pace apply data-policy your-data-policy-id -p your-platform-id
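Since curl accepts only JSON, a policy written in YAML has to be available as JSON before it can be sent to the REST API. The sketch below writes the Rule Set from this walkthrough as plain Python data and serialises it with the standard json module. It assumes the JSON form uses the same keys as the YAML; the variable names are illustrative, and the REST endpoint itself is not shown here.

```python
import json

# The Rule Set from this walkthrough as plain Python data.
# Keys mirror the YAML one-to-one (assumption about the JSON representation).
rule_set = {
    "target": {"ref": {"integration_fqn": "pace-db.pace-schema.pace-view"}},
    "field_transforms": [
        {
            "field": {
                "name_parts": ["email"],
                "type": "string",
                "required": True,
                "tags": [],
            },
            "transforms": [
                {
                    "principals": [{"group": "MKTNG"}],
                    "regexp": {"regexp": "^.*(@.*)$", "replacement": "****$1"},
                },
                {"principals": [{"group": "F&R"}], "identity": {}},
                {"principals": [], "fixed": {"value": "****"}},
            ],
        }
    ],
    "filters": [
        {
            "generic_filter": {
                "conditions": [
                    {"principals": [{"group": "F&R"}], "condition": "true"},
                    {"principals": [], "condition": "age > 18"},
                ]
            }
        }
    ],
}

# Serialise only the rule_sets part; metadata and source would be added the same way.
policy_fragment_json = json.dumps({"rule_sets": [rule_set]}, indent=2)

if __name__ == "__main__":
    print(policy_fragment_json)
```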

View data

Depending on which Principal groups you are in, the data you can access through the newly created view differs. Below are the raw data again and the resulting view for a member of the Marketing (MKTNG) principal.

Raw data:

email               age
alice@domain.com    16
bob@company.org     18
charlie@store.io    20

View for a member of the Marketing (MKTNG) principal:

email               age
****@store.io       20
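To see how the view differs per principal, the Field Transform and the Filter can be combined in a small Python simulation over the sample data. This is a sketch under assumptions, not PACE's implementation: it assumes the first matching principal in each list wins, and the function name view_for is made up for illustration.

```python
import re

# Sample rows from the Source Data section as (email, age) tuples.
ROWS = [("alice@domain.com", 16), ("bob@company.org", 18), ("charlie@store.io", 20)]


def view_for(groups):
    """Return the rows a viewer in the given groups would see under this policy."""
    groups = set(groups)
    result = []
    for email, age in ROWS:
        # Filter: F&R sees every row, everyone else only rows with age > 18.
        if "F&R" not in groups and not age > 18:
            continue
        # Field Transform on email, checked in policy order (assumed first match wins).
        if "MKTNG" in groups:
            email = re.sub(r"^.*(@.*)$", r"****\1", email)  # mask the local-part
        elif "F&R" in groups:
            pass  # identity transform: email is returned as-is
        else:
            email = "****"  # fixed value for all other viewers
        result.append((email, age))
    return result


if __name__ == "__main__":
    for label, groups in [("MKTNG", ["MKTNG"]), ("F&R", ["F&R"]), ("other", [])]:
        print(label, view_for(groups))
```

Under these assumptions a Fraud and Risk viewer sees all three rows unmasked, a Marketing viewer sees only the row for age 20 with a masked local-part, and any other viewer sees that same row with the email replaced by the fixed value.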

For more examples and explanation visit the Rule Set Documentation.
