Create a Data Policy
Complete walkthrough of creating a Data Policy
For this section, we assume you have either created a connection to a Processing Platform or a Data Catalog, or that you are familiar with the structure of your source data. You will, naturally, also need a running instance of PACE. Below we give a step-by-step walkthrough of creating a Data Policy.

For this example we will use a very small sample data set:

| email | age |
|---|---|
|  | 16 |
|  | 18 |
|  | 20 |

Let the ID of your connected Processing Platform be `pace-pp` and the source reference be `pace-db.pace-schema.pace-table`. Requesting a blueprint policy yields a nearly empty Data Policy whose source fields are already populated, which defines the structure of your source data. In this example we use both the `pace` CLI and the [REST API](../reference/api-reference.md#processing-platforms-platformid-tables-table_id-blueprint-policy), called with curl. The request has two variables: `platform-id` and `table-id`.
The CLI communicates over gRPC, while the curl request goes through the envoy proxy. We assume the instance is running on localhost with the default ports: 50051 for gRPC and 9090 for the envoy proxy. With the CLI, specifying the processing platform and the table reference returns the corresponding blueprint policy. Request the blueprint policy as follows:

CLI
```bash
pace get data-policy --blueprint -p pace-pp pace-db.pace-schema.pace-table
```

curl
```bash
curl localhost:9090/processing-platforms/pace-pp/tables/pace-db.pace-schema.pace-table/blueprint-policy
```
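For scripting, the same REST request can be issued from Python's standard library. The following is a minimal sketch, assuming the default envoy address `localhost:9090` used above. `blueprint_url` and `fetch_blueprint` are hypothetical helper names, not part of PACE.

```python
import json
import urllib.parse
import urllib.request


def blueprint_url(base: str, platform_id: str, table_id: str) -> str:
    """Build the blueprint-policy URL used in the curl example (hypothetical helper)."""
    # Percent-encode both path segments so a table reference stays a single segment.
    platform = urllib.parse.quote(platform_id, safe="")
    table = urllib.parse.quote(table_id, safe="")
    return f"http://{base}/processing-platforms/{platform}/tables/{table}/blueprint-policy"


def fetch_blueprint(base: str, platform_id: str, table_id: str) -> dict:
    """GET the blueprint policy and decode the JSON body (needs a running PACE instance)."""
    with urllib.request.urlopen(blueprint_url(base, platform_id, table_id)) as resp:
        return json.load(resp)


# Assumption: envoy listens on the default localhost:9090.
print(blueprint_url("localhost:9090", "pace-pp", "pace-db.pace-schema.pace-table"))
```

Only the URL construction runs here. `fetch_blueprint` performs the actual request once an instance is reachable.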
The resulting blueprint policy:
YAML
```yaml
data_policy:
  id: ""
  metadata:
    title: pace-db.pace-schema.pace-table
    description: ""
    version: ""
    create_time: 2023-11-03T12:00:00.000Z
    update_time: 2023-11-03T12:00:00.000Z
    tags: []
  source:
    ref: pace-db.pace-schema.pace-table
    fields:
      - name_parts:
          - email
        type: STRING
        required: false
        tags: []
      - name_parts:
          - age
        type: INTEGER
        required: false
        tags: []
    tags: []
  platform:
    platform_type: <PLATFORM_TYPE>
    id: pace-pp
  rule_sets: []
```

JSON
```json
{
  "data_policy": {
    "id": "",
    "metadata": {
      "title": "pace-db.pace-schema.pace-table",
      "description": "",
      "version": "",
      "create_time": "2023-11-03T12:00:00.000Z",
      "update_time": "2023-11-03T12:00:00.000Z",
      "tags": []
    },
    "source": {
      "ref": "pace-db.pace-schema.pace-table",
      "fields": [
        {
          "name_parts": [
            "email"
          ],
          "type": "STRING",
          "required": false,
          "tags": []
        },
        {
          "name_parts": [
            "age"
          ],
          "type": "INTEGER",
          "required": false,
          "tags": []
        }
      ],
      "tags": []
    },
    "platform": {
      "platform_type": "<PLATFORM_TYPE>",
      "id": "pace-pp"
    },
    "rule_sets": []
  }
}
```
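To read the source structure from a blueprint programmatically, you only need its `source.fields` list. Below is a minimal Python sketch that works on the JSON form of the blueprint above (abbreviated to the keys it reads). `source_columns` is a hypothetical helper.

```python
import json

# Abbreviated copy of the blueprint JSON above: metadata and platform omitted.
BLUEPRINT = """
{
  "data_policy": {
    "source": {
      "ref": "pace-db.pace-schema.pace-table",
      "fields": [
        {"name_parts": ["email"], "type": "STRING", "required": false, "tags": []},
        {"name_parts": ["age"], "type": "INTEGER", "required": false, "tags": []}
      ],
      "tags": []
    },
    "rule_sets": []
  }
}
"""


def source_columns(policy: dict) -> list[tuple[str, str]]:
    """Return (field name, type) for every source field of a blueprint policy."""
    fields = policy["data_policy"]["source"]["fields"]
    # name_parts is a list (presumably several parts for nested fields); join it for display.
    return [(".".join(f["name_parts"]), f["type"]) for f in fields]


print(source_columns(json.loads(BLUEPRINT)))
```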
As you can see, the Rule Sets are empty and only the refs and `source.fields` have been populated. This dataset consists of two columns: `email` and `age`. Now that we know the structure of our data, the next step is to populate the policy with a Rule Set.

If you do not have a Processing Platform or Data Catalog connected yet, use the blueprint policy above as a template and define the structure of your data by hand.

In this example we show one Rule Set with one Field Transform and one Filter. If you define multiple Rule Sets in your Data Policy, multiple views are created. For a more extensive explanation of Rule Sets, see the in-depth documentation on Principals, Field Transforms and Filters.

Let's start by defining the reference to the target view. We have chosen the target schema to be the same as the source schema, so the Target is defined as follows:

YAML
```yaml
rule_sets:
  - target:
      fullname: "pace-db.pace-schema.pace-view"
```

JSON
```json
{
  "rule_sets": [
    {
      "target": {
        "fullname": "pace-db.pace-schema.pace-view"
      }
    }
  ]
}
```
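If you generate policies with a script rather than by hand, populating the blueprint with a Rule Set amounts to appending an entry to `rule_sets`. This minimal sketch assumes the blueprint has already been parsed into a `dict`, for example with `json.load`. `add_rule_set` is a hypothetical helper, and its empty `field_transforms` and `filters` lists stand in for the content added in the next steps.

```python
import copy


def add_rule_set(policy: dict, target_view: str) -> dict:
    """Return a copy of `policy` with one extra Rule Set writing to `target_view` (hypothetical)."""
    updated = copy.deepcopy(policy)  # keep the original blueprint unchanged
    updated["data_policy"]["rule_sets"].append(
        {
            "target": {"fullname": target_view},
            "field_transforms": [],  # filled in the Field Transform step
            "filters": [],           # filled in the Filter step
        }
    )
    return updated


# Abbreviated blueprint: only the keys this helper touches.
blueprint = {"data_policy": {"source": {"ref": "pace-db.pace-schema.pace-table"}, "rule_sets": []}}
policy = add_rule_set(blueprint, "pace-db.pace-schema.pace-view")
print(policy["data_policy"]["rule_sets"][0]["target"])
```

Each appended entry corresponds to one additional Rule Set, and therefore to one additional view.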
We define one Field Transform and add it to the Rule Set. Our transform concerns the `email` field, and its `field` definition corresponds to the same field in the source fields. We define one Transform for the Marketing (`MKTNG`) principal and one for the Fraud and Risk (`F&R`) principal. For the Marketing principal, a regular expression with a replacement replaces the local part of the email with `****` and leaves the `@` and the domain as-is. For the Fraud and Risk principal we apply an `identity` transform, which returns the email unchanged. Finally, if the viewer is a member of neither of these Principals, the fixed value `****` is returned instead of the email. For more guidance and examples on defining Transforms, see the docs.

YAML
```yaml
- field:
    name_parts:
      - email
    type: "string"
    required: true
    tags: []
  transforms:
    - principals:
        - group: "MKTNG"
      regexp:
        regexp: "^.*(@.*)$"
        replacement: "****$1"
    - principals:
        - group: "F&R"
      identity: {}
    - principals: []
      fixed:
        value: "****"
```

JSON
```json
[
  {
    "field": {
      "name_parts": [
        "email"
      ],
      "type": "string",
      "required": true,
      "tags": []
    },
    "transforms": [
      {
        "principals": [
          {
            "group": "MKTNG"
          }
        ],
        "regexp": {
          "regexp": "^.*(@.*)$",
          "replacement": "****$1"
        }
      },
      {
        "principals": [
          {
            "group": "F&R"
          }
        ],
        "identity": {}
      },
      {
        "principals": [],
        "fixed": {
          "value": "****"
        }
      }
    ]
  }
]
```
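The `regexp` transform above uses `$1` to refer to the first capture group. The sketch below mimics it with Python's `re.sub`, which writes the same reference as `\g<1>`. It assumes the processing platform's regular-expression replacement behaves like `re.sub` for this pattern, and the sample addresses are made up. The sketch also shows something worth checking on your own platform: a value without an `@` does not match the pattern, so under this assumption it comes back unmasked.

```python
import re

# Pattern and replacement from the MKTNG transform; Python writes "$1" as \g<1>.
PATTERN = r"^.*(@.*)$"
REPLACEMENT = r"****\g<1>"


def mask_local_part(value: str) -> str:
    """Replace everything before the last '@' with ****, keeping '@' and the domain."""
    # The greedy ".*" backtracks to the last '@', where capture group 1 starts.
    return re.sub(PATTERN, REPLACEMENT, value)


print(mask_local_part("alice@store.io"))  # hypothetical address -> ****@store.io
print(mask_local_part("not-an-email"))    # no '@': no match, returned unchanged
```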
To filter out rows completely, we define one Filter based on the `age` field. A condition can contain arbitrary logic involving any number of fields. For the Fraud and Risk principal we return all rows. For all other principals only rows with an age above 18 are kept, so everyone aged 18 or younger is excluded from the target view. More details on Filters can be found in the Filters documentation.

YAML
```yaml
filters:
  - conditions:
      - principals:
          - group: "F&R"
        condition: "true"
      - principals: []
        condition: "age > 18"
```

JSON
```json
{
  "filters": [
    {
      "conditions": [
        {
          "principals": [
            {
              "group": "F&R"
            }
          ],
          "condition": "true"
        },
        {
          "principals": [],
          "condition": "age > 18"
        }
      ]
    }
  ]
}
```
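You can simulate the effect of the two conditions on the sample ages in a few lines of Python. This is an illustrative sketch only. On the processing platform the condition strings are presumably evaluated as SQL in the generated view, whereas here they are stand-in lambdas. The first-match rule for choosing a condition is also our assumption.

```python
# Stand-ins for the two filter conditions, in policy order: (principal groups, predicate).
# Hypothetical: on the platform, "true" and "age > 18" are SQL, not Python lambdas.
CONDITIONS = [
    ({"F&R"}, lambda row: True),           # condition: "true"
    (set(), lambda row: row["age"] > 18),  # principals: [] applies to everyone else
]


def visible_rows(rows: list[dict], viewer_groups: set[str]) -> list[dict]:
    """Keep the rows that the first applicable condition lets through."""
    for groups, predicate in CONDITIONS:
        # Assumption: the first condition whose principals overlap the viewer's groups
        # wins; an empty principal list is the catch-all default.
        if not groups or groups & viewer_groups:
            return [row for row in rows if predicate(row)]
    return []  # no condition applies (cannot happen here because of the catch-all)


rows = [{"age": 16}, {"age": 18}, {"age": 20}]
print(visible_rows(rows, {"F&R"}))    # all three rows
print(visible_rows(rows, {"MKTNG"}))  # only the row with age 20
```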
Putting it all together in one Rule Set:

YAML
```yaml
rule_sets:
  - target:
      fullname: "pace-db.pace-schema.pace-view"
    field_transforms:
      - field:
          name_parts:
            - email
          type: "string"
          required: true
          tags: []
        transforms:
          - principals:
              - group: "MKTNG"
            regexp:
              regexp: "^.*(@.*)$"
              replacement: "****$1"
          - principals:
              - group: "F&R"
            identity: {}
          - principals: []
            fixed:
              value: "****"
    filters:
      - conditions:
          - principals:
              - group: "F&R"
            condition: "true"
          - principals: []
            condition: "age > 18"
```

JSON
```json
{
  "rule_sets": [
    {
      "target": {
        "fullname": "pace-db.pace-schema.pace-view"
      },
      "field_transforms": [
        {
          "field": {
            "name_parts": [
              "email"
            ],
            "type": "string",
            "required": true,
            "tags": []
          },
          "transforms": [
            {
              "principals": [
                {
                  "group": "MKTNG"
                }
              ],
              "regexp": {
                "regexp": "^.*(@.*)$",
                "replacement": "****$1"
              }
            },
            {
              "principals": [
                {
                  "group": "F&R"
                }
              ],
              "identity": {}
            },
            {
              "principals": [],
              "fixed": {
                "value": "****"
              }
            }
          ]
        }
      ],
      "filters": [
        {
          "conditions": [
            {
              "principals": [
                {
                  "group": "F&R"
                }
              ],
              "condition": "true"
            },
            {
              "principals": [],
              "condition": "age > 18"
            }
          ]
        }
      ]
    }
  ]
}
```
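Group names appear in several places in a Rule Set. A typo in one of them would likely send members of the intended group to a later transform or the default condition instead. The small Python check below collects every referenced group so you can compare it with the groups that exist on your platform. It is sketched under the assumption that principals are written as `{"group": ...}` objects, as in this example. `referenced_groups` and `known` are hypothetical names.

```python
def referenced_groups(rule_set: dict) -> set[str]:
    """Collect every principal group named in a Rule Set's transforms and filters."""
    groups: set[str] = set()
    for ft in rule_set.get("field_transforms", []):
        for transform in ft.get("transforms", []):
            groups.update(p["group"] for p in transform["principals"] if "group" in p)
    for flt in rule_set.get("filters", []):
        for cond in flt.get("conditions", []):
            groups.update(p["group"] for p in cond["principals"] if "group" in p)
    return groups


rule_set = {
    "target": {"fullname": "pace-db.pace-schema.pace-view"},
    "field_transforms": [{
        "field": {"name_parts": ["email"]},
        "transforms": [
            {"principals": [{"group": "MKTNG"}], "regexp": {"regexp": "^.*(@.*)$", "replacement": "****$1"}},
            {"principals": [{"group": "F&R"}], "identity": {}},
            {"principals": [], "fixed": {"value": "****"}},
        ],
    }],
    "filters": [{"conditions": [
        {"principals": [{"group": "F&R"}], "condition": "true"},
        {"principals": [], "condition": "age > 18"},
    ]}],
}

known = {"MKTNG", "F&R"}  # assumption: the groups that actually exist on your platform
print(sorted(referenced_groups(rule_set) - known))  # empty when every name is spelled correctly
```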
Below you will find the resulting Data Policy.

YAML

data_policy.yaml
```yaml
data_policy:
  id: ""
  metadata:
    title: pace-db.pace-schema.pace-table
    description: ""
    version: ""
    create_time: 2023-11-03T12:00:00.000Z
    update_time: 2023-11-03T12:00:00.000Z
    tags: []
  source:
    ref: pace-db.pace-schema.pace-table
    fields:
      - name_parts:
          - email
        type: STRING
        required: false
        tags: []
      - name_parts:
          - age
        type: INTEGER
        required: false
        tags: []
    tags: []
  platform:
    platform_type: <PLATFORM_TYPE>
    id: pace-pp
  rule_sets:
    - target:
        fullname: "pace-db.pace-schema.pace-view"
      field_transforms:
        - field:
            name_parts:
              - email
            type: "string"
            required: true
            tags: []
          transforms:
            - principals:
                - group: "MKTNG"
              regexp:
                regexp: "^.*(@.*)$"
                replacement: "****$1"
            - principals:
                - group: "F&R"
              identity: {}
            - principals: []
              fixed:
                value: "****"
      filters:
        - conditions:
            - principals:
                - group: "F&R"
              condition: "true"
            - principals: []
              condition: "age > 18"
```

JSON

data_policy.json
```json
{
  "data_policy": {
    "id": "",
    "metadata": {
      "title": "pace-db.pace-schema.pace-table",
      "description": "",
      "version": "",
      "create_time": "2023-11-03T12:00:00.000Z",
      "update_time": "2023-11-03T12:00:00.000Z",
      "tags": []
    },
    "source": {
      "ref": "pace-db.pace-schema.pace-table",
      "fields": [
        {
          "name_parts": [
            "email"
          ],
          "type": "STRING",
          "required": false,
          "tags": []
        },
        {
          "name_parts": [
            "age"
          ],
          "type": "INTEGER",
          "required": false,
          "tags": []
        }
      ],
      "tags": []
    },
    "platform": {
      "platform_type": "<PLATFORM_TYPE>",
      "id": "pace-pp"
    },
    "rule_sets": [
      {
        "target": {
          "fullname": "pace-db.pace-schema.pace-view"
        },
        "field_transforms": [
          {
            "field": {
              "name_parts": [
                "email"
              ],
              "type": "string",
              "required": true,
              "tags": []
            },
            "transforms": [
              {
                "principals": [
                  {
                    "group": "MKTNG"
                  }
                ],
                "regexp": {
                  "regexp": "^.*(@.*)$",
                  "replacement": "****$1"
                }
              },
              {
                "principals": [
                  {
                    "group": "F&R"
                  }
                ],
                "identity": {}
              },
              {
                "principals": [],
                "fixed": {
                  "value": "****"
                }
              }
            ]
          }
        ],
        "filters": [
          {
            "conditions": [
              {
                "principals": [
                  {
                    "group": "F&R"
                  }
                ],
                "condition": "true"
              },
              {
                "principals": [],
                "condition": "age > 18"
              }
            ]
          }
        ]
      }
    ]
  }
}
```
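Before upserting, it can be worth checking that every Field Transform refers to a field declared in `source.fields`. Here is a minimal Python sketch that runs on an abbreviated copy of the policy above. `unknown_transform_fields` is a hypothetical helper. To check your own file, pass it `json.load(open("data_policy.json"))`.

```python
import json

# Abbreviated copy of data_policy.json: only the parts this check reads.
POLICY = json.loads("""
{
  "data_policy": {
    "source": {"fields": [{"name_parts": ["email"]}, {"name_parts": ["age"]}]},
    "rule_sets": [
      {"field_transforms": [{"field": {"name_parts": ["email"]}, "transforms": []}]}
    ]
  }
}
""")


def unknown_transform_fields(policy: dict) -> list[str]:
    """Return Field Transform fields that are not declared among the source fields."""
    dp = policy["data_policy"]
    declared = {tuple(f["name_parts"]) for f in dp["source"]["fields"]}
    missing = []
    for rule_set in dp.get("rule_sets", []):
        for ft in rule_set.get("field_transforms", []):
            parts = tuple(ft["field"]["name_parts"])
            if parts not in declared:
                missing.append(".".join(parts))
    return missing


print(unknown_transform_fields(POLICY))  # [] because "email" is a declared source field
```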
Assuming you have saved the policy as `data_policy.json` or `data_policy.yaml` in your current working directory, you can upsert the Data Policy. The CLI accepts both YAML and JSON, whereas the curl call accepts only JSON:

CLI
```bash
pace upsert data-policy ./data_policy.yaml --apply
```

curl
```bash
curl -X POST -H "Content-Type: application/json" -d @./data_policy.json localhost:9090/data-policies
```
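The curl call is a plain HTTP POST of the JSON policy, so any HTTP client can make it. Here is a sketch using Python's standard library, assuming the default envoy address. `upsert_request` is a hypothetical helper. The sketch only builds the request; it is sent when you pass it to `urllib.request.urlopen` against a running instance.

```python
import urllib.request


def upsert_request(policy_json: bytes, base: str = "localhost:9090") -> urllib.request.Request:
    """Build the same POST as the curl command: a JSON body sent to /data-policies."""
    return urllib.request.Request(
        f"http://{base}/data-policies",
        data=policy_json,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the saved file (requires a running PACE instance):
#   with open("data_policy.json", "rb") as fh:
#       urllib.request.urlopen(upsert_request(fh.read()))
req = upsert_request(b"{}")  # placeholder body, only to inspect the request
print(req.get_method(), req.full_url)  # POST http://localhost:9090/data-policies
```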
With the `--apply` flag, the Data Policy is applied immediately, and so is the corresponding SQL view. You can also upsert a policy first and apply it later. In that case you must provide the ID and platform of the upserted policy:

```bash
pace apply data-policy your-data-policy-id -p your-platform-id
```
The data you can access through the newly created view depends on the Principal groups you belong to. Below you will again find the raw data, followed by the view as seen by several Principals. CMPLNCE belongs to neither MKTNG nor F&R and therefore gets the default transform and filter.

RAW data

| email | age |
|---|---|
|  | 16 |
|  | 18 |
|  | 20 |

MKTNG

| email | age |
|---|---|
| `****@store.io` | 20 |

F&R

| email | age |
|---|---|
|  | 16 |
|  | 18 |
|  | 20 |

CMPLNCE

| email | age |
|---|---|
| `****` | 20 |
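To see how the transform and the filter combine into the views above, you can simulate the whole Rule Set in Python on the sample ages. This is a sketch, not how PACE evaluates policies. The email addresses are invented because the sample table does not list them. The sketch also assumes that for both transforms and filter conditions the first entry matching the viewer's groups applies, with `principals: []` as the fallback.

```python
import re

# Hypothetical rows: ages from the sample table, addresses invented for illustration.
RAW = [("alice@store.io", 16), ("bob@store.io", 18), ("carol@store.io", 20)]


def transform_email(email: str, groups: set[str]) -> str:
    """Apply the email Field Transform; the first transform matching the viewer wins (assumed)."""
    if "MKTNG" in groups:
        return re.sub(r"^.*(@.*)$", r"****\g<1>", email)  # regexp transform
    if "F&R" in groups:
        return email  # identity transform
    return "****"  # fixed transform, principals: []


def row_visible(age: int, groups: set[str]) -> bool:
    """Apply the Filter: F&R sees every row, everyone else only rows with age > 18."""
    return "F&R" in groups or age > 18


def view(groups: set[str]) -> list[tuple[str, int]]:
    return [(transform_email(e, groups), age) for e, age in RAW if row_visible(age, groups)]


for principal in ("MKTNG", "F&R", "CMPLNCE"):
    print(principal, view({principal}))
```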