Create Data Policies that detokenize previously tokenized data
This tutorial assumes that you have completed the quickstart section of the docs. The prerequisites for this tutorial are the same as mentioned there.
The goal of this tutorial is to create a Data Policy that detokenizes previously tokenized data. Tokenization is a powerful data security method, especially when tokens are random, such as UUIDs. With PACE's Detokenize transform, we can implement data policies that provide the original values of tokenized data to authorized data consumers.
Context
We approach this tutorial from the perspective of a use case of an imaginary global payment card provider, which we'll name PCP.
PCP has a strong data-driven culture with by-default access to transaction data for every employee. To comply with upcoming stricter European legislation, they need to take measures to restrict data usage of especially European card holders. Their global fraud and risk team should however still be able to access the data they need to ensure safe and proper usage of PCP's services, most notably in case of reasonable suspicions of fraud.
Let's have a look at a sample of their transaction data:
As part of their measures, PCP decided to apply tokenization to card numbers across their data landscape. Each card number is assigned a random but unique UUID as token. The transaction data is stored in the above tokenized format.
To allow (data) teams to create value from this data, while complying with the European legislation, they decide to put in place a policy that guarantees the following:
European transactions should be filtered out for all users, except fraud and risk employees.
Card holder names should be hidden for all users, except fraud investigators.
Card numbers may be detokenized for fraud investigation.
Transaction IDs should be partially masked for all users, except fraud and risk employees.
Let's see how we can enforce this policy with PACE.
File and directory setup
Clone the repository from GitHub, if you haven't already done so. This command assumes you're not using SSH, but feel free to do so.
gitclonehttps://github.com/getstrm/pace.git
Create a directory detokenization with the following directory tree structure:
Now navigate to the detokenization directory inside the pace repo:
cdpace/examples/detokenization
Next, let's have a look at the contents of these files.
The compose file is set up without any persistence of data across different startups of the services. Keep in mind that any changes to the data will be persisted for as long as you keep the services running.
docker-compose.yaml
The compose file defines three services:
pace_app with the ports for all different interfaces exposed to the host:
9090 -> Envoy JSON / gRPC REST Transcoding proxy.
50051 -> gRPC.
8080 -> Spring Boot Actuator.
postgres_pace acts as the persistent layer for PACE to store its Data Policies.
Available under localhost:5432 on your machine.
postgres_processing_platform is the pre-populated database.
Available under localhost:5431 on your machine.
data.sql
The PostgreSQL initialization SQL script that is run on startup for the postgres_processing_platform container. The database is configured to use the public schema (the default), with the following data:
A public.transactions table, containing "transactions" with tokenized card numbers.
A public.tokens table, containing the corresponding card number tokens and their respective original values.
The script also creates a few roles and corresponding users, which we will see later on.
config/application.yaml
This is the Spring Boot application configuration, which specifies the PACE database connection, and Processing Platform.
Here, PACE is configured to connect to the postgres_pace host, which is the name of the Docker network that both the pace_app and the postgres_pace containers are configured with.
Furthermore, one Processing Platform of type postgres is configured, named detokenization-example-connection.
data-policy.yaml
This is the Data Policy we'll be creating in this tutorial. It implements the desired security measures, which we will see below.
Creating the policy
Running PACE
To start the containers, execute the following command from the detokenization directory:
dockercomposeup--pullalways
There should be quite a bit of logging, ending with the startup logs of the PACE app.
If all went well, the pace list tables CLI command should return two tables:
Before we dive into the data policy definition, let's have a look at the user group principals (implemented as DB roles) and corresponding users that are configured on the sample database:
fraud_and_risk is a role for the corresponding team to be used for purposes when there is no concrete suspicion of fraud. A DB user named far with password far has been assigned this role.
fraud_investigation is a role to be used when the fraud and risk team does have a reasonable suspicion of fraud. A DB user named fin with password fin has been assigned this role.
A DB user named other with password other is also available and hasn't been assigned either of the roles.
We can connect with the sample database on port 5431 on localhost, with detokenization as database name, with either of the above users. None of these users will be able to read data from the public.transactions or public.tokens tables. Feel free to connect with the database using your favourite SQL client. If you do want to see the data in these tables, you can connect with the user detokenization using the password detokenization.
psql examples
To illustrate, we can try to retrieve transactions with the far user:
psqlpostgresql://far:far@localhost:5431/detokenization-c"select * from public.transactions limit 5"
Which results in the following output:
ERROR: permission denied for table transactions
And using detokenization instead:
psqlpostgresql://detokenization:detokenization@localhost:5431/detokenization-c"select * from public.transactions limit 5"
We can use the PACE CLI to retrieve a blueprint policy for the transactions table as follows:
pace get data-policy --blueprint \
--processing-platform detokenization-example-connection \
--database detokenization \
--schema public \
transactions
This returns the following data policy definition in YAML, without any field transforms or filters yet:
This definition essentially contains the reference to and schema of the source table. If desired, we can change the title and provide a description, such as:
metadata:description:"This is policy restricts access to the transactions data."title:"Public transactions"
We can start filling in the policy by adding a rule_sets section:
Here we specify the full name of the view that will be created by the policy. By adding a filter, we can codify that only users from the fraud_and_risk and fraud_investigation groups can view European data:
The condition is defined in (standard) SQL, compatible with the target platform (our Postgres database in this case). Next, we can codify the remaining requirements as field transforms, resulting in the below rule set specification. See our overview of transforms for more details on their configuration.
[...]rule_sets: - target:ref:integration_fqn:public.transactions_viewfilters: - conditions: - principals: - group:fraud_and_risk - group:fraud_investigationcondition:"true"# Any other users do not see any European data - principals: []condition:"region <> 'Europe'"field_transforms: - field:name_parts: [ card_holder_name ]transforms:# Only the fraud_investigation group can see the card holder name - principals: - group:fraud_investigationidentity: {}# The card holder name is made null for everyone else - principals: []nullify: {} - field:name_parts: [ card_number ]transforms:# The card number is detokenized for the fraud_investigation group, thus making the original value visible - principals: - group:fraud_investigationdetokenize:token_source_ref:public.tokenstoken_field:name_parts: [ token ]value_field:name_parts: [ value ]# The card number remains tokenized for everyone else - principals: []identity: {} - field:name_parts: [ transaction_id ]transforms:# The fraud_investigation and fraud_and_risk groups can see the full transaction id - principals: - group:fraud_investigation - group:fraud_and_riskidentity: {}# Everyone else can only see the last 3 digits of the transaction id - principals: []regexp:regexp:"^\\d+(\\d{3})$"replacement:"******$1"
The detokenize transform references the tokens table and specifies which field (column) contains the token and which the value.
The data-policy.yaml file in the detokenization directory contains the same policy. To apply it and create the transactions_view, we can use the PACE CLI again:
paceupsertdata-policydata-policy.yaml--apply
The pace list data-policies command will now return it.
Querying the view
Now that the public.transactions_view has been created, we can compare the query results the various users get.
psql examples
You could use the following psql commands to show the complete result sets.
Fraud and Risk
psqlpostgresql://far:far@localhost:5431/detokenization-c"select * from public.transactions_view"
Fraud Investigation
psqlpostgresql://fin:fin@localhost:5431/detokenization-c"select * from public.transactions_view"
Other
psqlpostgresql://other:other@localhost:5431/detokenization-c"select * from public.transactions_view"
Cleanup
That wraps up this detokenization tutorial. To clean up all resources, run the following command after stopping the currently running process with ctrl+C.
dockercomposedown
Any questions or comments? Please ask them on Slack.