Generate random sample values using the OpenAI plugin
Extensibility is an important aspect of PACE. Functionality can be added through plugins, the first of which is the OpenAI plugin. More details on creating your own plugins will follow soon. In this tutorial, we cover the sample data generation capability of our OpenAI plugin.
pace list plugins
plugins:
- actions:
  - invokable: true
    type: GENERATE_DATA_POLICY
  - invokable: true
    type: GENERATE_SAMPLE_DATA
  id: openai
  implementation: com.getstrm.pace.plugins.builtin.openai.OpenAIPlugin
This plugin has two actions. In this tutorial, we'll explore the GENERATE_SAMPLE_DATA action.
The GENERATE_SAMPLE_DATA action of the OpenAI plugin uses the OpenAI Chat API to create random sample data for a given table definition. You could obtain such a definition, for instance, from one of the pace get data-policy ... commands.
An OpenAI API key is required for this tutorial. You can generate one on the OpenAI platform at https://platform.openai.com/api-keys. We recommend creating a dedicated API key for this PACE plugin.
File and directory setup
We provide an example setup in our GitHub repository, as explained below. If you already have a running instance of PACE, you may skip this setup and simply add the OpenAI API key to the PACE application configuration. See the config/application.yaml section below.
Clone the repository from GitHub if you haven't already done so. The command below clones over HTTPS, but feel free to use SSH instead.
git clone https://github.com/getstrm/pace.git
Create a directory sample-data-generator with the following directory tree structure:
In this file, we've added some additional system instructions to enrich the generated sample data.
Note: some of the column names are in Dutch; we'll explain their meaning and significance below.
source:
  ref: generate_sample_demo
  fields:
    - name_parts: [ email ]
      type: varchar
      required: true
    - name_parts: [ gebruikersnaam ]
      type: varchar
      required: true
    - name_parts: [ organisatie ]
      type: varchar
      required: true
    - name_parts: [ klantnummber ]
      type: varchar
      required: true
    - name_parts: [ bankrekening ]
      type: varchar
      required: false
additional_system_instructions:
  - don't forget to add the csv column headers
  - please return 20 result rows
  - if you recognize something as being an email, please use email domains in europe
  - if you recognize something as a customer number please generate 7 digits between 1000000 and 2999999
  - for recognized bank accounts use european style IBAN
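Some of these instructions are concrete enough to check mechanically once the data comes back. The following is a minimal sketch, not part of PACE: the helper names are hypothetical, and `EUROPEAN_TLDS` is an illustrative, non-exhaustive set, since a country-code top-level domain is only a rough proxy for "an email domain in Europe".

```python
import re

# Hypothetical helpers for spot-checking generated values; not part of PACE.

# Exactly seven ASCII digits (the customer-number instruction above).
SEVEN_DIGITS = re.compile(r"[0-9]{7}")

# Illustrative, non-exhaustive set of European top-level domains (an assumption).
EUROPEAN_TLDS = {
    "eu", "nl", "be", "lu", "de", "fr", "es", "pt", "it", "at",
    "ch", "ie", "dk", "se", "no", "fi", "pl", "cz",
}


def is_valid_customer_number(value: str) -> bool:
    """True if value is exactly 7 digits between 1000000 and 2999999 inclusive."""
    value = value.strip()
    if not SEVEN_DIGITS.fullmatch(value):
        return False
    return 1_000_000 <= int(value) <= 2_999_999


def has_european_tld(email: str) -> bool:
    """Rough proxy for 'an email domain in Europe': inspect the top-level domain."""
    _, at, domain = email.strip().rpartition("@")
    if not at or "." not in domain:
        return False
    return domain.rsplit(".", 1)[1].lower() in EUROPEAN_TLDS
```

A value such as `CUST123` is rejected by `is_valid_customer_number`, as is any seven-digit number outside the requested range.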
Generating the sample data
Tutorial video
Running PACE
Make sure your current working directory is the directory you set up in the previous section. Start the containers by running:
docker compose up
You should see quite a bit of log output, ending with the banner of the PACE application as it boots.
In the same directory, execute the following PACE CLI command:
pace invoke plugin openai GENERATE_SAMPLE_DATA --payload instructions.yaml
This will take a little while (around 20 seconds in our testing). If OpenAI replies within the configured timeout, PACE prints the generated random CSV data to your terminal. The output should look similar to this:
Note: GPT figured out that klantnummer (Dutch for customer number) refers to a customer, hence the CUST prefix. Even more interesting, it figured out that bankrekening (Dutch for bank account) should probably contain Dutch-style IBAN numbers!
We did not explain any of this to GPT, and there is no hard-coded Dutch anywhere in our source code.
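A model-generated Dutch-style IBAN looks plausible, but nothing in this setup guarantees that its check digits are valid. If that matters for your tests, a minimal sketch of the standard ISO 13616 mod-97 check could look like this. The function names are hypothetical, and `NL91ABNA0417164300` is a widely used example IBAN rather than real account data.

```python
import re

# Generic IBAN shape: country code, two check digits, then up to 30 alphanumerics.
IBAN_SHAPE = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}")
# Dutch IBAN layout: NL, check digits, 4-letter bank code, 10-digit account number.
DUTCH_IBAN_SHAPE = re.compile(r"NL[0-9]{2}[A-Z]{4}[0-9]{10}")


def normalize_iban(iban: str) -> str:
    """Drop spaces and upper-case, so 'nl91 abna ...' and 'NL91ABNA...' compare equal."""
    return iban.replace(" ", "").upper()


def iban_checksum_ok(iban: str) -> bool:
    """ISO 13616 mod-97 check: True if the check digits match the rest of the IBAN."""
    iban = normalize_iban(iban)
    if not IBAN_SHAPE.fullmatch(iban):
        return False
    # Move the first four characters to the end, then map A..Z to 10..35.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def is_dutch_iban(iban: str) -> bool:
    """True if the value has the Dutch IBAN layout and a valid checksum."""
    shape_ok = bool(DUTCH_IBAN_SHAPE.fullmatch(normalize_iban(iban)))
    return shape_ok and iban_checksum_ok(iban)
```

Running the bankrekening column through `is_dutch_iban` tells you how many of the generated values would pass a real-world validator.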
This is all still quite experimental, so not every instruction may work equally well. We've also frequently encountered OpenAI timeouts, which result in an Internal Error response in your CLI. Let us know if you run into any issues; we'll keep exploring this thing called GenAI!
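Because the model does not always follow every instruction, it can help to check the returned CSV before using it. The following minimal sketch assumes you have captured the CLI output as text (for example by redirecting it to a file) and that the output contains only the CSV. The column names are copied from the fields in the payload above, and `check_sample_csv` is a hypothetical helper, not a PACE API.

```python
import csv
import io

# Column names and required flags as declared in the payload's fields.
EXPECTED_COLUMNS = ["email", "gebruikersnaam", "organisatie", "klantnummber", "bankrekening"]
REQUIRED_COLUMNS = {"email", "gebruikersnaam", "organisatie", "klantnummber"}


def check_sample_csv(text: str, expected_rows: int = 20) -> list[str]:
    """Return human-readable problems found in generated CSV text (empty if none)."""
    rows = list(csv.reader(io.StringIO(text.strip())))
    if not rows:
        return ["no CSV output"]
    problems = []
    # The first row should be the header the instructions asked for.
    header = [name.strip() for name in rows[0]]
    if header != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {header}")
    data = rows[1:]
    if len(data) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(data)}")
    # Required fields must be non-empty; line numbers assume one record per line.
    required_idx = [i for i, name in enumerate(header) if name in REQUIRED_COLUMNS]
    for line_no, row in enumerate(data, start=2):
        for i in required_idx:
            if i >= len(row) or not row[i].strip():
                problems.append(f"line {line_no}: empty required column {header[i]}")
    return problems
```

An empty list means the output matched the header, row count, and required-field expectations; anything else is a hint to adjust the instructions or simply run the command again.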
Enriching the instructions
Let's add some more configuration to the instructions:
additional_system_instructions:
  - don't forget to add the csv column headers
  - please return 20 result rows
  - if you recognize something as being an email, please use email domains in europe
  - if you recognize something as a customer number please generate 7 digits between 1000000 and 2999999
  - for recognized bank accounts use european style IBAN
pace invoke plugin openai GENERATE_SAMPLE_DATA --payload complex-instructions.yaml
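Given the frequent OpenAI timeouts mentioned earlier, you may want to script the invocation with a few retries. This is a minimal sketch using Python's `subprocess` module. It assumes the `pace` CLI is on your `PATH` and that a failed invocation (such as an Internal Error) exits with a non-zero status, which we have not verified; `run_with_retries` and `pace_generate_command` are hypothetical names.

```python
import subprocess
import time


def pace_generate_command(payload: str) -> list[str]:
    """Build the CLI invocation used in this tutorial for the given payload file."""
    return ["pace", "invoke", "plugin", "openai", "GENERATE_SAMPLE_DATA", "--payload", payload]


def run_with_retries(cmd: list[str], attempts: int = 3, timeout_s: float = 120.0,
                     backoff_s: float = 5.0) -> str:
    """Run cmd, retrying on a non-zero exit status or a local timeout; return stdout."""
    last_error = ""
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            last_error = f"timed out after {timeout_s} s"
        else:
            if result.returncode == 0:
                return result.stdout
            last_error = result.stderr.strip() or f"exit status {result.returncode}"
        if attempt < attempts:
            # Wait a little longer before each new attempt (linear backoff).
            time.sleep(backoff_s * attempt)
    raise RuntimeError(f"command failed after {attempts} attempts: {last_error}")
```

Usage: `print(run_with_retries(pace_generate_command("complex-instructions.yaml")))`. The 120-second local timeout is an arbitrary choice, generous compared to the roughly 20 seconds a successful run took in our testing.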
This results in output similar to the following: