Generate some Random Sample Values using the OpenAI plugin
Extensibility is an important aspect of PACE. Functionality can be added through plugins, the first type of which is the OpenAI plugin. More detail on creating your own plugins will follow soon. In this tutorial, we cover our OpenAI Data Policy Generator implementation.
This plugin has two actions; in this tutorial, we explore the GENERATE_SAMPLE_DATA action.
The GENERATE_SAMPLE_DATA action uses the OpenAI Chat API to create random sample data for a given table definition. You could create such a definition, for instance, via one of the pace get data-policy ... command invocations.
An OpenAI API key is required for this tutorial. You can generate one on the OpenAI platform at https://platform.openai.com/api-keys. We recommend creating a new API key for this PACE plugin.
File and directory setup
We provide an example setup in our GitHub repository, as explained below. If you already have a running instance of PACE, you may skip this setup and simply add the OpenAI API key to the PACE application configuration. See the config/application.yaml section below.
Clone the repository from GitHub, if you haven't already done so. This command assumes you're not using SSH, but feel free to do so.
git clone https://github.com/getstrm/pace.git
Create a directory sample-data-generator with the following directory tree structure:
In this file, we've added some more system instructions, to enrich the generated sample data.
Note: there are some Dutch language column names whose meaning and resulting significance we'll explain below.
source:
  ref: generate_sample_demo
  fields:
    - name_parts: [ email ]
      type: varchar
      required: true
    - name_parts: [ gebruikersnaam ]
      type: varchar
      required: true
    - name_parts: [ organisatie ]
      type: varchar
      required: true
    - name_parts: [ klantnummer ]
      type: varchar
      required: true
    - name_parts: [ bankrekening ]
      type: varchar
      required: false
additional_system_instructions:
  - don't forget to add the csv column headers
  - please return 20 result rows
  - if you recognize something as being an email, please use email domains in europe
  - if you recognize something as a customer number please generate 7 digits between 1000000 and 2999999
  - for recognized bank accounts use european style IBAN
Generating the sample data
Running PACE
Make sure your current working directory is the same as the directory you've set up in the previous section. Start the containers by running:
docker compose up
There should be quite a bit of logging, ending in the banner of the PACE app booting.
This will take a little while (around 20 seconds during our testing). If OpenAI replies within the configured timeout, PACE prints the generated random CSV to your terminal. The output should look similar to this:
NOTE: GPT figured out that klantnummer (Dutch for "customer number") must relate to a customer, hence the CUST prefix. Even more interestingly, it figured out that bankrekening (Dutch for "bank account") should probably be given Dutch-style IBANs!
This is all still quite experimental, so not all instructions may work equally well. We've also frequently encountered OpenAI timeouts, resulting in an Internal Error response in your CLI. Let us know if you encounter any issues, and we will further explore this thing called GenAI!
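If you want to check whether the generated bankrekening values are at least well-formed IBANs, the standard mod-97 checksum can be applied to each value. The sketch below is not part of PACE; the function name is our own, and it only verifies the checksum, not country-specific lengths or whether the account exists.

```python
def is_valid_iban(iban: str) -> bool:
    """Return True if `iban` passes the IBAN mod-97 checksum."""
    compact = iban.replace(" ", "").upper()
    # IBANs are ASCII alphanumeric, between 15 and 34 characters long.
    if not (compact.isascii() and compact.isalnum() and 15 <= len(compact) <= 34):
        return False
    # The first two characters are the country code, the next two the check digits.
    if not (compact[:2].isalpha() and compact[2:4].isdigit()):
        return False
    # Move country code and check digits to the end of the string.
    rearranged = compact[4:] + compact[:4]
    # Replace letters by numbers (A=10 ... Z=35); int(ch, 36) does exactly that.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    # A valid IBAN leaves a remainder of 1 when divided by 97.
    return int(digits) % 97 == 1
```

For example, `is_valid_iban("NL91 ABNA 0417 1643 00")` returns `True`, while changing the check digits to `92` makes it return `False`.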
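Because timeouts are common, it can help to retry the invocation a few times when scripting against PACE. The following is a minimal, generic sketch and not a PACE feature: `call_with_retries` and its parameters are hypothetical names, and `action` stands for whatever callable you use to invoke the plugin. Keep in mind that every retry sends a new request to OpenAI.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so that the backoff can be tested without actually waiting.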
Enriching the instructions
We can add some more configuration to the instructions:
additional_system_instructions:
  - don't forget to add the csv column headers
  - please return 20 result rows
  - if you recognize something as being an email, please use email domains in europe
  - if you recognize something as a customer number please generate 7 digits between 1000000 and 2999999
  - for recognized bank accounts use european style IBAN
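Since the model may not follow every instruction, you may want to verify the returned CSV yourself. Below is a rough sketch, under our own assumptions, that checks three of the instructions above: the header is present, 20 rows were returned, and customer numbers are 7 digits between 1000000 and 2999999. The function name `check_sample` and its parameters are hypothetical, and the column name defaults to klantnummer as used in this tutorial.

```python
import csv
import io


def check_sample(
    csv_text: str,
    expected_rows: int = 20,
    customer_column: str = "klantnummer",
) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    rows = list(reader)
    # Without the header column, the remaining checks make no sense.
    if customer_column not in (reader.fieldnames or []):
        return [f"missing header column: {customer_column}"]
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(rows)}")
    for line_no, row in enumerate(rows, start=2):  # the header is line 1
        value = (row[customer_column] or "").strip()
        is_number = value.isascii() and value.isdigit() and len(value) == 7
        if not (is_number and 1000000 <= int(value) <= 2999999):
            problems.append(f"line {line_no}: bad customer number {value!r}")
    return problems
```

A value such as `CUST001` would be reported as a bad customer number, which makes it easy to see whether the enriched instructions had the intended effect. The email and IBAN instructions are not checked here.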