Schema

The source's data structure

Introduction

The first step in defining a Data Policy is knowing what your source data looks like. This source data will most likely live in a Data Platform or Data Catalog, but you can also define the structure yourself. Below we demonstrate the different options for defining your schema.

Blueprint Policy

This section describes how to obtain a blueprint policy. A blueprint policy is a Data Policy in which only the source ref and fields, and possibly a ruleset, are populated. It serves as a starting point for defining the rest of the Data Policy. Whether a ruleset is present depends on whether global transforms are defined. A blueprint policy is retrieved from either a Data Catalog or a Processing Platform.

Sample Blueprint Policy

A blueprint policy contains metadata such as a title, version, creation time, and last-updated time, as well as user-defined tags. It also identifies the processing platform by its type and configured id. Most importantly, it contains the fields, or schema, of the source data. Each field has an array of name_parts, which is the path to the field and typically contains only one entry for columnar/flat data. Each field also carries its type, whether it is required, and user-defined tags. Lastly, the source section contains a reference to the source table and user-defined tags for the source.

data_policy:
  rule_sets: []
  id: ""
  metadata:
    tags: []
    title: SCHEMA.TABLE
    version: ""
    create_time: null
    update_time: null
  source:
    fields:
      - name_parts:
          - TRANSACTIONID
        tags: []
        type: numeric
        required: true
      - name_parts:
          - USERID
        tags: []
        type: varchar
        required: true
      - name_parts:
          - EMAIL
        tags: []
        type: varchar
        required: true
      - name_parts:
          - AGE
        tags: []
        type: numeric
        required: true
      - name_parts:
          - BRAND
        tags: []
        type: varchar
        required: true
      - name_parts:
          - TRANSACTIONAMOUNT
        tags: []
        type: numeric
        required: true
    tags: []
    ref: 
      integration_fqn: SCHEMA.TABLE
      platform:
        platform_type: SNOWFLAKE
        id: snowflake-demo-connection
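
As a rough illustration of how the fields in the sample above can be read, the following Python sketch holds a trimmed copy of the blueprint policy as a plain dictionary and maps each field path to its type. The helper names `field_path` and `summarize_fields` are hypothetical and not part of any API or CLI; the dot separator used to join `name_parts` is likewise an assumption made for this example.

```python
from typing import Any

# Trimmed copy of the sample blueprint policy, held as a plain dict.
# How the policy is loaded (REST API, CLI, YAML file) is out of scope here.
blueprint: dict[str, Any] = {
    "data_policy": {
        "rule_sets": [],
        "metadata": {"title": "SCHEMA.TABLE", "tags": []},
        "source": {
            "fields": [
                {"name_parts": ["TRANSACTIONID"], "type": "numeric", "required": True, "tags": []},
                {"name_parts": ["EMAIL"], "type": "varchar", "required": True, "tags": []},
                {"name_parts": ["AGE"], "type": "numeric", "required": True, "tags": []},
            ],
            "tags": [],
            "ref": {
                "integration_fqn": "SCHEMA.TABLE",
                "platform": {"platform_type": "SNOWFLAKE", "id": "snowflake-demo-connection"},
            },
        },
    }
}


def field_path(field: dict[str, Any], separator: str = ".") -> str:
    """Join a field's name_parts into one path string (hypothetical helper)."""
    return separator.join(field["name_parts"])


def summarize_fields(policy: dict[str, Any]) -> dict[str, str]:
    """Map each field path in the policy's source section to its type."""
    fields = policy["data_policy"]["source"]["fields"]
    return {field_path(f): f["type"] for f in fields}


summary = summarize_fields(blueprint)
```

For flat, columnar data each path is a single name, such as `EMAIL`. A field with several entries in `name_parts` would be joined into one dotted path under the assumption stated above.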
    

Data Platform

If your Data Platform (or Processing Platform) knows the source's data structure, you can retrieve a blueprint policy through either the REST API or the CLI. The minimum required permissions for each Processing Platform are listed in the processing platform integration pages.

Data Catalog

The source's data structure can also be retrieved from a Data Catalog. Here too, both a REST API and a CLI are available for retrieving the blueprint policy. The minimum required permissions for each Data Catalog are listed in the data catalog integration pages.
