Schema
The source's data structure
Introduction
The first step in defining a Data Policy is knowing what your source data looks like. This source data will most likely live in a Data Platform or Data Catalog, but you can also define the structure yourself. Below we demonstrate the different options for defining your schema.
Blueprint Policy
This section covers retrieving a blueprint policy. A blueprint policy is a Data Policy in which only the source ref and fields, and potentially a ruleset, are populated. It serves as a starting point for defining the rest of the Data Policy. Whether a ruleset is present depends on whether global transforms are defined. A blueprint policy is retrieved from either a Data Catalog or a Processing Platform.
Sample Blueprint Policy
A blueprint policy consists of metadata such as a title, version, creation time, and last-updated time, as well as user-defined tags. It holds information about the processing platform, namely its type and configured id. Most importantly, it contains the fields, or schema, of the source data. Each field consists of an array of name_parts, which is the path to the field and typically contains only one entry for columnar/flat data. Each field also carries its type, whether it is required, and user-defined tags. Lastly, the source section contains a reference to the source table and, again, user-defined tags for the source.
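To make the description above concrete, here is a minimal Python sketch of how such a blueprint policy could be modelled. Only `name_parts` is a name taken from this page; every other class name, attribute name, and sample value (for example `platform_type`, `"my_schema.customers"`, and the placement of the fields inside the source section) is an assumption made for illustration and may not match the actual API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaField:
    # Path to the field; flat/columnar data typically has a single entry.
    name_parts: List[str]
    type: str
    required: bool = False
    tags: List[str] = field(default_factory=list)


@dataclass
class Source:
    # Reference to the source table (hypothetical attribute name).
    ref: str
    # Assumption: the schema fields are grouped under the source section.
    fields: List[SchemaField] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


@dataclass
class Metadata:
    title: str
    version: str
    create_time: str
    update_time: str
    tags: List[str] = field(default_factory=list)


@dataclass
class Platform:
    # Type and configured id of the processing platform (hypothetical names).
    platform_type: str
    id: str


@dataclass
class BlueprintPolicy:
    metadata: Metadata
    platform: Platform
    source: Source


def field_path(f: SchemaField) -> str:
    """Join name_parts into a dotted path, e.g. ['address', 'city'] -> 'address.city'."""
    return ".".join(f.name_parts)


def example_blueprint() -> BlueprintPolicy:
    """Build a blueprint policy with made-up sample values."""
    return BlueprintPolicy(
        metadata=Metadata(
            title="customers",
            version="1",
            create_time="2024-01-01T00:00:00Z",
            update_time="2024-01-01T00:00:00Z",
        ),
        platform=Platform(platform_type="SNOWFLAKE", id="my-platform"),
        source=Source(
            ref="my_schema.customers",
            fields=[
                SchemaField(name_parts=["email"], type="varchar", required=True),
                SchemaField(name_parts=["address", "city"], type="varchar"),
            ],
        ),
    )
```

Calling `field_path` on each entry of `example_blueprint().source.fields` shows the difference between a flat column (a single entry in `name_parts`) and a nested field (several entries).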
Data Platform
If your Data Platform (or Processing Platform) has knowledge of the source's data structure, we provide both a REST API and a CLI to retrieve a blueprint policy. The minimum required permissions per Processing Platform are listed in our processing platform integration pages.
Data Catalog
The source's data structure can also be retrieved from a Data Catalog. Here too we provide both a REST API and a CLI to retrieve the blueprint policy. The minimum required permissions per Data Catalog are listed in our data catalog integration pages.