PACE can retrieve metadata from a data catalog: schemas, as well as labels or tags on tables and on table columns (fields). This metadata can be used to create Data Policies that are applied on the processing platforms.
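The tables in a data catalog are referenced via the following hierarchy: catalog, database, schema, table. The commands below walk down this hierarchy one level at a time.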
pace list schemas --catalog COLLIBRA-testdrive \
--database 99379294-6e87-4e26-9f09-21c6bf86d415 -o table
ID NAME
c0a8b864-83e7-4dd1-a71d-0c356c1ae9be Google BigQuery>test-drive-329411>Marketing
342f676c-341e-4229-b3c2-3e71f9ed0fcd Google BigQuery>test-drive-329411>HR
# command output shortened
pace list tables --catalog COLLIBRA-testdrive \
--database 99379294-6e87-4e26-9f09-21c6bf86d415 \
--schema 342f676c-341e-4229-b3c2-3e71f9ed0fcd -o table
tables:
- id: 821b684d-7fd4-428f-8d10-8e90f52aa5b9
  name: Google BigQuery>test-drive-329411>HR>attendancelogs
...
- id: f9ad905f-09e6-4259-a8a2-80b135cd3f1b
  name: Google BigQuery>test-drive-329411>HR>employees_income
- id: 8254e494-7856-4f2c-b736-6f6ca310081a
  name: Google BigQuery>test-drive-329411>HR>accounts_salarypayments
- id: 27231897-24f9-4b26-9171-afe8d88156c7
  name: Google BigQuery>test-drive-329411>HR>employeearchive
- id: 5f345874-0055-4349-8f7d-0bfab88796a1
  name: Google BigQuery>test-drive-329411>HR>departments
- id: 89b34f6f-4664-4a6a-99c6-2cdc27abd5c3
  name: Google BigQuery>test-drive-329411>HR>payroll
- id: 5ad8ea41-df6d-4421-9da5-791d0461f7f7
  name: Google BigQuery>test-drive-329411>HR>salaries
- id: 6e978083-bb8f-459d-a48b-c9a50289b327
  name: Google BigQuery>test-drive-329411>HR>employee_yearly_income
We can retrieve a blueprint policy from a catalog as follows.
We would typically redirect the output of this command into a file (> blueprint.yaml) and then add a rule set to that file. See Create a Data Policy for details on how to do this.
Configuring a Data Catalog connection
Every catalog type is configured the same way.
id: The identifier of this Data Catalog connection in PACE.
type: The catalog type, taken from our [Protobuf api definition][api]. Currently COLLIBRA, DATAHUB or ODD (for OpenDataDiscovery).
serverUrl: The URL at which the Data Catalog can be reached.
token: An authentication token. Some Data Catalogs require this.
userName: The user name, for Data Catalogs that use userName/password combinations instead of a token.
password: The password that goes with userName.
fetchSize: Used by Datahub; its exact effect is currently unspecified. Keep at 1 for now.
Collibra
Collibra uses serverUrl, userName and password. This is the only supported authentication method for now; other types will be added later. serverUrl typically looks like https://....collibra.com/graphql/knowledgeGraph/v1.
Datahub
The Datahub configuration uses serverUrl (typically https://...:9002/api/graphql) and token, which is a Datahub API token. Datahub access tokens can be created via Settings → Manage Access Tokens.
Open Data Discovery
Our current implementation only uses serverUrl, for example http://some.ip.address:8080. No authentication is configured yet; this integration is still in alpha.
Below is an example of the catalogs section of a PACE configuration file.
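The following is a minimal sketch of what such a section could look like, built only from the fields described above. The placement under a top-level catalogs key, the ids DATAHUB-example and ODD-example, and all credential values are assumptions for illustration; the elided host names are left as they appear in this page and must be replaced with your own.

```yaml
# Hypothetical sketch: where this block lives in the PACE configuration
# file (and any parent keys) is an assumption, not confirmed by this page.
catalogs:
  # Collibra: serverUrl plus userName/password
  - id: COLLIBRA-testdrive
    type: COLLIBRA
    serverUrl: "https://....collibra.com/graphql/knowledgeGraph/v1"
    userName: "your-user-name"      # placeholder
    password: "your-password"       # placeholder

  # Datahub: serverUrl plus an API token
  - id: DATAHUB-example             # hypothetical id
    type: DATAHUB
    serverUrl: "https://...:9002/api/graphql"
    token: "your-datahub-api-token" # placeholder
    fetchSize: 1                    # keep at 1 for now

  # Open Data Discovery: only serverUrl is used at the moment
  - id: ODD-example                 # hypothetical id
    type: ODD
    serverUrl: "http://some.ip.address:8080"
```

The id chosen for each entry is the value passed to the --catalog option in the commands above, as with COLLIBRA-testdrive.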