Summary

This document explains Unity Catalog, the Databricks data governance solution. It covers how the metastore organizes data through a three-level namespace, how the security model handles a query from submission to result, and which cluster access modes support Unity Catalog.

Full Transcript

Unity catalog #databricks #catalog #data-governance

Data governance

data access control: control who has access to which data
data access audit: capture and record all access to data
data lineage: capture upstream sources and downstream consumers
data discovery: ability to search for and discover authorized assets

Unity Catalog

The Databricks Unity Catalog provides unified governance for data, analytics, and AI. This is achieved through:

unified governance across clouds: fine-grained governance for data lakes across clouds, based on the open standard ANSI SQL
unified data and AI assets: centrally share, audit, secure, and manage all data types with one simple interface
unified existing catalogs: works in concert with existing data, storage, and catalogs, with no hard migration required

Metastore

The metastore is the top-level logical container in Unity Catalog. The elements inside the metastore are:

storage credential
external location
catalog: a container of schemas (databases), which are logical containers of tables, views, and functions
share
recipient

Note: this is a logical construct for organizing data, distinct from but related to the physical separation of the control plane and the cloud/data plane.

For legacy access, a special catalog called hive_metastore appears alongside the other catalogs.

Unity Catalog defines a three-level namespace to access an object: catalog, schema, and object. For example, in SQL:

    SELECT * FROM catalog.schema.table;

Architecture

The new architecture allows different workspaces to reuse access control lists, security policies, etc.

Unity Catalog Security Model

Query life cycle:

1. principal: sends the query
2. compute: Databricks checks namespaces, metadata, and grants in Unity Catalog, and also creates an audit log entry
3. cloud storage: the IAM role or service principal is assumed
4. Unity Catalog: returns a short-lived token and a signed URL
5. compute: requests data from the URL with the short-lived token
6. cloud storage: returns the data
7. compute: enforces policies
8. principal: receives the results

Compute resources for Unity Catalog

Use of Unity Catalog depends on the cluster access mode:

single user: supports Unity Catalog
shared: supports Unity Catalog
no isolation shared: does not support Unity Catalog
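
The query life cycle described in the security model can be made concrete with a small toy simulation. This is only a sketch under stated assumptions: ToyCatalog, ToyStorage, run_query, the principal "alice", the table name main.sales.orders, and the placeholder token are all hypothetical names invented for illustration. None of them is a Databricks API, and a real deployment does this work inside the platform.

```python
from dataclasses import dataclass, field

TOKEN = "short-lived-token"  # placeholder; a real token is opaque and expires


@dataclass
class ToyCatalog:
    """Hypothetical stand-in for the governance layer (not a Databricks API)."""
    grants: dict                                   # "catalog.schema.object" -> set of principals
    audit_log: list = field(default_factory=list)  # (principal, object, allowed) entries

    def authorize(self, principal, full_name):
        # Step 2: check the three-level namespace and the grants, then log the access.
        if len(full_name.split(".")) != 3:
            raise ValueError("expected catalog.schema.object")
        allowed = principal in self.grants.get(full_name, set())
        self.audit_log.append((principal, full_name, allowed))
        if not allowed:
            raise PermissionError(f"{principal} has no grant on {full_name}")
        # Steps 3-4: hand back a short-lived token and a signed URL (both faked here).
        return TOKEN, f"signed://{full_name}"


@dataclass
class ToyStorage:
    """Hypothetical stand-in for cloud storage."""
    objects: dict  # signed URL -> list of rows

    def read(self, url, token):
        # Steps 5-6: return data only when the expected token is presented.
        if token != TOKEN:
            raise PermissionError("invalid token")
        return self.objects[url]


def run_query(principal, full_name, catalog, storage, policy=lambda row: True):
    token, url = catalog.authorize(principal, full_name)  # steps 2-4
    rows = storage.read(url, token)                        # steps 5-6
    return [row for row in rows if policy(row)]            # steps 7-8


# Step 1: the principal sends a query against a three-level name.
catalog = ToyCatalog(grants={"main.sales.orders": {"alice"}})
storage = ToyStorage(objects={
    "signed://main.sales.orders": [
        {"id": 1, "region": "EU"},
        {"id": 2, "region": "US"},
    ]
})
result = run_query("alice", "main.sales.orders", catalog, storage,
                   policy=lambda row: row["region"] == "EU")
print(result)
```

The sketch keeps the grant check and the audit entry in the same step, mirroring step 2 of the life cycle, so a denied request is still recorded before the error is raised. The policy argument stands in for step 7, where compute filters what the principal is allowed to see.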
