(Delta) Ch9 Delta Sharing [True - False]
28 Questions

Created by
@EnrapturedElf

Questions and Answers

Commercial data-sharing solutions provide unlimited scalability.

False

Delta Sharing is a vendor-specific technology.

False

Object storage allows handling limited amounts of data.

False

Commercial data-sharing solutions do not involve data movement.

False

Delta Sharing allows data sharing only in Delta Lake format.

False

Commercial data-sharing solutions provide open cross-platform data sharing.

False

Delta Sharing replicates data for different recipients across various cloud platforms.

False

Cloud object storage is not scalable.

False

Delta Sharing supports a limited range of clients, including only Power BI and Tableau.

False

Delta Sharing provides robust security and governance capabilities for data providers.

True

Data providers need to convert their data to a proprietary format to share it using Delta Sharing.

False

As a data recipient, you need to implement the Delta Sharing protocol to access shared data.

False

The sharing server generates long-lived pre-signed URLs for data recipients to access shared data.

False

A shared table can be accessed using a Delta Sharing client by passing the table name directly.

False

Delta Sharing is a vendor-agnostic open protocol for secure data sharing.

True

Delta Sharing only supports SQL data types.

False

Delta Sharing focuses on ease of consumption, strong security, and scalability.

True

Delta Sharing is not part of the Delta Lake project.

False

Data providers need to implement the Delta Sharing protocol to share data.

False

Delta Sharing is scalable to sharing massive data sets.

True

Cloud object storage is not used in Delta Sharing for data transfer.

False

Delta Sharing supports fine-grained access control and auditing.

True

Delta Sharing allows sharing of static tables only.

False

Delta Sharing is a vendor-locked technology.

False

Delta Sharing requires data to be converted to a proprietary format.

False

Delta Sharing supports a limited range of clients, including only pandas and Spark.

False

Delta Sharing replicates data for different recipients across various cloud platforms.

False

Delta Sharing lacks robust security and governance capabilities for data providers.

False

Study Notes

Commercial Data-Sharing Solutions

  • Companies prefer commercial data-sharing solutions over in-house solutions as they offer a balance between control and resource allocation.
  • These solutions make it simple for users to share data with others on the same platform.
  • Limitations of commercial solutions include:

    Vendor Lock-in

    • Lack of interoperability with other platforms, making it difficult to share data with users of competing solutions.

    Data Movement

    • Data needs to be loaded onto a specific platform, involving additional steps like ETL and creating copies of the data.

    Scalability and Cost

    • Commercial solutions may have limitations on scaling imposed by vendors.
    • Additional costs arise from replicating data for different recipients across various cloud platforms.

Cloud Object Storage

  • Object storage is suitable for cloud environments due to its elastic nature and seamless scalability.
  • It can handle vast amounts of data and accommodate practically unlimited growth.

Open Source Delta Sharing

  • Open source data sharing avoids vendor-specific limitations and financial burdens.
  • Delta Sharing is an open source protocol designed for open cross-platform data sharing.
  • It allows sharing data in Delta Lake and Apache Parquet formats with any platform, whether on premises or another cloud.

Delta Sharing Goals

  • Open cross-platform data sharing to avoid vendor lock-in.
  • Share live data without data movement, allowing data recipients to directly connect.
  • Support a diverse range of clients, including popular tools like Power BI, Tableau, Apache Spark, pandas, and Java.
  • Centralized governance with robust security, auditing, and governance capabilities.

Delta Sharing Protocol

  • Data providers share existing tables or parts thereof stored on their cloud data lake in Delta Lake format.
  • The data provider runs a sharing server that implements the Delta Sharing protocol and manages access for recipients.
  • Recipients can use any Delta Sharing client supporting the protocol.

Delta Sharing Under the Hood

  • The recipient's client authenticates to the sharing server and asks to query a specific table.
  • The server verifies access, logs the request, and determines which data to send back.
  • The server generates short-lived pre-signed URLs for the client to read Parquet files directly from the cloud provider.
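
The last step above, short-lived pre-signed URLs, can be illustrated with a small sketch. This is not how any particular cloud provider or the Delta Sharing reference server signs URLs; it is a toy HMAC scheme, and `SECRET_KEY`, `presign`, and `is_valid` are hypothetical names introduced only to show why an expired or tampered URL stops working.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical signing key; a real object store manages its own credentials.
SECRET_KEY = b"demo-secret-not-for-production"


def presign(file_url: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Append an expiry timestamp and an HMAC signature to file_url."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    message = f"{file_url}|{expires}".encode()
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "signature": signature})
    return f"{file_url}?{query}"


def is_valid(signed_url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    current = time.time() if now is None else now
    parsed = urlparse(signed_url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    # Recompute the signature over the unsigned URL and the claimed expiry.
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    message = f"{base_url}|{expires}".encode()
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected) and current < expires
```

Because the expiry is part of the signed message, a recipient cannot extend a URL's lifetime by editing the `expires` parameter, and a URL issued for one Parquet file cannot be reused for another.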

Reading a Shared Table

  • Create a Delta Sharing client by passing it the profile path, then use it to list all shared Delta tables.
  • Create a URL to access a shared table using the syntax: profile_path + "#delta_sharing..".
  • Read the shared table as a pandas DataFrame or a standard PySpark DataFrame using load_as_pandas() or load_as_spark().
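
The steps above might look like the following with the open source `delta-sharing` Python connector. Treat it as a sketch under assumptions: the profile path and the share, schema, and table names are placeholders chosen for illustration, and the URL is assumed to take the form profile path, then `#`, then `share.schema.table`. The helper functions `table_url` and `read_shared_table` are hypothetical wrappers, not part of the library.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build a table URL of the assumed form '<profile-path>#<share>.<schema>.<table>'."""
    return f"{profile_path}#{share}.{schema}.{table}"


def read_shared_table(profile_path: str, share: str, schema: str, table: str):
    """List the tables visible to this recipient, then load one into pandas."""
    # Imported here so table_url() stays usable without the package installed.
    import delta_sharing

    client = delta_sharing.SharingClient(profile_path)
    for shared_table in client.list_all_tables():
        print(shared_table)

    url = table_url(profile_path, share, schema, table)
    # Inside a Spark session with the connector available,
    # delta_sharing.load_as_spark(url) returns a PySpark DataFrame instead.
    return delta_sharing.load_as_pandas(url)
```

A call such as `read_shared_table("config.share", "my_share", "default", "my_table")` needs a real profile file issued by a data provider and network access to the sharing server.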

Data Sharing Challenges

  • Current data sharing solutions are vendor-dependent, leading to vendor lock-in risks.
  • Siloed systems make data hard to consume and manage.

Introducing Delta Sharing

  • An open protocol that enables easy and secure data sharing between organizations.
  • Supports multiple data types and languages, not just SQL.
  • Focuses on ease of consumption, strong security, and scalability.
  • Part of the Delta Lake project under the Linux Foundation.

How Delta Sharing Works

  • Involves a data provider and a data recipient.
  • Data provider sets up a Delta Sharing server, manages access permissions, and decides who gets access to which data subsets.
  • Data recipient connects to the server using an open protocol client (e.g., Apache Spark, pandas, Tableau).
  • Server filters data, checks access permissions, and generates short-lived URLs for secure transfer.
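
To make the permission check and data filtering concrete, here is a toy model of the provider side. Everything in it is hypothetical (the `Grant` class, the token strings, the table and partition names); it only illustrates the idea that a recipient is mapped to a table and, optionally, to a subset of that table's data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Grant:
    """One recipient's access to a table; partitions=None means the whole table."""
    table: str
    partitions: Optional[Set[str]] = None


# Hypothetical provider-side state: bearer token -> grants.
GRANTS: Dict[str, List[Grant]] = {
    "token-acme": [Grant("sales.orders", {"region=EU"})],
    "token-globex": [Grant("sales.orders")],
}

# Hypothetical table layout: table -> partition -> Parquet files.
TABLE_FILES: Dict[str, Dict[str, List[str]]] = {
    "sales.orders": {
        "region=EU": ["orders/region=EU/part-0.parquet"],
        "region=US": ["orders/region=US/part-0.parquet"],
    }
}


def files_for(token: str, table: str) -> List[str]:
    """Return the files this recipient may read, or raise PermissionError."""
    grant = next((g for g in GRANTS.get(token, []) if g.table == table), None)
    if grant is None:
        raise PermissionError(f"no access to {table}")
    files: List[str] = []
    for partition, paths in TABLE_FILES[table].items():
        # Keep only the partitions covered by the grant.
        if grant.partitions is None or partition in grant.partitions:
            files.extend(paths)
    return files
```

In this model, the file list returned by `files_for` is what the server would then turn into short-lived URLs for the recipient.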

Benefits of Delta Sharing

  • A Delta Sharing client is easy to implement for systems that already support Parquet.
  • Fast, cheap, reliable, and parallelizable data transfer using cloud object stores.
  • Scalable to sharing massive data sets.
  • Supports fine-grained access control, auditing, and compliance.

Ecosystem and Adoption

  • Multiple open-source projects and commercial systems support Delta Sharing.
  • Leading data providers and vendors back the project.
  • Databricks is implementing Delta Sharing, providing an integrated solution for secure data sharing.

Demo and Use Cases

  • Demo: sharing vaccination data with the CDC using Delta Sharing.
  • Use cases: data sharing across organizations, real-time data sharing, and secure data sharing.

Delta Sharing Syntax and Interface

  • Create a share object using SQL commands.
  • Add tables to the share.
  • Grant permissions to recipients using standard grant statements.
  • Use REST APIs for programmatic management.
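
The SQL steps above could be generated as follows. The statement shapes follow Databricks SQL as best understood here (`CREATE SHARE`, `ALTER SHARE ... ADD TABLE`, `GRANT SELECT ON SHARE ... TO RECIPIENT`), so check them against the current documentation before relying on them. The function name and all share, table, and recipient names are hypothetical, and the recipient is assumed to exist already.

```python
from typing import List


def share_statements(share: str, tables: List[str], recipient: str) -> List[str]:
    """Return SQL statements that create a share, add tables, and grant access.

    Names are interpolated without escaping, so pass trusted identifiers only.
    """
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]
    statements += [f"ALTER SHARE {share} ADD TABLE {table}" for table in tables]
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return statements


if __name__ == "__main__":
    for statement in share_statements(
        "vaccine_share", ["main.health.vaccinations"], "cdc"
    ):
        print(statement + ";")
```

In a Databricks notebook, each statement could be passed to `spark.sql(...)` in order.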

Key Features

  • Allows access to data in Databricks directly through S3.
  • Enables processing and counting of data with ease.
  • Can connect to Delta Sharing using favorite tools like Spark, pandas, and more.

Connecting to Delta Sharing

  • Can connect to Delta Sharing using pandas on a single machine.
  • Load data from Delta Sharing into a pandas DataFrame.
  • Perform analysis and visualization on the DataFrame.

Business Intelligence Tools

  • Can connect Delta Sharing to business intelligence tools like Tableau and Power BI.
  • Load data into Tableau and Power BI without setting up a separate data warehouse.
  • Perform analysis and visualization on the data in real-time.

Features and Roadmap

  • Envisions sharing of streams, machine learning models, table views, and arbitrary files.
  • Working on governance capabilities, including time-limited sharing and restricted clean room analytics.
  • Released a reference server and clients for pandas, Spark, and Rust.
  • Working on open-source connectors and commercial connectors with partners.

Description

Commercial data-sharing solutions are chosen by companies as an alternative to building in-house solutions. They offer a balance between not wanting to allocate extensive time and resources to developing a proprietary solution and desiring greater control than what cloud object storage can provide.
