Questions and Answers
Within Snowflake's architecture, what is the primary role of the control plane?
- Executing the query
- Storing all data on cloud object storage
- Managing query processing clusters
- Filtering unnecessary blocks using lightweight per-block indexes/filters (correct)
Query processing in Snowflake occurs within a single, centralized node to ensure data consistency.
False (B)
Briefly describe the function of the query engine in Snowflake's architecture.
The query engine requests data from cache/storage and executes the query.
In Snowflake, query processing is handled by clusters of instances referred to as ______ for query processing.
Match the following Snowflake architectural components with their respective data storage locations:
How does Snowflake handle data updates and transactions?
Snowflake's storage is implemented using mutable blocks on S3.
What is the monthly on-demand storage pricing for Snowflake, per terabyte?
In Snowflake, consistent hashing helps with ________.
Match the Redshift instance types with their approximate hourly pricing:
What advantage does Snowflake's architecture provide in terms of elasticity?
Redshift's 'Query as a Service' pricing model requires deciding on a cluster size, regardless of actual utilization.
What is the significance of Snowflake being able to launch several virtual warehouses for the same database?
Why are stateful storage services often considered scaling bottlenecks in cloud environments?
FaaS (Function as a Service) such as AWS Lambda perfectly achieves elasticity and scalability for stateful computations.
Besides functionality and cost, what crucial aspect of a cloud service is often not covered by SLAs or documentation but is essential for designing cost-efficient software architectures?
To effectively utilize an existing cloud service like S3, one would need to understand its functionality, cost, and ______.
What does the monitoring of request latency in S3 indirectly measure?
Load balancers are inherently stateful components that present challenges when scaling cloud applications.
According to the content, which of the following is the primary purpose of benchmarking cloud services?
In the context of cloud data management, what are often key components of other services (e.g., to manage control plane state)?
In Aurora's standard pricing model, what are the main cost components?
In the Aurora storage layer, the primary node writes changed pages directly to disk for persistence.
What is the key architectural difference that Microsoft Socrates implements compared to Aurora regarding page storage?
In modern OLAP designs, cloud object stores like S3 enable the _____ of storage and compute.
Match the following database system characteristics with the corresponding system:
What is a primary disadvantage of fully distributed OLTP systems compared to the systems discussed?
Horizontal partitioning in traditional OLAP systems allows compute and storage to be scaled independently.
Given a seek latency of 30ms and a scan speed of 50MB/s, what is the approximate time in system (W) for a 16MB request?
According to Little's Law, if the request arrival rate (λ) is 640/s and the time in system (W) for a request is 0.35s, the number of requests in the system (L) is approximately ______.
What is the main advantage of Aurora writing WAL entries to multiple storage nodes?
For request sizes significantly above 16MB, the cost associated with S3 GET operations dominates the overall cost when compared to using EC2 instances.
OLAP systems are optimized for large _____ scans due to their columnar, compressed storage.
Why might an organization choose Aurora I/O-Optimized pricing over the standard pricing model?
When considering S3 performance, what is a reasonable approximation for the bandwidth of a single access?
In the context of S3, how can very high bandwidth be achieved despite the latency associated with individual accesses?
Match the following S3 components with their respective functions:
According to the information presented at FAST'23, approximately how many objects were stored in S3?
S3 data is primarily partitioned by customer to ensure data isolation and security.
Based on the data provided, what happens to the overhead as the number of disks increases?
S3 guarantees eleven 9s availability.
What is the primary difference between OLTP and OLAP systems in terms of query type?
An Extract, Transform, Load (_______) process periodically moves data from the operational to the analytical system.
Match the DynamoDB read types with their corresponding read request units (up to 4KB):
Which of the following isolation levels is supported by DynamoDB's transactional functionality?
In classic DBMS design for OLTP, changes are applied directly to pages on disk before being logged.
What is the purpose of a Write-Ahead Log (WAL) in a classic DBMS?
Which storage medium is primarily used for pages in a classic DBMS design for OLTP?
In the context of database systems, PostgreSQL, SQL Server and Aurora are examples of ________ systems.
Flashcards
Snowflake Control Plane
Manages user requests and system metadata in a multi-tenant environment, backed by OLTP systems and DRAM caching.
Virtual Warehouse
Tenant-specific clusters of compute instances (like EC2) that process queries.
Snowflake Storage Layer
Data is stored on cloud object storage.
Control Plane's Query Role
Query Engine's Job
Cloud Scaling Bottlenecks
Challenges of Scaling Data
Service Understanding
Benchmarking Necessity
Performance Impact
Latency and Utilization
S3 Bandwidth Utilization
Request Volume Experimentation
Snowflake Immutable Blocks
Supported Clouds
Snowflake Caching
Consistent Hashing
Update Transactions
Snowflake Elasticity
Redshift (Traditional)
Seek Latency
Scan Speed
Request Arrival Rate (λ)
Time in System (W)
Little's Law
EC2 vs. S3 GET Cost
S3 Bandwidth
S3 Implementation
Aurora Storage Layer
Aurora Pricing
Microsoft Socrates
Fully Distributed OLTP systems
OLAP
Horizontal Partitioning (Shared Nothing)
Disaggregated Storage/Compute
Aurora WAL
Aurora Read Replica
Aurora Backups
Data Redundancy
OLTP (OnLine Transaction Processing)
OLAP (OnLine Analytical Processing)
ETL (Extract, Transform, Load)
DynamoDB
Eventually Consistent Read
Strongly Consistent Read
Transactional Read
Data Pages
Write-Ahead Log (WAL)
Study Notes
State Management
- Cloud promises easy elasticity and scalability
- FaaS (e.g., AWS Lambda) comes close to this ideal, but only for stateless computation
- Load balancers and web servers scale easily and are stateless
- Scalable components often rely on stateful storage or database systems
- Stateful systems become synchronization points and scaling bottlenecks
- Scaling data management systems is challenging due to large volumes, high update rates, and durability requirements
- Database systems, in particular OLTP systems, are often key components of other cloud services (e.g., to manage control plane state)
Benchmarking
- Successful service use requires understanding functionality, cost, and performance (e.g., S3)
- S3 costs: $21/TB/month for storage, $0.4 per million GET requests, $5 per million PUT requests
- Documentation often lacks performance details beyond SLAs
- Designing cost-efficient architectures requires knowing performance properties
- Validating and understanding performance of a service requires benchmarking
- Reference: Experiments from "Exploiting Cloud Object Storage for High-Performance Analytics, Durner et al., VLDB 2023"
S3 Latency
- Latency characteristics vary with the size of the objects being accessed
Request Latency
- Request latency over time indirectly measures overall S3 utilization
- Latency patterns can show workday/weekend variations
- Artificial throttling may cause unnatural upper bounds
S3 Bandwidth
- High bandwidth is achievable, up to the network bandwidth of the instance
- This requires running many requests in parallel
Request examples
- Seek latency is 30ms, scan speed is 50MB/s
- Target: 80 Gbit/s (= 10 GB/s) with 16 MB requests, so the request arrival rate is λ = 10 GB/s ÷ 16 MB = 640/s (using 1 GB = 1024 MB)
- Time in system for one 16 MB request: W = 30 ms + 16 MB ÷ 50 MB/s = 30 ms + 320 ms = 350 ms = 0.35 s
- Little's law gives the number of requests in the system: L = λ · W = 640/s × 0.35 s = 224 concurrent requests
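The arithmetic above can be checked with a short Python sketch. The function name `s3_concurrency` is hypothetical; it uses binary units (10 GB/s taken as 10,240 MB/s), since that is what makes 10 GB/s ÷ 16 MB come out to exactly 640/s.

```python
def s3_concurrency(target_mib_s: float, request_mib: float,
                   seek_s: float, scan_mib_s: float) -> tuple[float, float, float]:
    """Return (lambda, W, L) for a target bandwidth, using Little's law."""
    arrival_rate = target_mib_s / request_mib            # λ: requests per second
    time_in_system = seek_s + request_mib / scan_mib_s   # W: latency + transfer time
    in_flight = arrival_rate * time_in_system            # L = λ · W
    return arrival_rate, time_in_system, in_flight


# 80 Gbit/s = 10 GiB/s = 10240 MiB/s, 16 MiB requests, 30 ms latency, 50 MiB/s per request
lam, w, in_flight = s3_concurrency(10 * 1024, 16, 0.030, 50)
print(f"lambda={lam:.0f}/s  W={w:.2f}s  L={in_flight:.0f}")
```

With these inputs it yields λ = 640/s, W = 0.35 s and L = 224, i.e., roughly 224 requests must be in flight at all times to saturate 80 Gbit/s.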
S3 GET Cost
- For request sizes of 16 MB and above, the EC2 instance cost dominates the S3 GET request cost
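A sketch of why request size matters: it computes the hourly GET request cost of sustaining 10 GB/s, using the $0.4 per million GET price from the Benchmarking notes. The EC2 price is a hypothetical placeholder, not a quoted rate.

```python
GET_PRICE_PER_MILLION = 0.4    # USD per million GET requests (from the notes)
HYPOTHETICAL_EC2_PRICE = 3.00  # USD/hour, placeholder for a high-bandwidth instance


def get_cost_per_hour(bandwidth_mib_s: float, request_mib: float) -> float:
    """USD per hour spent on S3 GET requests to sustain the given bandwidth."""
    requests_per_hour = bandwidth_mib_s / request_mib * 3600
    return requests_per_hour / 1e6 * GET_PRICE_PER_MILLION


for size_mib in (1, 4, 16, 64):
    cost = get_cost_per_hour(10 * 1024, size_mib)
    print(f"{size_mib:>3} MiB requests: GET ${cost:6.2f}/h "
          f"vs hypothetical EC2 ${HYPOTHETICAL_EC2_PRICE:.2f}/h")
```

Under these assumptions the GET cost falls from about $14.75/h with 1 MB requests to about $0.92/h with 16 MB requests, at which point the instance cost dominates.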
S3 Performance Summary
- Each access has a latency of over 10 ms and a bandwidth similar to a single disk (around 50 MB/s)
- Very high aggregate bandwidth is achievable because many disks are available
- Request costs are high for small objects but negligible for large objects, especially compared to EC2 instance costs
- Other vendors offer similar cost and performance
How to implement S3
- Load balancers, API servers, metadata storage, and object storage scale individually
- Additionally, it involves asynchronous/background storage management
- Data is partitioned by object
S3 In The Real World (2023)
- Implemented using internal microservices
- A single large customer stores 600PB of data
- Handles 280 trillion objects and 100M requests per second overall
Cost efficient object store?
- S3 latencies and prices imply storage on hard disks (HDD) rather than SSD
- The cheapest disk instance in terms of $/GB is d3en.12xlarge: $2,271/month (assuming a discounted price) for 335 TB = $6.78/TB/month
- S3 costs $21/TB/month for comparison
- Redundancy requires storing more than one copy on disk
d3en requests
- Disks have around 100MB/s throughput
- Reading a 13,980GB disk takes 38 hours
- I/O bandwidth: 24 disks × 100 MB/s = 2.4 GB/s
- Disks typically have a 10 ms latency
- IOPS: 24 × (1 s / 10 ms) = 2,400
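The d3en figures follow from three per-disk parameters. A minimal sketch assuming decimal units (1 GB = 1000 MB); it gives about 38.8 hours for a full disk read, which the notes round down to 38.

```python
DISKS = 24
DISK_GB = 13_980          # capacity per disk (GB)
DISK_MB_S = 100           # sequential throughput per disk (MB/s)
DISK_LATENCY_S = 0.010    # random access latency per disk (s)

full_read_hours = DISK_GB * 1000 / DISK_MB_S / 3600  # read one disk end to end
aggregate_gb_s = DISKS * DISK_MB_S / 1000            # instance-wide sequential bandwidth
iops = DISKS * round(1 / DISK_LATENCY_S)             # random I/Os per second

print(f"full read: {full_read_hours:.1f} h, "
      f"bandwidth: {aggregate_gb_s:.1f} GB/s, IOPS: {iops}")
```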
Object access
- If objects are accessed more often than once every ~38 hours, a single disk copy cannot serve the load, so data must be duplicated
- Workloads are skewed: most data is accessed rarely, while a small fraction is accessed frequently
- Caching hot objects (on RAM, SSD, or both) makes sense
- Could be an additional service in front of the disk storage servers
- Alternatively, the API servers or the metadata service could cache objects
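A hot-object cache could be as simple as a least-recently-used (LRU) map in front of the disk servers. This is a hypothetical sketch, not how S3 is implemented: `HotObjectCache` is an illustrative name, and `fetch_from_disk` stands in for a read from the disk storage servers.

```python
from collections import OrderedDict
from typing import Callable


class HotObjectCache:
    """Least-recently-used cache for object bytes, bounded by object count."""

    def __init__(self, capacity: int, fetch_from_disk: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch_from_disk = fetch_from_disk
        self.entries: OrderedDict[str, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)       # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.fetch_from_disk(key)       # slow path: disk storage servers
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used
        return value


# Skewed access: "hot" is requested often, the other objects rarely.
cache = HotObjectCache(capacity=2, fetch_from_disk=lambda k: k.encode())
for key in ["hot", "a", "hot", "b", "hot", "c", "hot"]:
    cache.get(key)
print(cache.hits, cache.misses)
```

In this skewed sequence the frequently requested object stays cached, while the rarely used ones are evicted.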
How to Achieve 11 9s of Durability?
- One approach: three full copies across two or three AZs, costing $6.78 × 3 = $20.34/TB/month
- Alternatively, erasure coding can provide comparable durability with lower storage overhead
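The storage cost of both approaches follows from their overhead. A minimal sketch using the $6.78/TB/month raw disk price from above; the erasure-coding parameters (k data + m parity fragments) are illustrative assumptions, not S3's actual configuration.

```python
RAW_PRICE_PER_TB = 6.78  # USD per TB per month of raw disk (d3en estimate above)


def replication_cost(copies: int) -> float:
    """Monthly cost per logical TB when storing full copies."""
    return copies * RAW_PRICE_PER_TB


def erasure_coding_cost(data_fragments: int, parity_fragments: int) -> float:
    """Monthly cost per logical TB with k data + m parity fragments.

    Any k of the k + m fragments suffice to reconstruct an object,
    so the raw storage overhead is (k + m) / k.
    """
    overhead = (data_fragments + parity_fragments) / data_fragments
    return overhead * RAW_PRICE_PER_TB


print(f"3 copies: ${replication_cost(3):.2f}/TB/month")
for k, m in [(4, 2), (10, 4), (20, 4)]:  # hypothetical (k, m) choices
    print(f"EC {k}+{m}: overhead {(k + m) / k:.2f}x, "
          f"${erasure_coding_cost(k, m):.2f}/TB/month")
```

With m = 4 parity fragments, spreading the data over more fragments (20+4 instead of 10+4) lowers the overhead while still tolerating the loss of any 4 fragments.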
Summary of durability
- S3 is designed for 11 9s durability, but its availability guarantee is only 3 9s
DBMS Market
- $80 Billion USD per year market size
OLTP vs OLAP
- Online Transaction Processing (OLTP): simple, latency-critical queries with many inserts/updates/deletes; write-optimized, typically using row stores, e.g., Aurora
- Online Analytical Processing (OLAP): read-mostly workloads with batch updates and large table scans, optimized using column stores, e.g., AWS Redshift
DynamoDB
- Multi-tenant distributed key/value store
- Supports Create, Read, Update, Delete (CRUD) operations
- It supports eventual, strong, and transactional consistency for reads
- Transactional functionality is limited: read committed isolation, limited transaction size, and one-shot (single-request) transactions
- Offers provisioned-capacity and on-demand pricing models
- Write request units: 1 per write (up to 1 KB), 2 per transactional write
- Read request units (up to 4 KB): 1 per strongly consistent read, 2 per transactional read, 0.5 per eventually consistent read
Pricing
- On-demand pricing:
- $1.25 per million write request units
- $0.25 per million read request units
- $250 per TB/month
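The request-unit rules and prices above can be combined into a small cost estimator. A sketch under stated assumptions: item sizes are rounded up to whole 1 KB (write) or 4 KB (read) chunks, consistent with the "up to 1 KB / 4 KB" wording above; the function names and the workload are hypothetical, and storage cost is omitted.

```python
import math

WRITE_PRICE_PER_MILLION = 1.25  # USD per million write request units
READ_PRICE_PER_MILLION = 0.25   # USD per million read request units

# Units charged per 1 KB chunk (writes) or per 4 KB chunk (reads)
WRITE_UNITS = {"standard": 1.0, "transactional": 2.0}
READ_UNITS = {"eventual": 0.5, "strong": 1.0, "transactional": 2.0}


def write_units(item_kb: float, kind: str = "standard") -> float:
    chunks = max(1, math.ceil(item_kb))        # round up to whole 1 KB chunks
    return chunks * WRITE_UNITS[kind]


def read_units(item_kb: float, kind: str = "strong") -> float:
    chunks = max(1, math.ceil(item_kb / 4))    # round up to whole 4 KB chunks
    return chunks * READ_UNITS[kind]


def monthly_request_cost(writes: int, reads: int, item_kb: float,
                         write_kind: str = "standard",
                         read_kind: str = "strong") -> float:
    wru = writes * write_units(item_kb, write_kind)
    rru = reads * read_units(item_kb, read_kind)
    return wru / 1e6 * WRITE_PRICE_PER_MILLION + rru / 1e6 * READ_PRICE_PER_MILLION


# Hypothetical workload: 10M writes and 100M eventually consistent reads of 3 KB items
cost = monthly_request_cost(10_000_000, 100_000_000, 3, read_kind="eventual")
print(f"${cost:.2f} per month (request units only)")
```

For this workload the writes dominate: 30M write units cost $37.50, while 50M read units cost $12.50.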
Classic DBMS Design for OLTP
- Data is organized as fixed-size pages (e.g., 8KB)
- Disk serves as the primary storage medium
- B-trees index rows
- Pages cached in RAM
- Inserts/updates/deletes are applied to pages in cache and logged in a write-ahead log (WAL)
- On commit, force WAL to disk (but not necessarily changed pages)
- Asynchronously copy the WAL to a log archive and back up pages
- PostgreSQL, MySQL, and SQL Server work like this
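The update path above (change the cached page, append to the WAL, force only the log at commit) can be sketched in memory. This is a toy model for illustration: `ToyDBMS`, `LogRecord`, and the in-memory "disk" structures are hypothetical, and real systems additionally need checkpoints, page LSNs, and crash recovery.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    page_id: int
    slot: int
    new_value: str


@dataclass
class ToyDBMS:
    disk_pages: dict[int, list[str]]                             # page images on disk
    buffer_pool: dict[int, list[str]] = field(default_factory=dict)
    wal_buffer: list[LogRecord] = field(default_factory=list)   # log records in RAM
    durable_wal: list[LogRecord] = field(default_factory=list)  # log forced to disk

    def _page(self, page_id: int) -> list[str]:
        # Load the page into the RAM cache on first access.
        if page_id not in self.buffer_pool:
            self.buffer_pool[page_id] = list(self.disk_pages[page_id])
        return self.buffer_pool[page_id]

    def update(self, page_id: int, slot: int, value: str) -> None:
        # Change the cached page and append a log record; nothing is forced yet.
        self._page(page_id)[slot] = value
        self.wal_buffer.append(LogRecord(page_id, slot, value))

    def commit(self) -> None:
        # Force the buffered WAL records to disk; dirty pages stay in RAM
        # and can be written back later (not modeled here).
        self.durable_wal.extend(self.wal_buffer)
        self.wal_buffer.clear()


db = ToyDBMS(disk_pages={0: ["a", "b"]})
db.update(page_id=0, slot=1, value="B")
db.commit()
print(db.buffer_pool[0], db.disk_pages[0], len(db.durable_wal))
```

After the commit, the change is durable in the log while the page on "disk" is still the old version.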
Classic DBMS in the Cloud
- The classic design runs on a cloud VM with instance storage
- Data is lost when the instance fails
- RAID does not help: there is no access to the physical disks, which are lost together with the instance
- The database is recoverable from the backup and the log archive, but:
- This causes downtime
- The latest changes may be lost
- Scalability, elasticity, and compute/storage disaggregation are not addressed
Remote Block Device (RBD)
- Use virtual disks (e.g., EBS) instead of instance storage to improve durability
- If the instance fails, the disk can be attached to a new instance
- Virtual disks offer better durability than instance storage
- RBD costs more than instance storage
Primary/Secondary Design
- The system runs on two identical nodes
- The primary node is the main node
- The primary handles all write transactions
- The WAL is shipped to the secondary
- The secondary applies the entries eagerly
- On failure, the system can switch to the secondary rapidly
- This improves availability and durability
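The log-shipping scheme above can be sketched with two in-memory nodes. Hypothetical sketch: `Node` and `PrimarySecondary` are illustrative names, and shipping is a synchronous method call here, whereas a real system ships WAL over the network and must handle replication lag and failure detection.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, str] = {}
        self.applied_lsn = 0  # position of the last applied WAL entry

    def apply(self, lsn: int, key: str, value: str) -> None:
        self.data[key] = value
        self.applied_lsn = lsn


class PrimarySecondary:
    def __init__(self):
        self.primary = Node("primary")
        self.secondary = Node("secondary")
        self.next_lsn = 1

    def write(self, key: str, value: str) -> None:
        # Only the primary accepts writes; each WAL entry is shipped
        # to the secondary, which applies it eagerly.
        lsn = self.next_lsn
        self.next_lsn += 1
        self.primary.apply(lsn, key, value)
        self.secondary.apply(lsn, key, value)

    def fail_over(self) -> None:
        # Promote the secondary, which already holds all applied entries.
        # A fresh secondary would then be seeded from it (not shown).
        self.primary, self.secondary = self.secondary, Node("new-secondary")


cluster = PrimarySecondary()
cluster.write("x", "1")
cluster.write("y", "2")
cluster.fail_over()
print(cluster.primary.name, cluster.primary.data)
```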
Amazon Aurora
- Dominant cloud-native OLTP system, introduced in 2014
- Storage and compute are disaggregated
- One primary processing node (secondaries are optional)
- Multi-tenant page and logging service
Details
- The primary never writes pages to disk
- WAL entries are written to 6 storage nodes in different AZs, which replay the log records to construct pages
- The primary reads pages from one of these storage nodes
- Individual servers are fully functional without external
- Backups and the log are stored on S3
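Writing each WAL entry to 6 storage nodes only helps latency if the primary does not wait for the slowest ones. These notes do not state how many acknowledgements are required; the published Aurora design uses a write quorum of 4 of 6, which this sketch assumes. The function name and latencies are hypothetical.

```python
def commit_latency_ms(ack_latencies_ms: list[float], write_quorum: int = 4) -> float:
    """Time until `write_quorum` storage nodes have acknowledged a WAL entry.

    The primary can acknowledge the commit once the quorum is reached,
    so the slowest replicas do not add to commit latency.
    """
    if len(ack_latencies_ms) < write_quorum:
        raise ValueError("not enough storage nodes for the write quorum")
    return sorted(ack_latencies_ms)[write_quorum - 1]


# Six storage nodes, two per AZ; one AZ is slow (simulated numbers).
acks = [1.1, 1.3, 1.2, 1.4, 25.0, 30.0]
print(commit_latency_ms(acks))      # waits only for the 4th-fastest node
print(commit_latency_ms(acks, 6))   # waiting for all six would be much slower
```

Because 4 of 6 suffices, a whole AZ (two nodes) can be slow or unavailable without blocking commits.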
Pricing (Standard):
- Compute: depends on instance size
- Storage: $100/TB per month
- I/O: $0.20 per million requests
Pricing (I/O-Optimized):
- Billed for compute (depending on instance size) and storage
- Storage: $225/TB per month
- S3 Storage is free
- Network Architecture is an important component
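Whether I/O-Optimized pays off depends on the I/O volume per TB. A minimal sketch comparing only storage and I/O charges: it assumes I/O-Optimized bills no per-request I/O charge, and it ignores compute, whose price also differs between the two models.

```python
STANDARD_STORAGE_PER_TB = 100.0   # USD per TB-month
STANDARD_IO_PER_MILLION = 0.20    # USD per million I/O requests
OPTIMIZED_STORAGE_PER_TB = 225.0  # USD per TB-month; no per-I/O charge assumed


def standard_cost(tb: float, million_ios: float) -> float:
    return tb * STANDARD_STORAGE_PER_TB + million_ios * STANDARD_IO_PER_MILLION


def optimized_cost(tb: float) -> float:
    return tb * OPTIMIZED_STORAGE_PER_TB


def break_even_million_ios_per_tb() -> float:
    """Monthly I/O volume per TB above which I/O-Optimized is cheaper."""
    return (OPTIMIZED_STORAGE_PER_TB - STANDARD_STORAGE_PER_TB) / STANDARD_IO_PER_MILLION


print(break_even_million_ios_per_tb())           # million I/Os per TB per month
print(standard_cost(1, 1000), optimized_cost(1))
```

Under these assumptions, I/O-Optimized becomes cheaper above about 625 million I/O requests per TB per month.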
Microsoft Socrates
- Commercial name: SQL Database Hyperscale
- Aims to remove overhead of the Aurora architecture, based on these observations regarding Aurora:
- The log service has different requirements than the page service
- Keeping 3 copies of all data is too expensive
Changes compared to Aurora
- Implements logging as a separate service, decoupled from page storage
System types
- Classic
- HADR
- RBD
- Aurora-Like
- Socrates-Like
Properties
- In classic systems, the log and the data pages reside on the same disk.
- Hot/Cold SSD can be deployed to accelerate recovery.
- Network infrastructure is an important component
Fully Distributed OLTP Systems
- All system designs above share a scalability bottleneck: all writes go through the memory of a single primary node
- Explicit provisioning may be required
- Fully distributed systems, such as DynamoDB, avoid this bottleneck
- Their major disadvantage is the trade-offs they require, which limit functionality
OLAP Systems
- Optimized for large table scans
- Columnar, compressed storage is the major design consideration
- Parallelism/distribution is required
Traditional OLAP Design
- Horizontal Partitioning is essential
- The system shares nothing
- Tables are partitioned by rows, not by columns
- Each node stores and processes its own partition
- The original version of Amazon Redshift used horizontal partitioning
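Horizontal (row-wise) partitioning in a shared-nothing system can be sketched by hashing a partitioning key to a node. Hypothetical sketch: the node count, the key choice, and CRC32 as the hash function are assumptions for illustration.

```python
import zlib
from collections import defaultdict


def partition_rows(rows: list[dict], key: str, num_nodes: int) -> dict[int, list[dict]]:
    """Assign each whole row to one node by hashing its partitioning key."""
    partitions: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        node = zlib.crc32(str(row[key]).encode()) % num_nodes  # stable hash
        partitions[node].append(row)
    return dict(partitions)


rows = [{"order_id": i, "amount": 10 * i} for i in range(8)]
parts = partition_rows(rows, key="order_id", num_nodes=3)
for node, node_rows in sorted(parts.items()):
    print(node, [r["order_id"] for r in node_rows])
```

Each node stores and scans only its own rows, so adding compute means adding nodes that also hold data, and changing `num_nodes` reshuffles most rows. This is why compute and storage cannot be scaled independently in this design.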
Modern OLAP Design
- Cloud object stores such as S3 serve as the storage layer
- This enables the disaggregation of storage and compute
- Example: Snowflake, a multi-tenant system whose control plane is backed by an OLTP system
- Query processing: clusters of EC2 instances (virtual warehouses)
- Storage: all data resides in cloud object storage
Snowflake Details
- Control plane: queries the OLTP system for the current blocks
- Query engine: caches blocks on local disk and executes the query
Snowflake Query Step-By-Step Example
- Control plane:
  - query the OLTP system for all current blocks
  - filter out unnecessary blocks using lightweight per-block indexes/filters
  - send the query plan and the filtered blocks to the query engine nodes
- Query engine:
  - request data from cache/storage
  - execute the query
- Storage:
  - implemented as immutable blocks on S3
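The control plane's filtering step can be illustrated with per-block min/max values, one common kind of lightweight per-block index (often called a zone map). Hypothetical sketch: `BlockMeta` and the block IDs are illustrative, and the notes do not specify which filters Snowflake keeps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockMeta:
    block_id: str
    min_value: int  # smallest value of the filtered column in this block
    max_value: int  # largest value of the filtered column in this block


def prune_blocks(blocks: list[BlockMeta], low: int, high: int) -> list[str]:
    """Keep only blocks that may contain values in [low, high]."""
    return [b.block_id for b in blocks
            if b.max_value >= low and b.min_value <= high]


# Current blocks, as the metadata lookup might return them
blocks = [
    BlockMeta("b1", 1, 100),
    BlockMeta("b2", 101, 200),
    BlockMeta("b3", 150, 300),
    BlockMeta("b4", 301, 400),
]
# e.g., WHERE x BETWEEN 120 AND 160: only b2 and b3 go to the query engine
print(prune_blocks(blocks, 120, 160))
```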
Snowflake supports the cloud object stores of multiple cloud providers
Further Snowflake Details
- The cloud object store holds immutable data blocks (e.g., 10K tuples per block)
- Blocks can be cached on query-processing nodes.
- Consistent hashing helps with elasticity.
- Updates/transactions:
- Update transactions create new objects.
- Coordinated in control plane using OLTP system's (FoundationDB) transaction functionality.
- Read queries will either see old or new objects.
- Elasticity:
- Cluster size can be adjusted.
- A pool of worker nodes is maintained to make this quick.
- Configurable auto-shutdown/startup (e.g., after 15 minutes without queries).
- One can launch several virtual warehouses for the same database as needed.
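Consistent hashing lets a cluster grow or shrink while moving only a fraction of the cached blocks between nodes. A minimal sketch of a hash ring with virtual nodes; the node names, the 64 virtual nodes per worker, and SHA-256 as the hash are assumptions, not Snowflake's documented choices.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class HashRing:
    """Maps block IDs to worker nodes; each node owns several ring positions."""

    def __init__(self, nodes: list[str], vnodes: int = 64):
        self.vnodes = vnodes
        self.ring: list[tuple[int, str]] = []
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def node_for(self, block_id: str) -> str:
        # First ring position clockwise from the block's hash, wrapping around.
        idx = bisect.bisect(self.ring, (_hash(block_id), ""))
        return self.ring[idx % len(self.ring)][1]


blocks = [f"block-{i}" for i in range(1000)]
ring = HashRing(["w1", "w2", "w3"])
before = {b: ring.node_for(b) for b in blocks}
ring.add_node("w4")                                    # scale the cluster up
after = {b: ring.node_for(b) for b in blocks}
moved = sum(before[b] != after[b] for b in blocks)
print(f"{moved} of {len(blocks)} blocks changed owner")  # expected: roughly a quarter
```

Every block that changes owner moves to the new node, and removing that node restores the original assignment, so most cached blocks stay valid when the cluster is resized.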
Snowflake Pricing
- At least ~$2/hour; higher editions cost about $3-4/hour
- The cost scales with the cluster size, e.g., Small $2.00/h, Medium $4.00/h
- The hardware is not specified; it was reportedly c5d.2xlarge EC2 instances (8 vCPUs, 16 GB RAM, 200 GB SSD)
- Storage: $40/TB per month on demand, or $24/TB per month with pre-purchased capacity
- Details: https://ww.snowflakepricing.com
Redshift Pricing
- Traditional Redshift is a shared-nothing data warehouse
- Pricing: about $4 per hour for compute plus $24 per TB per month for storage
- With managed storage, Redshift stores data separately from the compute servers
- Prices depend on the instance name and size
- Redshift managed storage is also available
More specific examples:
Redshift (traditional = shared nothing):
name          vCPU  Memory   Storage  I/O        Price
dc2.large        2  15 GiB   160 GB   0.60 GB/s  $0.25/h
dc2.8xlarge     32  244 GiB  2.7 TB   7.50 GB/s  $4.80/h
Redshift (managed storage):
name          vCPU  Memory   I/O        Price
ra3.xlplus       4  32 GiB   0.65 GB/s  $1.09/h
ra3.4xlarge     12  96 GiB   2.00 GB/s  $3.26/h
ra3.16xlarge    48  384 GiB  8.00 GB/s  $13.04/h
Managed storage: $24 per TB per month
Query as a Service Pricing
- With provisioned systems, you pay by cluster size
- Average utilization is often low, so much of the paid capacity sits idle
- Query as a Service systems such as Google BigQuery avoid this by charging per data scanned ($5/TB)
- No server administration is necessary
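The trade-off between paying for a cluster and paying per scanned byte depends on utilization. A back-of-the-envelope sketch using the dc2.8xlarge row from the Redshift table ($4.80/h, 7.50 GB/s) and the $5/TB scan price; it assumes queries are purely scan-bound at the instance's full I/O rate, which real queries rarely are.

```python
CLUSTER_PRICE_PER_HOUR = 4.80  # dc2.8xlarge, USD/hour
CLUSTER_SCAN_GB_S = 7.50       # dc2.8xlarge I/O bandwidth, GB/s
QAAS_PRICE_PER_TB = 5.0        # USD per TB scanned


def cluster_cost_per_tb(utilization: float) -> float:
    """Cluster cost per TB scanned if it is busy `utilization` of the time."""
    tb_per_hour = CLUSTER_SCAN_GB_S * 3600 / 1000 * utilization
    return CLUSTER_PRICE_PER_HOUR / tb_per_hour


def break_even_utilization() -> float:
    """Utilization at which both models cost the same per TB scanned."""
    full_tb_per_hour = CLUSTER_SCAN_GB_S * 3600 / 1000
    return CLUSTER_PRICE_PER_HOUR / (full_tb_per_hour * QAAS_PRICE_PER_TB)


for u in (1.0, 0.1, 0.01):
    print(f"utilization {u:>4.0%}: cluster ${cluster_cost_per_tb(u):6.2f}/TB "
          f"vs QaaS ${QAAS_PRICE_PER_TB:.2f}/TB")
print(f"break-even utilization: {break_even_utilization():.1%}")
```

Under these assumptions the cluster is cheaper per TB only above roughly 3.6% utilization; the point is the shape of the trade-off, not the exact numbers.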
How Are QaaS Systems Implemented?
- The implementation is dependent on the size and cluster itself.
- There are many ways to optimize code and store data efficiently
- Often only a best-effort per-query latency guarantee: the number of nodes used for one query is scaled with the query size
- Billing by scan size alone is exploitable, so QaaS systems may also impose CPU or time limits
QaaS systems
- BigQuery implementation details: (https://www.vldb.org/pldb/vol13/p3461-melnik.pdf)
- Billing and scans are useful
Serverless FaaS analytic pricing
- This approach drives query latency to zero with many tenants
Idea: execute queries using serverless functions
A prototype is described at arxiv.org/pdf/1912.00937
Description
This quiz covers Snowflake's architecture, including the control plane, query processing, and storage mechanisms. It also tests knowledge of Snowflake's data handling, pricing models, and advantages over other systems like Redshift. Assess your understanding of Snowflake's elasticity and cost structure.