Caching Strategies in Software Applications

Questions and Answers

In which caching strategy does the cache directly update the database whenever data is modified?

  • Write-through (correct)
  • Cache-aside
  • Write-behind
  • Read-through

Which caching strategy is best suited for applications where data is frequently updated and needs to be immediately available?

  • Write-behind
  • Cache-aside
  • Write-through (correct)
  • Read-through

Which caching strategy provides flexibility in managing cache population and eviction, but may require app-level logic for cache management?

  • Read-through
  • Write-behind
  • Cache-aside (correct)
  • Write-through

Which caching strategy is designed for applications with complex caching needs or irregular access patterns?

  • Cache-aside (correct)

Which caching strategy is best for applications where the data is typically retrieved more frequently than it is updated?

  • Read-through (correct)

Which caching strategy is particularly well-suited for applications that prioritize low write latency and can tolerate some data loss in the event of a cache failure?

  • Write-behind (correct)

Which caching strategy centralizes control over cache management, thus reducing the risk of cache stampedes?

  • Read-through (correct)

Which caching strategy typically involves the use of a separate cache layer that acts as a backup for the database?

  • Write-behind (correct)

Which of the following is NOT a valid target destination for Kinesis Data Firehose?

  • Amazon Aurora (correct)

What is the primary use case for Kinesis Data Firehose?

  • High-throughput data ingestion and delivery (correct)

How does Kinesis Data Firehose ensure near real-time data delivery?

  • It buffers data and flushes it to the destination based on time and size rules (correct)

Which of these is a benefit of using Kinesis Data Firehose compared to Kinesis Data Streams (KDS)?

  • Kinesis Data Firehose handles scaling and resource allocation automatically (correct)

Which of the following is NOT a benefit of using Enhanced Fan Out consumers in Kinesis Data Streams?

  • Lower costs due to reduced resource utilization (correct)

What is the purpose of the Kinesis Client Library (KCL)?

  • To read and process records from Kinesis Data Streams (correct)

What is a record processor in the context of Kinesis Client Library (KCL)?

  • A component that reads and processes individual records from Kinesis Data Streams (correct)

How can a user prevent the ExpiredIteratorException from occurring when using the Kinesis Client Library (KCL)?

  • Increase the provisioned write capacity units (WCU) for the DynamoDB table used for KCL coordination (correct)

Which of the following technologies CAN read data from Kinesis Data Firehose?

  • AWS Lambda (correct)

What is the primary difference between Enhanced Fan Out consumers and Standard Consumers in Kinesis Data Streams?

  • Enhanced Fan Out consumers are designed for higher data throughput and real-time processing (correct)

Which data formats are supported by Athena?

  • CSV, TSV, JSON, ORC, Parquet, Avro (correct)

Which of the following is NOT a valid use case for Athena?

  • Creating reports and visualizations for data stored in S3 (correct)

Which security features are available for Athena queries?

  • IAM, ACLs, S3 bucket policies, SSE-S3, SSE-KMS, CSE-KMS, TLS (correct)

How does Athena handle data encryption when querying S3 files?

  • Athena can query encrypted S3 data directly, decrypting it transparently, so the files do not need to be decrypted first (correct)

Which of the following is NOT a valid method for optimizing Athena performance?

  • Using a large number of partitions for the data (correct)

What are the two ways to define the primary key of a DynamoDB table?

  • Partition key only (HASH), or partition key plus sort key (HASH + RANGE) (correct)

What is the maximum size of a DynamoDB item?

  • 400 KB (correct)

Which of the following data types is not supported by DynamoDB?

  • Datetime (correct)

Which read capacity unit (RCU) consumption is correct, given 10 strongly consistent reads (SCR) per second for an item of size 6 KB?

  • 20 RCUs (correct; 6 KB rounds up to two 4 KB units, and 2 units × 10 reads per second = 20)

What kind of read capacity will you consume when you set the ConsistentRead parameter to true in API calls?

  • Strongly consistent read (SCR) (correct)

What is the consequence of exceeding the provisioned capacity for a DynamoDB table?

  • The table will be throttled, resulting in errors (correct)

Which of the following is not considered an 'anti-pattern' for DynamoDB?

  • Utilizing DynamoDB for managing user profiles and session data (correct)

What is the purpose of 'burst capacity' in DynamoDB?

  • It allows exceeding the provisioned capacity temporarily (correct)

What is the function of the 'partition keys' in DynamoDB?

  • They distribute data across multiple physical servers (correct)

Which of the following would be a suitable scenario for using DynamoDB?

  • Maintaining a database for a real-time analytics application (correct)

What is a primary feature of Workgroups in the context of user organization and query access?

  • They can provide separate query histories for each group (correct)

Which aspect of AWS Glue Data Catalog security is broader than data filters in Lake Formation?

  • IAM-based database-level and table-level security (correct)

Which of the following is NOT a key feature of Athena Notebook?

  • Support for unstructured data only (correct)

What best describes the purpose of Spark in the context of big data analytics?

  • It processes data using a distributed computing framework (correct)

Which feature of Spark Streaming allows it to handle constantly growing datasets?

  • Structured streaming capabilities (correct)

What is the primary component responsible for managing memory and scheduling in Spark?

  • Spark Context (correct)

Which of the following operations can be restricted through IAM policies in relation to the AWS Glue Data Catalog?

  • Altering database structures (correct)

Which programming support is NOT provided by Spark Integration within the Athena console?

  • Java APIs (correct)

Which library within Spark is designed specifically for machine learning at a large scale?

  • MLlib (correct)

What type of data format does Spark NOT support?

  • XML (correct)

What is a crucial feature of Workgroups in terms of cost management?

  • They can track costs by workload (correct)

Which component of Spark is primarily responsible for fault recovery?

  • Spark Core (correct)

Which operation is NOT part of the supported functionalities for Spark streaming?

  • Batch processing (correct)

What best describes the relationship between Spark and Athena?

  • Athena can run Jupyter notebooks with Spark, enabling enhanced data analysis (correct)

What is a key benefit of using EMRFS with S3?

  • Enables persistent storage after cluster termination (correct)

Which of the following describes the nature of data stored in EBS for HDFS?

  • Data is deleted when the cluster is terminated (correct)

What does the serverless feature of EMR do?

  • It decides the number of nodes required for tasks automatically (correct)

Kinesis Data Streams use which of the following components?

  • Shards for ordered sequences of records (correct)

What is a characteristic of on-demand mode in Kinesis?

  • Automatically adjusts capacity based on previous peak usage (correct)

How does Kinesis ensure the immutability of data once it is inserted?

  • Records cannot be deleted after they are added to the stream (correct)

What is the function of Kinesis' shard splitting?

  • It increases the overall stream capacity (correct)

When merging shards in Kinesis, what happens to the old shards?

  • They are closed and deleted once data expires (correct)

What is a security measure implemented by Kinesis for data in transit?

  • Encryption using HTTPS endpoints (correct)

Why might a consumer in Kinesis read the same data twice?

  • Retry mechanisms can cause the same records to be delivered or processed more than once (correct)

What should be done to prevent duplicate records caused by producer retries?

  • Embed unique record IDs in the data (correct)

In what scenario would resharding limitations affect Kinesis streams?

  • When multiple resharding operations are needed simultaneously (correct)

Which statement about local file storage in EMR is accurate?

  • Utilized for temporary data storage (correct)

Flashcards

SQL interface for S3

A way to run SQL queries directly on data stored in S3 without loading it.

Supported data formats

Formats that can be queried directly including CSV, JSON, ORC, Parquet, and Avro.

Cost structure

Pay-as-you-go pricing, billed by the amount of data each query scans; failed queries are not charged.

Access control in security

Methods to manage access to S3 data including IAM, ACLs, and bucket policies.

Anti-patterns

Situations where Athena should not be used, such as producing highly formatted reports or running ETL processes.

Primary Key

A unique identifier for each item in a DynamoDB table; it must be defined when the table is created.

Partition Key

The HASH part of the primary key. It must be unique when used alone, and it should have many distinct values so that data is distributed evenly across partitions.

Sort Key

The RANGE part of the primary key, used in combination with the partition key to maintain uniqueness.

Read Capacity Unit (RCU)

Measurement of read throughput: 1 RCU covers one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB.

Write Capacity Unit (WCU)

Measurement of write throughput: 1 WCU covers one write per second for an item up to 1 KB.
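
One RCU covers one strongly consistent read (or two eventually consistent reads) per second of an item up to 4 KB, and one WCU covers one write per second of an item up to 1 KB. A minimal Python sketch of that arithmetic, assuming steady request rates and ignoring transactional reads and writes (which cost more); the function names are illustrative only:

```python
import math


def rcus_needed(item_size_kb: float, reads_per_second: int, strongly_consistent: bool = True) -> int:
    """Read capacity units for a steady read rate (4 KB read unit)."""
    units_per_read = math.ceil(item_size_kb / 4)   # round the item size up to whole 4 KB units
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = math.ceil(total / 2)               # eventually consistent reads cost half
    return total


def wcus_needed(item_size_kb: float, writes_per_second: int) -> int:
    """Write capacity units for a steady write rate (1 KB write unit)."""
    return math.ceil(item_size_kb) * writes_per_second  # round up to whole 1 KB units
```

With these rules, `rcus_needed(6, 10)` reproduces the 20 RCUs from the quiz question, and the same load with eventually consistent reads needs 10.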

Strongly Consistent Read (SCR)

Returns the most up-to-date data, reflecting every write that succeeded before the read; costs twice as many RCUs as an eventually consistent read.

Eventually Consistent Read (ECR)

May return stale data shortly after a write; it is the default read type and costs half as many RCUs as a strongly consistent read.

Provisioned Mode

User-specified capacity for read/write operations; requires planning ahead.

Throttling

Rejection of requests that exceed the provisioned capacity (DynamoDB returns ProvisionedThroughputExceededException); clients should retry with exponential backoff.
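
When DynamoDB throttles a request with ProvisionedThroughputExceededException, backing off before retrying gives the table room to recover instead of adding load. AWS SDKs already retry throttled requests automatically; the sketch below only illustrates exponential backoff with jitter, and `ThrottledError`, `call_with_backoff`, and the delay values are hypothetical:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.05,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `op`, retrying on throttling with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return op()
        except ThrottledError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the throttling error
            # The delay ceiling doubles on each retry (0.05 s, 0.1 s, 0.2 s, ...) up to max_delay;
            # picking a random delay below it spreads out retries from many clients.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

The injectable `sleep` parameter is only there so the retry schedule can be observed without actually waiting.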

Burst Capacity

Lets a table temporarily exceed its provisioned throughput by drawing on unused capacity saved from up to the previous 300 seconds; useful during short traffic spikes.
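
To picture how banked capacity absorbs a spike, here is a deliberately simplified Python model: it assumes every unused unit is saved, up to 300 seconds' worth of the provisioned rate, and spent only when demand exceeds that rate. DynamoDB does not guarantee burst capacity and may consume it for background work, so treat `simulate_burst` as a hypothetical illustration, not the service's actual algorithm:

```python
from typing import List


def simulate_burst(provisioned: int, demand: List[int], window_seconds: int = 300) -> List[int]:
    """Simplified burst-capacity model: returns the units served each second.

    Unused capacity is banked (up to `window_seconds` worth of the provisioned
    rate) and spent when demand exceeds the provisioned rate; any demand beyond
    provisioned + banked capacity would be throttled.
    """
    reserve = 0
    reserve_cap = provisioned * window_seconds
    served = []
    for wanted in demand:
        used = min(wanted, provisioned + reserve)
        served.append(used)
        if used <= provisioned:
            reserve = min(reserve_cap, reserve + provisioned - used)  # bank the unused units
        else:
            reserve -= used - provisioned  # spend banked units on the spike
    return served
```

For example, `simulate_burst(10, [0, 0, 0, 40, 40])` serves `[0, 0, 0, 40, 10]`: three idle seconds bank 30 units, which cover the first spike second in full, and the second spike second is held to the provisioned 10.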

Write-through Cache

Data is written to the cache whenever it is updated in the database, ensuring synchronization.

Advantages of Write-through

Minimizes cache misses and offers consistent performance.

Disadvantages of Write-through

Writes are slower and may cache rarely accessed data.
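
A minimal write-through sketch in Python, with a plain dict standing in for the database (a hypothetical setup, not any particular cache product). Every write updates both stores before returning, which is why writes are slower but reads of recently written data hit the cache:

```python
from typing import Any, Dict, Optional


class WriteThroughCache:
    """Write-through: each write updates the database and the cache together."""

    def __init__(self, db: Dict[str, Any]):
        self.db = db                      # stand-in for the real database
        self.cache: Dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        self.db[key] = value              # persist first, so the cache never holds unsaved data
        self.cache[key] = value           # then refresh the cache as part of the same write

    def read(self, key: str) -> Optional[Any]:
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)          # miss, e.g. data written before the cache existed
        if value is not None:
            self.cache[key] = value
        return value
```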

Cache-aside

The application manages the cache explicitly: it checks the cache first and, on a miss, fetches the data from the database and stores it in the cache.

Advantages of Cache-aside

Flexible and allows custom logic for cache population and eviction.

Disadvantages of Cache-aside

Requires app-level logic to manage the cache and risks cache misses.
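
The cache-aside flow can be sketched as application code over two plain dicts; `db`, `cache`, `get_user`, and `update_user` are hypothetical names. Both the lookup and the eviction live in the application, which is the app-level logic mentioned above:

```python
from typing import Dict, Optional

db: Dict[str, str] = {"user:1": "Ada"}  # hypothetical stand-in for the database
cache: Dict[str, str] = {}              # plain key-value store; it knows nothing about db


def get_user(key: str) -> Optional[str]:
    """Application-side read path: check the cache, fall back to the database."""
    if key in cache:              # 1. cache hit
        return cache[key]
    value = db.get(key)           # 2. cache miss: query the database
    if value is not None:
        cache[key] = value        # 3. populate the cache for later reads
    return value


def update_user(key: str, value: str) -> None:
    """Application-side write path: update the database, then evict the stale entry."""
    db[key] = value
    cache.pop(key, None)
```

The first read of any key is always a miss, and forgetting the `cache.pop` in some write path would leave stale data in the cache.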

Read-through Cache

Cache retrieves data from the database for the app when it's not present in cache.
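
In a read-through cache, unlike cache-aside, the loading logic lives in the cache itself. The hypothetical class below is given a loader function once, and callers only ever call `get`:

```python
from typing import Callable, Dict, Generic, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class ReadThroughCache(Generic[K, V]):
    """Read-through: the cache owns the loader, so callers only talk to the cache."""

    def __init__(self, loader: Callable[[K], Optional[V]]):
        self._loader = loader              # e.g. a function that queries the database
        self._store: Dict[K, V] = {}
        self.loads = 0                     # number of times the backing store was queried

    def get(self, key: K) -> Optional[V]:
        if key not in self._store:
            self.loads += 1
            value = self._loader(key)      # the cache, not the caller, fetches on a miss
            if value is None:
                return None                # nothing to cache for missing keys
            self._store[key] = value
        return self._store[key]
```

Because every miss goes through one `get` method, a real implementation could add per-key locking there so that concurrent misses trigger a single database load, which is one way centralized control helps against cache stampedes.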

Write-behind Cache

Data is written to the cache first, then asynchronously to the database later.
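
A minimal write-behind sketch, assuming a dict as the database and a synchronous `flush` triggered by batch size; a real system would flush from a background worker or timer. Anything still in `dirty` when the process dies never reaches the database, which is the data-loss risk mentioned in the quiz:

```python
from typing import Any, Dict


class WriteBehindCache:
    """Write-behind: writes update the cache now and reach the database later."""

    def __init__(self, db: Dict[str, Any], batch_size: int = 100):
        self.db = db                         # stand-in for the real database
        self.cache: Dict[str, Any] = {}
        self.dirty: Dict[str, Any] = {}      # pending writes not yet persisted
        self.batch_size = batch_size

    def write(self, key: str, value: Any) -> None:
        self.cache[key] = value              # the caller only waits for this fast step
        self.dirty[key] = value              # a newer write to the same key replaces the pending one
        if len(self.dirty) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Synchronous here for simplicity; a real system flushes in the background.
        self.db.update(self.dirty)
        self.dirty = {}
```

Coalescing repeated writes to the same key into one pending entry is part of why write-behind keeps write latency and database load low.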

Kinesis Data Firehose (KDF)

A fully managed service for near real-time data streaming to various destinations.

Shard

A unit of scalability and parallelism in Kinesis; each shard supports writes of up to 1 MB/s or 1,000 records/s and reads of up to 2 MB/s.
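
A Kinesis Data Streams shard accepts up to 1 MB/s or 1,000 records/s of writes and serves up to 2 MB/s of reads, which gives a quick way to estimate a provisioned stream's size. A sketch in Python, assuming the read figure is the combined throughput of all standard (non-Enhanced Fan Out) consumers, since they share each shard's 2 MB/s; `shards_needed` is an illustrative name:

```python
import math

# Per-shard limits for a provisioned Kinesis data stream.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0  # shared by all standard (non-Enhanced Fan Out) consumers


def shards_needed(write_mb_s: float, write_records_s: int, read_mb_s: float) -> int:
    """Smallest shard count that keeps every per-shard limit satisfied."""
    return max(
        1,
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )
```

For example, 3.5 MB/s of writes at 2,500 records/s, read in full by two standard consumers (7 MB/s in total), gives `shards_needed(3.5, 2500, 7.0) == 4`.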

Checkpointing

Recording how far a consumer has processed each shard (the KCL stores checkpoints in a DynamoDB table) so processing can resume from that point after a restart or failover.

Enhanced Fan Out

A feature that gives each registered consumer its own 2 MB/s of read throughput per shard, with records pushed to the consumer for lower latency.

AWS Lambda

A compute service that allows running code in response to events or triggers without managing servers.

Data Buffering

Temporary storage of data in Firehose until a buffer size or buffer interval threshold is reached, after which the data is delivered to the destination.

Kinesis Client Library (KCL)

A library that simplifies consuming data from Kinesis and managing state across multiple consumers.

Data Transformation

Converting or modifying data formats in real-time before storage.

Consumer Applications

Applications that read and process records from Kinesis streams.

Buffer Sizing

The configuration of how much data accumulates in Firehose before flushing it out.
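
Firehose flushes a buffer when either its size threshold or its time threshold is reached, whichever comes first. The class below is a hypothetical Python model of that rule, not the Firehose API; `deliver` stands in for writing a batch to the destination, and `clock` is injectable so the time rule can be tested:

```python
import time
from typing import Callable, List, Optional


class SizeTimeBuffer:
    """Flush when buffered bytes reach `max_bytes` or the oldest record has
    waited `max_seconds`, whichever happens first."""

    def __init__(self, max_bytes: int, max_seconds: float,
                 deliver: Callable[[List[bytes]], None],
                 clock: Callable[[], float] = time.monotonic):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.deliver = deliver               # stand-in for writing a batch to S3, Redshift, ...
        self.clock = clock
        self.records: List[bytes] = []
        self.size = 0
        self.started: Optional[float] = None  # arrival time of the oldest buffered record

    def put(self, record: bytes) -> None:
        if not self.records:
            self.started = self.clock()
        self.records.append(record)
        self.size += len(record)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        # Also call this periodically, so a quiet stream still flushes on the time rule.
        if not self.records:
            return
        assert self.started is not None
        too_big = self.size >= self.max_bytes
        too_old = self.clock() - self.started >= self.max_seconds
        if too_big or too_old:
            self.deliver(self.records)
            self.records, self.size, self.started = [], 0, None
```

Larger thresholds mean fewer, bigger deliveries at the cost of higher latency; smaller ones move data sooner in more, smaller batches.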

EMRFS

A system allowing S3 to be used like HDFS, supporting persistent storage after cluster termination.

Local file storage

Disk storage connected locally, ideal for temporary data like buffers and caches.

EBS for HDFS

EBS volumes for HDFS can only be attached when the EMR cluster is launched, and they are deleted when the cluster terminates.

Serverless

Automatically manages resources, determining node count and capacity for EMR jobs without manual intervention.

Capacity in Spark

Pre-initialized capacity needs to be at least 10% higher than the memory requested for drivers and executors.

Kinesis Data Streams

Service for streaming data, made of shards which are sequences of records ordered by arrival time.

Shard in Kinesis

A shard contains an ordered sequence of records and can be scaled by provisioning more shards beforehand.
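
Kinesis maps each record to a shard by hashing its partition key with MD5 into a 128-bit integer and finding the shard whose hash key range contains that value. The sketch below reproduces that idea, assuming UTF-8 encoding of the key and a hypothetical layout where shards split the hash space evenly (real streams report their actual ranges per shard); the function names are illustrative:

```python
import hashlib
from typing import List, Tuple

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit values


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Hypothetical layout: contiguous, equal-width hash key ranges (inclusive)."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the ranges cover the whole space.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + step - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Index of the shard whose hash key range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key ranges do not cover this key")
```

Because the same key always hashes to the same shard, records that share a partition key stay in order, and a single very busy key concentrates load on one shard.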

On demand mode

Kinesis capacity mode that auto-scales based on observed throughput without pre-provisioning shards.

Resharding

The process of splitting or merging shards in Kinesis to adjust stream capacity; resharding operations cannot run in parallel.

Handling duplicates for producers

Producer retries (for example, after a network timeout) can write the same record twice; embedding a unique record ID in the data lets consumers detect and discard duplicates.

Idempotent consumer

Consumer applications that can read the same data multiple times without causing adverse effects.
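
The two cards above fit together: the producer attaches a unique ID once, before the first send attempt, so a retry carries the same ID, and the consumer skips IDs it has already applied. A minimal Python sketch with hypothetical names; the in-memory `seen_ids` set stands in for a durable store that, in practice, would be updated atomically with the side effect and pruned once records age past the stream's retention period:

```python
import json
import uuid
from typing import Dict, Set


def make_record(payload: dict) -> bytes:
    """Producer side: attach a unique ID once, before the first send attempt,
    so every retry of this record carries the same ID."""
    return json.dumps({"record_id": str(uuid.uuid4()), "payload": payload}).encode("utf-8")


class IdempotentConsumer:
    """Consumer side: apply each record ID at most once."""

    def __init__(self) -> None:
        self.seen_ids: Set[str] = set()    # stand-in for a durable store of processed IDs
        self.totals: Dict[str, int] = {}   # example side effect: running totals per account

    def process(self, raw: bytes) -> bool:
        """Apply the record and return True, or return False if it is a duplicate."""
        record = json.loads(raw)
        record_id = record["record_id"]
        if record_id in self.seen_ids:
            return False                   # duplicate from a producer or consumer retry
        payload = record["payload"]
        self.totals[payload["account"]] = self.totals.get(payload["account"], 0) + payload["amount"]
        self.seen_ids.add(record_id)
        return True
```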

Kinesis Security

Access control via IAM, encryption in transit using HTTPS, and KMS for encryption at rest for Kinesis data.

Retention in Kinesis

Data in Kinesis can be retained for 1 to 365 days (24 hours by default), which lets you reprocess or replay data later.

Data immutability

Once data is added to Kinesis, it cannot be deleted or modified; records only expire when the retention period ends.

Workgroups

Organized units for users, teams, and applications that control query access and track costs.

IAM Policies

Policies that define permissions for access control in AWS services.

AWS Glue Data Catalog

A centralized repository for storing metadata of data assets with fine-grained access.

Athena Notebook

An interactive environment for executing queries and collaborating on data analysis.

Spark Integration

Combines Spark's distributed computing with serverless features for analyzing large datasets.

Apache Spark

A distributed processing framework designed for big data analytics.

In-memory caching

Storing frequently accessed data in memory for faster retrieval.

Spark Streaming

Real-time data processing framework integrated with services like Kinesis.

MLlib

A library for machine learning algorithms capable of large-scale processing.

GraphX

Spark’s API for graph processing and analysis.

CREATE TABLE AS SELECT

SQL statement that creates a new table from the results of a SELECT query.

ETL Operations

Extract, Transform, Load - processes to prepare data for analysis.

Jupyter-style notebooks

Interactive computing environments for code writing, running, and visualization.

Version Control

A system that records changes to files and allows tracking revisions over time.

Query History

A record of previously run queries within a workgroup or environment.

Study Notes

Data Characteristics

  • Structured data is organized in a defined manner or schema, found in relational databases. Data is easily queryable and organized in rows and columns with consistent structure. Examples include database tables, CSV files, and Excel spreadsheets.
  • Unstructured data lacks a predefined structure or schema. It's not easily queryable without preprocessing and may come in various formats (e.g., text files without a fixed format, videos, audio files, images, emails, Word documents).
  • Semi-structured data is less organized than structured data but has some structure, like tags, hierarchies, or other patterns. It's more flexible than structured but not as chaotic as unstructured (e.g., XML, JSON, email headers, log files with varied formats).
  • Key properties of data include:
    • Volume: Amount/size of data
    • Velocity: Speed at which new data is generated, collected, and processed
    • Variety: Different types, structure, and sources of data