Data Modeling IT315 - PDF

DATA MODELING
IT315 - IT ELECTIVE 1

OBJECTIVES
1. Define data model
2. Define a keyspace
3. Define table and column
4. Use basic CQL

INTRODUCTION
Data modeling is a crucial part of database design. Traditional relational databases use normalized tables with foreign keys and support table joins. In Cassandra, data modeling is query-driven, designed for specific data access patterns.

CASSANDRA QUERY-CENTERED APPROACH
Cassandra is not a traditional relational database. Data modeling is based on data access patterns and application queries. Queries are structured to visit a single table for fast data access. Denormalization and data duplication are key for performance.

CASSANDRA DATA MODEL COMPONENTS
Keyspace, Table, Column

KEYSPACE
Keyspaces are like schemas in traditional databases. They serve as data containers. Replication is configured at the keyspace level. Each table belongs to a keyspace.

TABLE
Tables in Cassandra are created within keyspaces. Denormalization is essential; each table should support specific queries. Data duplication ensures high read performance. Tables have a primary key for unique row identification.

COLUMN
Columns in Cassandra tables are defined based on data requirements. Various data types are available, including text, float, double, etc. Special columns like Counter are used for specific purposes.

CASSANDRA CLUSTER
Cassandra operates in a cluster of interconnected machines. Each cluster has multiple nodes for fault tolerance. Data is distributed and arranged in a ring pattern.

PRIMARY KEY
In Cassandra, a row is uniquely identified by its primary key. The primary key includes a partition key and optional clustering columns. The partition key determines data distribution. Clustering columns affect data arrangement within a partition.

PARTITION KEY
One or more columns make up the partition key component of the primary key.
To find a partition quickly within the cluster, Cassandra combines the values of all partition key columns (Data Modeling, n.d.). It then hashes this partition key to divide incoming data into discrete parts and distribute them among the cluster nodes.

PARTITION
A partition is the set of rows (a relatively small subset of the table) that share the same partition key. Because a partition is a physical unit of access, Cassandra can fetch all of the rows in a partition at once. A partition can be viewed as a precomputed query result.

CLUSTERING COLUMN
Cassandra orders the rows within a partition by the values of the clustering columns. A query can therefore use clustering column values to find a specific row within the partition quickly.

RULES
1. Maximize Writes: Cassandra is optimized for fast writes, so accept extra writes when they make reads more efficient.
2. Maximize Data Duplication: Denormalization and data duplication ensure high availability.
3. Spread Data Evenly: Distribute data evenly among cluster nodes for balanced performance.
4. Minimize Partitions Read: The fewer partitions a query reads, the faster it runs.
5. Create Tables Based on Queries: Design each table around the queries it needs to support.

RELATIONSHIPS
One-to-one (1:1)
One-to-many (1:M)
Many-to-many (M:N)

ONE-TO-ONE (1:1) RELATIONSHIP

ONE-TO-MANY (1:M) RELATIONSHIP

MANY-TO-MANY (M:N) RELATIONSHIP
Find all the courses by a particular student.
Find the number of students studying in a particular course.
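
The ideas above (hashing the partition key to pick a node, ordering rows by clustering column, and keeping one table per query) can be sketched as a small in-memory model in plain Python. This is only a toy illustration and not how Cassandra is implemented: the MD5 hash is a stand-in for Cassandra's real partitioner, and the node names, the table names courses_by_student and students_by_course, and the sample students are all hypothetical.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def node_for(partition_key: tuple) -> str:
    """Pick a node by hashing the combined partition key values.

    MD5 is only a stand-in hash for this sketch.
    """
    joined = ":".join(str(value) for value in partition_key)
    token = int.from_bytes(hashlib.md5(joined.encode()).digest()[:8], "big")
    return NODES[token % len(NODES)]


class ToyTable:
    """A table as a map: partition key -> {clustering key -> column values}."""

    def __init__(self):
        self.partitions = defaultdict(dict)

    def insert(self, partition_key: tuple, clustering_key: tuple, values: dict):
        # Partition key + clustering key together identify exactly one row.
        self.partitions[partition_key][clustering_key] = values

    def read_partition(self, partition_key: tuple) -> list:
        # All rows of one partition, ordered by clustering key.
        rows = self.partitions.get(partition_key, {})
        return sorted(rows.items(), key=lambda item: item[0])


# One table per query, so the same enrollment is written twice.
courses_by_student = ToyTable()   # partition key: student_id
students_by_course = ToyTable()   # partition key: course_id


def enroll(student_id, student_name, course_id, course_title):
    """Write the enrollment to both query tables (deliberate duplication)."""
    courses_by_student.insert((student_id,), (course_id,),
                              {"course_title": course_title})
    students_by_course.insert((course_id,), (student_id,),
                              {"student_name": student_name})


def courses_for(student_id):
    """Find all the courses by a particular student (reads one partition)."""
    return [ck[0] for ck, _ in courses_by_student.read_partition((student_id,))]


def student_count(course_id):
    """Find the number of students in a particular course (one partition)."""
    return len(students_by_course.read_partition((course_id,)))


# Hypothetical sample data.
enroll("s2", "Ben", "IT315", "IT Elective 1")
enroll("s1", "Ana", "IT315", "IT Elective 1")
enroll("s1", "Ana", "IT101", "Intro to Computing")

print(courses_for("s1"))          # courses come back in clustering-key order
print(student_count("IT315"))
print(node_for(("s1",)))          # the same key always maps to the same node
```

Each query touches a single partition of a single table, which is the reason for writing every enrollment twice: the extra write is traded for a cheaper read.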
