Untitled Quiz

Questions and Answers

Which of the following describes a foreign key in relational databases?

  • It must refer to a value present in another table. (correct)
  • It is used to combine multiple tables into one.
  • It can automatically generate values for new entries.
  • It requires values to be unique within a table.

What is a primary key in a relational database?

  • A unique identifier for each record in a table. (correct)
  • A key that allows deletion of records in the same table.
  • A key that links to another table for data retrieval.
  • A field that can contain multiple values.

Which of the following is a benefit of data independence in relational databases?

  • It separates application logic from physical data organization. (correct)
  • It requires users to understand the database system.
  • It mandates that data is stored in a single table.
  • It allows direct access to the physical storage format.

What is one of the main advantages of having a database administrator (DBA)?

  • They centralize control over database design and performance. (correct)

What does data integrity ensure in a relational database?

  • Duplicate rows are prevented in the database. (correct)

Which of the following is a disadvantage of relational databases?

  • They can become large and complex, requiring significant maintenance. (correct)

What component is essential for ensuring data security in a relational database?

  • User permissions and access controls. (correct)

Which task is typically performed by the database server in a relational database?

  • Logging user interactions and database actions. (correct)

What does the query 'MATCH (s:Student)-[:FOLLOWING]->(c:Course) WHERE c.name = "MSc Software Development" RETURN s.name' retrieve?

  • The names of students enrolled in a specific course. (correct)

Which feature of MongoDB allows for flexible document structures with no enforced schema by default?

  • Document flexibility. (correct)

In the context of the CAP theorem, what does 'Consistency' ensure?

  • Every read receives the most recent write or an error. (correct)

What is one key benefit of using graph databases in applications?

  • Ability to easily perform complex queries about relationships. (correct)

What language is MongoDB primarily interfaced with?

  • JavaScript. (correct)

Which statement accurately describes MongoDB's document structure?

  • Documents can vary in structure but are usually key-value pairs. (correct)

Which of the following best describes the role of the aggregation pipeline in MongoDB?

  • A process similar to MapReduce for processing and transforming data. (correct)

What is a notable difference between relational databases and NoSQL databases regarding data handling?

  • NoSQL databases can quickly adapt to changes in data structure. (correct)

Which of the following statements about a relation (table) is correct?

  • Each tuple in a relation must be distinct. (correct)

What best describes a schema in the context of relational databases?

  • A set of rules defining how data is stored and organized. (correct)

In a relational model, what is the cardinality?

  • The number of tuples (rows) in a relation. (correct)

What is a foreign key used for in a relational database?

  • To refer to primary key values from other tables. (correct)

Which statement is true about attributes in a relation?

  • Each attribute must have a distinct name within a table. (correct)

Which statement about relational databases is incorrect?

  • They allow the order of tuples to affect data retrieval. (correct)

What does the term 'tuple' refer to in a relational database?

  • A single row of data in a table. (correct)

What is a potential application of data processing from mobile phone data?

  • Finding criminals. (correct)

What are schema constraints primarily used for?

  • To enhance critical application reliability. (correct)

Which of the following describes a distributed database?

  • A system with multiple servers allowing for parallel reads. (correct)

What challenge is associated with processing large volumes of data?

  • Processing bottlenecks. (correct)

In terms of data storage, what is one advantage of using cloud solutions?

  • Enhanced disaster recovery capabilities. (correct)

Which of the following is NOT a source of data mentioned?

  • Statistical analysis results. (correct)

What is a key requirement for a robust database server?

  • Ability to handle simultaneous requests. (correct)

What does the term 'resilient to failure' refer to in data processing?

  • The capacity to recover quickly from failures. (correct)

What is a characteristic of NoSQL databases compared to traditional SQL databases?

  • They use a flexible schema definition. (correct)

Which of the following data serialization formats is used internally by MongoDB?

  • BSON. (correct)

Which type of NoSQL database is optimized for queries across linked entities?

  • Graph database. (correct)

Which of the following formats is known for minimal markup in data formatting?

  • YAML. (correct)

What advantage do NoSQL databases have concerning scalability?

  • Easier distribution of the database. (correct)

In NoSQL document stores, how is data typically organized?

  • As serialized documents. (correct)

Which of the following is NOT a type of NoSQL database mentioned?

  • Row-oriented database. (correct)

What is a key benefit of using object-oriented programming in NoSQL object stores?

  • They serialize and deserialize objects automatically. (correct)

Flashcards

Relational Databases

Organize data in tables with rows and columns, enforcing data consistency through schema definitions.

Relation (Table)

Two-dimensional structure (rows and columns) with attributes (columns), tuples (rows), and a degree (number of columns).

Schema (of a relation)

Description of the relation's attributes, data types, and constraints.

Primary Key

Unique identifier for each row within a table, ensuring uniqueness of records.

Foreign Key

A column in one table that refers to the primary key of another table, establishing relationships between tables.

Attribute

A named characteristic (column) of the data in a relation (table).

Tuple (Row)

A single row in a relation (table) containing the data for a specific entity.

SQL

Structured Query Language, used to query and manipulate data in relational databases.

Data Sources

Various places where data originates, including distributed sensors, mobile phones, social media, and the internet.

Data Storage Approaches

Methods for storing data, including database servers, distributed databases, and distributed files.

Database Server

A single computer hosting a database, handling connections.

Distributed Database

A database spread across multiple servers, enabling parallel processing.

Distributed Files

Data stored across many files or servers, often serialized.

Data Serialization

The process of converting data into a format suitable for storage or transmission.

Schema Constraints

Rules that define the structure of data within a database, especially helpful when data reliability is critical.

Data Volume Challenges

The difficulty of analyzing large amounts of data quickly and reliably, especially when the data comes in various formats and from various locations.

Relational Database Constraints

Rules that restrict data values in a relational database, ensuring data integrity and consistency. They specify field types, lengths, uniqueness, primary and foreign keys.

Data Independence

The ability to change the physical storage format of data without affecting applications that access it.

Data Integrity

Ensuring accuracy and consistency of data in a database by preventing duplicate rows and enforcing constraints.

Database Security

Protecting a database from unauthorized access through user authentication, permission restrictions and data encryption.

Database Administrator (DBA)

A role responsible for managing the database, including design, maintenance, performance tuning, and security.

Document Database

A NoSQL database that stores data as documents, typically in JSON or BSON format. Each document can have its own structure, making it flexible for storing complex data.

Key-Value Pair Storage

A NoSQL database that stores data as key-value pairs. It's simple and fast for data retrieval and good for storing website settings and user preferences.

Column-Oriented Database

A NoSQL database that stores data in columns instead of rows for efficient querying of specific attributes. This is useful for analyzing large datasets with many columns.

Graph Database

A NoSQL database that represents data as nodes (entities) and edges (relationships). This structure is ideal for storing and querying complex network-like data, such as social networks or geographic data.

Object Store

A NoSQL database that stores data as objects, serializing them into formats such as JSON (text) or BSON (binary). It offers advantages like ACID compliance for data consistency and easy integration with object-oriented programming.

MongoDB

A document-oriented NoSQL database that stores data in JSON-like documents with key-value pairs. It offers flexible schema, fast data access, and scalability.

Document (MongoDB)

A unit of data in MongoDB, similar to a row in a relational database, but made of key-value pairs whose structure can vary from one document to another.

BSON Schema

Optional structure definition for documents in MongoDB, providing guidelines for data types and validation rules, making it easier to manage data consistency.

CAP Theorem

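A fundamental principle in distributed systems stating that it is impossible to guarantee consistency, availability, and partition tolerance simultaneously, so you must choose two out of three. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition lasts.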

Consistency (CAP)

All reads get the most recent data, or an error is returned if consistency cannot be guaranteed.

Availability (CAP)

All requests get a response, even if it may not reflect the most recent changes due to network issues or data inconsistencies.

Partition Tolerance (CAP)

The system continues to operate correctly even when network messages between parts of the system are dropped or delayed (a network partition).

Study Notes

Data and Serialisation

  • The module assumes basic Python knowledge.
  • The cohort has varying abilities and experience.
  • Students should contribute to class discussions and support each other, except during exams.
  • Raise questions quickly, as confusing issues might also confuse others.
  • Test knowledge via lab exercises and revision quizzes.
  • Utilise discussion forums effectively.
  • Resources include lectures, lab sessions, programming examples, and supporting reading material. Experimentation is needed to understand concepts.
  • Assessment is individual and collaboration is prohibited.
  • The exam date is yet to be determined.
  • The exam is 50% of the grade for CS982 and 100% for CS988.
  • Revision quizzes and previous exam papers should be reviewed to prepare for the exam.
  • The exam will be a closed-book exam in the exam hall.

Data Processing

  • Data sources include distributed sensors (in wind turbines, the power grid, the rail network, other infrastructure and white goods), mobile phones, CCTV cameras, internet documents, and social media uploads.
  • Frequent sampling leads to large datasets.
  • Possible applications include pre-empting maintenance, traffic modelling, finding criminals, face recognition, analysing behaviour, and recommending products and services.

Challenges

  • Large volumes of data.
  • Many different data formats.
  • Rapid analysis.
  • Distributed data analysis.
  • Processing bottlenecks.
  • Processing data in parallel.
  • Data resilience to failure.

Considerations

  • Size of the data sample.
  • Data type (sensitive or replaceable).
  • Existing infrastructure (cluster or database server).
  • Speed or performance requirements.
  • Scalability.

Storage Approaches

  • Database server: A single computer hosting a database and accepting connections. Can be overwhelmed by requests, but is robust.
  • Distributed database: Parallel reads and single or parallel writes, spread over many servers or database files (e.g., SQLite).
  • Distributed files: Serialized information stored across many files or servers.
  • Both distributed approaches minimise bottlenecks and can be implemented in the cloud or on-premises.
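A minimal sketch of the distributed-database idea above, assuming Python's built-in sqlite3 module and several SQLite files in one temporary folder standing in for separate servers. The shard count, the table layout and the helper names (`shard_path`, `write_reading`, `count_rows`, `total_rows`) are hypothetical. Each reading is written to exactly one file chosen by hashing its key, and the files are then read in parallel, one connection per thread.

```python
import sqlite3
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing
from pathlib import Path

NUM_SHARDS = 3  # hypothetical: number of database files standing in for servers


def shard_path(folder: Path, key: str) -> Path:
    # crc32 is stable across runs, unlike Python's per-process randomised hash() for str.
    index = zlib.crc32(key.encode()) % NUM_SHARDS
    return folder / f"shard_{index}.db"


def write_reading(folder: Path, sensor_id: str, value: float) -> None:
    # Each write goes to exactly one shard, so shards can be written independently.
    with closing(sqlite3.connect(shard_path(folder, sensor_id))) as conn:
        with conn:  # commit the insert, or roll back on error
            conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, value REAL)")
            conn.execute("INSERT INTO readings VALUES (?, ?)", (sensor_id, value))


def count_rows(path: Path) -> int:
    # Every reader thread opens its own connection to its own file.
    with closing(sqlite3.connect(path)) as conn:
        return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]


def total_rows(folder: Path) -> int:
    shards = sorted(folder.glob("shard_*.db"))
    with ThreadPoolExecutor() as pool:  # parallel reads across the shard files
        return sum(pool.map(count_rows, shards))


# Usage: ten readings spread over the shard files, then counted back in parallel.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for i in range(10):
        write_reading(folder, f"turbine-{i}", float(i))
    total = total_rows(folder)
    shard_count = len(list(folder.glob("shard_*.db")))

print(total)  # 10
```

A real distributed database also has to handle replication, rebalancing and failed servers; the sketch only shows how partitioning the data lets reads proceed in parallel.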

Schema Constraints vs Evolution

  • Schema describes data structure.
  • Data may evolve or remain similar.
  • Schema constraints are more or less helpful depending on the application's needs.
  • Critical applications often need more constraints.
  • Tolerating schema changes between data records is acceptable for some applications.

Serialisation Approaches

  • Relational databases/tables: Data stored across one or more tables. Data must match table definitions.
  • Object-based storage:
    • Uses JavaScript Object Notation (JSON).
    • Allows custom binary serialisation.
    • Can enforce a schema or allow schema evolution.
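A short sketch contrasting the two serialisation approaches, assuming Python's sqlite3 and json modules; the `student` table and the sample records are hypothetical. The relational table rejects a record with a field it does not define, while the JSON records are free to differ from one another.

```python
import json
import sqlite3

# Relational storage: every row must match the table definition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO student (name) VALUES (?)", ("Ada",))
try:
    # 'email' is not a column of the table, so SQLite rejects the statement.
    conn.execute(
        "INSERT INTO student (name, email) VALUES (?, ?)", ("Alan", "alan@example.org")
    )
    table_accepted_email = True
except sqlite3.OperationalError:
    table_accepted_email = False

# Object-based storage: JSON documents may differ from one record to the next.
records = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Alan", "email": "alan@example.org"},  # extra field is allowed
]
serialised = [json.dumps(r) for r in records]  # objects -> JSON text
restored = [json.loads(s) for s in serialised]  # JSON text -> objects

print(table_accepted_email)   # False
print(restored[1]["email"])   # alan@example.org
```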

Relational Model

  • Describes data as relations.
  • Defines high-level operations; users do not need to know the implementation.
  • No physical pointers.
  • Different database servers may implement the model differently, usually exposing it through SQL.

Relational Databases

  • Hold data in tables (relations).
  • Example tables shown in figures.
  • Relation (table): Rows (tuples) and columns (attributes).
  • Cardinality (number of tuples), key, degree (number of attributes), and domain (set of allowed values for an attribute) are aspects of relations (tables).
  • Schema: Table names, attributes, and type definitions. A database schema consists of several relations plus named constraints that apply to the tables.
  • Properties of a relation include: a unique name, distinct tuples, no significance to the order of tuples, and a single atomic value per field, matching the attribute's type.
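The relation properties above can be checked against a real table. This is a minimal sketch assuming SQLite through Python's sqlite3 module and a hypothetical `module` table. Note that SQL tables only reject duplicate tuples when a key or unique constraint is declared, so the primary key here is what enforces "distinct tuples".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relation; the primary key makes the DBMS enforce distinct tuples.
conn.execute("CREATE TABLE module (code TEXT PRIMARY KEY, title TEXT, credits INTEGER)")
conn.executemany(
    "INSERT INTO module VALUES (?, ?, ?)",
    [("M2", "Data Serialisation", 10), ("M1", "Databases", 20)],
)

try:
    conn.execute("INSERT INTO module VALUES ('M1', 'Databases', 20)")  # duplicate tuple
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

cardinality = conn.execute("SELECT COUNT(*) FROM module").fetchone()[0]  # number of tuples
degree = len(conn.execute("SELECT * FROM module").description)           # number of attributes
# Tuple order has no meaning in a relation, so request an order explicitly when needed.
codes = [row[0] for row in conn.execute("SELECT code FROM module ORDER BY code")]

print(duplicate_rejected, cardinality, degree, codes)  # True 2 3 ['M1', 'M2']
```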

Keys

  • A key comprises one or more attributes used to identify tuples (rows).
  • Primary keys: Must be unique; can be automatically incremented.
  • Foreign keys: Refer to the primary key value of a row in another table; used to create constraints between tables.
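A minimal sketch of primary and foreign keys, assuming SQLite through Python's sqlite3 module; the `course` and `student` tables are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been run on the connection, which is why the sketch starts with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless enabled
conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE student (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           name TEXT NOT NULL,
           course_id INTEGER NOT NULL REFERENCES course(id)  -- foreign key
       )"""
)

course_id = conn.execute(
    "INSERT INTO course (name) VALUES (?)", ("MSc Software Development",)
).lastrowid  # primary key value generated by the database
conn.execute("INSERT INTO student (name, course_id) VALUES (?, ?)", ("Ada", course_id))

try:
    # No course has id 99, so the foreign key constraint rejects this row.
    conn.execute("INSERT INTO student (name, course_id) VALUES (?, ?)", ("Bob", 99))
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

print(course_id, dangling_rejected)  # 1 True
```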

Relational schema constraints

  • Field type: Integer, floating point, text, etc.
  • Field length: Limits maximum value or length of text.
  • Unique values: Primary keys are unique and often auto-incremented. Other columns, whether numeric or text, can also be declared unique.
  • Foreign keys: A row that other rows refer to cannot be deleted, unless the deletion is set to cascade to the referring rows.
  • Schema constraints restrict data values, which matters for data such as bank accounts and personal information.
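A sketch of these constraints in SQLite via Python's sqlite3 module, using hypothetical `account` and `payment` tables and made-up values. SQLite treats declared column types as type affinities (unless a table is declared STRICT) and does not enforce declared lengths such as VARCHAR(6), so the field-length rule is written here as a CHECK constraint. Without ON DELETE CASCADE, deleting a referenced account would fail instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    """CREATE TABLE account (
           id INTEGER PRIMARY KEY,
           sort_code TEXT NOT NULL CHECK (length(sort_code) = 6),  -- field length
           number TEXT NOT NULL UNIQUE                              -- unique value
       )"""
)
conn.execute(
    """CREATE TABLE payment (
           id INTEGER PRIMARY KEY,
           account_id INTEGER NOT NULL REFERENCES account(id) ON DELETE CASCADE,
           amount REAL NOT NULL CHECK (amount > 0)
       )"""
)
conn.execute("INSERT INTO account VALUES (1, '123456', '00000001')")
conn.execute("INSERT INTO payment VALUES (1, 1, 25.0)")


def rejected(sql: str) -> bool:
    """Return True if SQLite refuses the statement because a constraint fails."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


too_short = rejected("INSERT INTO account VALUES (2, '12', '00000002')")
duplicate = rejected("INSERT INTO account VALUES (3, '654321', '00000001')")
# With ON DELETE CASCADE, deleting the account also deletes its payments.
conn.execute("DELETE FROM account WHERE id = 1")
payments_left = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
print(too_short, duplicate, payments_left)  # True True 0
```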

Relational databases benefits

  • Data Independence: Applications are not tightly coupled to the underlying implementation and do not need to know the physical storage format. They access data through SQL, using the same logic whether retrieving few or many fields.
  • Data Security: The database server authenticates users with a username and password over TLS/SSL connections, and permissions control access to tables and views.
  • Integrity: Prevents duplicate rows and enforces constraints. Handles concurrent access efficiently, since transactions do not interfere with each other, and resources can be scaled to meet access needs.
  • Analytics and Administration: The server logs actions and records table/index sizes, operations, query times, and database size. A database administrator (DBA) controls design, usage, and performance tuning.
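A minimal sketch of integrity through transactions, assuming SQLite via Python's sqlite3 module. SQLite is an embedded database without user accounts, so the security and logging points above do not apply to it; the sketch only shows that a transaction is undone as a whole when a constraint fails. The `account` table and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 0.0)])
conn.commit()


def transfer(amount: float) -> bool:
    """Move money from account 1 to account 2 as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = 2", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = 1", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


ok_small = transfer(30.0)   # succeeds
ok_large = transfer(500.0)  # account 1 would go negative, so both updates are undone
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok_small, ok_large, balances)  # True False {1: 70.0, 2: 30.0}
```

The credit is applied before the failing debit on purpose: if the rollback did not cover the whole transaction, account 2 would keep money that never left account 1.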

Relational databases disadvantages

  • The system is complex and needs maintenance.
  • The server and the database can grow considerably in size.
  • Better performance requires costly hardware.
  • Scaling or migrating to a new database schema is expensive.
  • Parallel writes to a distributed database are best avoided.

NoSQL Approaches

  • Not only SQL (NoSQL).
  • Uses different commands from SQL.
  • Draws some inspiration from SQL (reading, writing, and querying data).
  • Typically data is not stored in tables.
  • NoSQL Types:
    • Document databases (JSON, BSON) - example MongoDB
    • Key-value pair storage - example Couchbase
    • Column-oriented databases - example Cassandra
    • Graph databases - example Neo4j

NoSQL: Features

  • Flexible schema definition—allows differing records/documents.
  • Schema definition possible, but limited.
  • No foreign keys.
  • Easier to distribute the database and scale to address loading.
  • Data structure similar to hash tables.
  • Uses cheap storage instead of specialised servers.

NoSQL Types: Document Stores

  • Data is stored as a key (text) and a document, similar to a key-value store.
  • Documents contain serialized data (JSON, XML, BSON).
  • Other data serialization methods are also possible.
  • Schema constraints are possible within a document schema.
  • Indexes can be defined for collections.
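A toy, in-memory sketch of a document store, assuming nothing beyond Python's standard library. `DocumentCollection` is hypothetical and is not the API of any real product: it keeps each document as JSON text under a text key and maintains one secondary index so that lookups by the indexed field avoid parsing every document.

```python
from __future__ import annotations

import json
from collections import defaultdict


class DocumentCollection:
    """Hypothetical collection: key (text) -> serialised JSON document, plus one index."""

    def __init__(self, indexed_field: str) -> None:
        self._docs: dict[str, str] = {}
        self._indexed_field = indexed_field
        # Secondary index: value of the indexed field -> keys of matching documents.
        self._index: defaultdict[object, set[str]] = defaultdict(set)

    def get(self, key: str) -> dict | None:
        text = self._docs.get(key)
        return None if text is None else json.loads(text)

    def delete(self, key: str) -> None:
        old = self.get(key)
        if old is not None and self._indexed_field in old:
            self._index[old[self._indexed_field]].discard(key)
        self._docs.pop(key, None)

    def put(self, key: str, document: dict) -> None:
        self.delete(key)  # remove any stale index entry for this key first
        self._docs[key] = json.dumps(document)
        if self._indexed_field in document:
            self._index[document[self._indexed_field]].add(key)

    def find(self, value: object) -> list[dict]:
        # Look up keys in the index instead of scanning and parsing every document.
        return [self.get(k) for k in sorted(self._index.get(value, set()))]


students = DocumentCollection(indexed_field="course")
students.put("s1", {"name": "Ada", "course": "MSc Software Development"})
students.put("s2", {"name": "Alan", "course": "MSc Software Development", "year": 2})
students.put("s3", {"name": "Grace"})  # no course field: allowed, just not indexed
names = [d["name"] for d in students.find("MSc Software Development")]
print(names)  # ['Ada', 'Alan']
```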

NoSQL Types: Object Stores

  • Key-value pair storage. Driven by object-oriented programming.
  • Objects are serialized to, and deserialized from, the NoSQL database.
  • Serialized data stored as text.
  • Intuitive access.
  • Implemented in C++ or Java.
  • ACID (atomicity, consistency, isolation, durability) compliance.
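A minimal sketch of the serialise/deserialise round trip an object store automates, assuming Python dataclasses and JSON text. The `Sensor` class and its sample values are hypothetical, and a plain dict stands in for the store, so none of the ACID guarantees listed above apply to it.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Sensor:
    sensor_id: str
    location: str
    readings: list[float] = field(default_factory=list)

    def to_text(self) -> str:
        # Serialise the object's fields as JSON text for storage.
        return json.dumps(asdict(self))

    @classmethod
    def from_text(cls, text: str) -> Sensor:
        # Deserialise: rebuild an equivalent object from the stored text.
        return cls(**json.loads(text))


store: dict[str, str] = {}  # stand-in for the object store: key -> serialised text
turbine = Sensor("wt-7", "offshore", [3.2, 3.9])
store[turbine.sensor_id] = turbine.to_text()
restored = Sensor.from_text(store["wt-7"])
print(restored == turbine)  # True
```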

NoSQL Types: Graph Databases

  • Each data unit (node/entity) has pointers (relationships) to other entities.
  • Together these form a graph of linked entities.
  • Querying across links is optimized and typically faster than the equivalent SQL queries.
  • Examples of fast link processing applications include shortest-path calculations and network/electricity analysis.
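A shortest-path calculation of the kind mentioned above, sketched in plain Python. A dict of neighbour lists stands in for the pointers a graph database stores on each node, so the search follows links directly instead of joining tables. The grid data is hypothetical, and breadth-first search finds the path with the fewest hops (it ignores edge weights).

```python
from __future__ import annotations

from collections import deque


def shortest_path(graph: dict[str, list[str]], start: str, goal: str) -> list[str] | None:
    """Breadth-first search: fewest hops from start to goal, or None if unreachable."""
    previous: dict[str, str | None] = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk the links back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical directed power-grid links.
grid = {
    "plant": ["sub_a", "sub_b"],
    "sub_a": ["sub_c"],
    "sub_b": ["sub_c", "town"],
    "sub_c": ["town"],
}
print(shortest_path(grid, "plant", "town"))  # ['plant', 'sub_b', 'town']
```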

MongoDB

  • MongoDB (from "humongous") is a document-based database using a JSON interface and BSON storage.
  • Schema definition is possible.
  • Implemented in C++, JavaScript, and Python.
  • Several server types, including cloud instances and community editions.

MongoDB: Features

  • Flexible schema.
  • Fast creation and updates.
  • Server-side JavaScript execution.
  • Flexible horizontal scaling (sharding).
  • Indexing for rapid searches.
  • Aggregation using a pipeline similar to MapReduce.
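To show what an aggregation pipeline does, here is a pipeline written in MongoDB's stage syntax (`$match`, `$group`, `$avg` are real MongoDB operators) next to a pure-Python emulation of the same two stages. The documents and marks are hypothetical, and the emulation is not MongoDB itself; with PyMongo the `pipeline` list would be passed to `collection.aggregate(pipeline)` against a running server. Like MapReduce, the group stage maps each document to a key and then reduces each group to one result. MongoDB does not guarantee the order of `$group` output, whereas the emulation keeps insertion order.

```python
from collections import defaultdict

# Hypothetical documents, as they might sit in a MongoDB collection.
docs = [
    {"student": "Ada", "course": "CS982", "mark": 72},
    {"student": "Alan", "course": "CS982", "mark": 64},
    {"student": "Grace", "course": "CS988", "mark": 81},
    {"student": "Linus", "course": "CS988"},  # no mark yet: flexible schema
]

# The pipeline in MongoDB's own stage syntax (shown for comparison, not executed here).
pipeline = [
    {"$match": {"mark": {"$exists": True}}},
    {"$group": {"_id": "$course", "average": {"$avg": "$mark"}}},
]


def run_pipeline(documents):
    """Pure-Python emulation of the two stages above."""
    # Stage 1 ($match): keep only documents that have a mark.
    matched = [d for d in documents if "mark" in d]
    # Stage 2 ($group): map each document to its course key, then reduce to an average.
    groups = defaultdict(list)
    for d in matched:
        groups[d["course"]].append(d["mark"])
    return [{"_id": course, "average": sum(m) / len(m)} for course, m in groups.items()]


print(run_pipeline(docs))
# [{'_id': 'CS982', 'average': 68.0}, {'_id': 'CS988', 'average': 81.0}]
```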

CAP Theorem

  • Choose a mix of properties based on the application.
  • Relational databases: Ensure ACID compliance (Atomicity, Consistency, Isolation, Durability).
  • NoSQL: Trade-offs between:
    • Consistency: Every read gets the latest write or error.
    • Availability: Every request gets a response (not necessarily the latest write).
    • Partition tolerance: System operates despite network message drops.
  • Distributed systems can fail due to temporary network partitioning, so partition tolerance cannot be given up.
  • In practice, the choice is therefore between consistency and availability while a partition lasts.
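A toy model of the consistency/availability choice, written in plain Python. `TwoReplicaStore` is hypothetical and far simpler than any real database: writes arrive at replica a, reads arrive at replica b, and a flag simulates the network partition. In "CP" mode the store refuses work it cannot keep consistent; in "AP" mode it keeps answering but may return stale data.

```python
from __future__ import annotations


class TwoReplicaStore:
    """Toy model (hypothetical): one value copied on replicas a and b."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value_a: str | None = None
        self.value_b: str | None = None
        self.partitioned = False  # True while a and b cannot exchange messages

    def write(self, value: str) -> bool:
        """Writes arrive at replica a and are copied to b when the network allows."""
        if not self.partitioned:
            self.value_a = self.value_b = value
            return True
        if self.mode == "CP":
            return False  # refuse rather than let the replicas diverge
        self.value_a = value  # AP: accept locally; b now holds stale data
        return True

    def read_from_b(self) -> str | None:
        """Reads arrive at replica b."""
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("replica b cannot confirm it has the latest write")
        return self.value_b


outcomes = {}
for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write("v1")         # network healthy: both replicas updated
    store.partitioned = True  # the link between a and b drops
    accepted = store.write("v2")
    try:
        seen = store.read_from_b()
    except RuntimeError:
        seen = "error"
    outcomes[mode] = (accepted, seen)

print(outcomes)  # {'CP': (False, 'error'), 'AP': (True, 'v1')}
```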

Polyglot Persistence

  • Use different data storage technologies together to leverage the strengths of each.
  • Example: Relational DB for sensitive data, NoSQL for changeable data.
  • Use a microservice per data type and combine them in one application.
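A minimal sketch of polyglot persistence inside one hypothetical service, assuming SQLite (via Python's sqlite3) for the sensitive, constrained account data and a dict of JSON text standing in for a NoSQL document store holding preferences whose shape may change. `UserService` and its methods are illustrative names; in a microservice design each store would typically sit behind its own service.

```python
from __future__ import annotations

import json
import sqlite3


class UserService:
    """Hypothetical service that hides two different stores behind one interface."""

    def __init__(self) -> None:
        # Relational store: sensitive account data with enforced constraints.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE account (user_id TEXT PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
        )
        # Document-style store: JSON text per user, free to change shape over time.
        self.preferences: dict[str, str] = {}

    def register(self, user_id: str, email: str) -> None:
        with self.db:  # commit, or roll back if a constraint fails
            self.db.execute("INSERT INTO account VALUES (?, ?)", (user_id, email))

    def set_preferences(self, user_id: str, prefs: dict) -> None:
        self.preferences[user_id] = json.dumps(prefs)

    def profile(self, user_id: str) -> dict:
        row = self.db.execute(
            "SELECT email FROM account WHERE user_id = ?", (user_id,)
        ).fetchone()
        prefs = json.loads(self.preferences.get(user_id, "{}"))
        return {"user_id": user_id, "email": row[0] if row else None, **prefs}


service = UserService()
service.register("u1", "ada@example.org")
service.set_preferences("u1", {"theme": "dark", "units": "metric"})
print(service.profile("u1"))
# {'user_id': 'u1', 'email': 'ada@example.org', 'theme': 'dark', 'units': 'metric'}
```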
