Questions and Answers
Which of the following describes a foreign key in relational databases?
- It must refer to a value present in another table. (correct)
- It is used to combine multiple tables into one.
- It can automatically generate values for new entries.
- It requires values to be unique within a table.
What is a primary key in a relational database?
- A unique identifier for each record in a table. (correct)
- A key that allows deletion of records in the same table.
- A key that links to another table for data retrieval.
- A field that can contain multiple values.
Which of the following is a benefit of data independence in relational databases?
- It separates application logic from physical data organization. (correct)
- It requires users to understand the database system.
- It mandates that data is stored in a single table.
- It allows direct access to the physical storage format.
What is one of the main advantages of having a database administrator (DBA)?
What does data integrity ensure in a relational database?
Which of the following is a disadvantage of relational databases?
What component is essential for ensuring data security in a relational database?
Which task is typically performed by the database server in a relational database?
What does the query 'MATCH (s:Student)-[:FOLLOWING]->(c:Course) WHERE c.name = "MSc Software Development" RETURN s.name' retrieve?
Which feature of MongoDB allows for flexible document structures with no enforced schema by default?
In the context of the CAP theorem, what does 'Consistency' ensure?
What is one key benefit of using graph databases in applications?
What language is MongoDB primarily interfaced with?
Which statement accurately describes MongoDB's document structure?
Which of the following best describes the role of the aggregation pipeline in MongoDB?
What is a notable difference between relational databases and NoSQL databases regarding data handling?
Which of the following statements about a relation (table) is correct?
What best describes a schema in the context of relational databases?
In a relational model, what is the cardinality?
What is a foreign key used for in a relational database?
Which statement is true about attributes in a relation?
Which characteristic about relational databases is incorrect?
What does the term 'tuple' refer to in a relational database?
What is a potential application of data processing from mobile phone data?
What are schema constraints primarily used for?
Which of the following describes a distributed database?
What challenge is associated with processing large volumes of data?
In terms of data storage, what is one advantage of using cloud solutions?
Which of the following is NOT a source of data mentioned?
What is a key requirement for a robust database server?
What does the term 'resilient to failure' refer to in data processing?
What is a characteristic of NoSQL databases compared to traditional SQL databases?
Which of the following data serialization formats is used internally by MongoDB?
Which type of NoSQL database is optimized for queries across linked entities?
Which of the following formats is known for minimal markup in data formatting?
What advantage do NoSQL databases have concerning scalability?
In NoSQL document stores, how is data typically organized?
Which of the following is NOT a type of NoSQL database mentioned?
What is a key benefit of using object-oriented programming in NoSQL object stores?
Flashcards
Relational Databases
Organize data in tables with rows and columns, enforcing data consistency through schema definitions.
Relation (Table)
Two-dimensional structure (rows and columns) with attributes (columns), tuples (rows), and a degree (number of columns).
Schema (of a relation)
Description of the relation's attributes, data types, and constraints.
Primary Key
Foreign Key
Attribute
Tuple (Row)
SQL
Data Sources
Data Storage Approaches
Database Server
Distributed Database
Distributed Files
Data Serialization
Schema Constraints
Data Volume Challenges
Relational Database Constraints
Data Independence
Data Integrity
Database Security
Database Administrator (DBA)
Document Database
Key-Value Pair Storage
Column-Oriented Database
Graph Database
Object Store
MongoDB
Document (MongoDB)
BSON Schema
CAP Theorem
Consistency (CAP)
Availability (CAP)
Partition Tolerance (CAP)
Study Notes
Data and Serialisation
- The module assumes basic Python knowledge.
- The cohort has varying abilities and experience.
- Students should contribute to class discussions and support each other, except during exams.
- Raise questions quickly, as confusing issues might also confuse others.
- Test knowledge via lab exercises and revision quizzes.
- Utilise discussion forums effectively.
- Resources include lectures, lab sessions, programming examples, and supporting reading material. Experimentation is needed to understand concepts.
- Assessment is individual and collaboration is prohibited.
- The exam date is yet to be determined.
- The exam is 50% of the grade for CS982 and 100% for CS988.
- Revision quizzes and previous exam papers should be reviewed to prepare for the exam.
- The exam will be a closed-book exam in the exam hall.
Data Processing
- Data sources include distributed sensors (wind turbines, power grid, rail network, infrastructure, white goods), mobile phones, CCTV cameras, internet documents, and social media uploads.
- Frequent sampling leads to large datasets.
- Possible applications include preempting maintenance, traffic modelling, finding criminals, face recognition, analysing behaviour, recommending products and services.
Challenges
- Large volumes of data.
- Many different data formats.
- Rapid analysis.
- Distributed data analysis.
- Processing bottlenecks.
- Processing data in parallel.
- Data resilience to failure.
Considerations
- Size of the data sample.
- Data type (sensitive or replaceable).
- Existing infrastructure (cluster or database server).
- Speed or performance requirements.
- Scalability.
Storage Approaches
- Database server: A single computer hosting a database and accepting connections. Can be overwhelmed by requests, but is robust.
- Distributed database: Supports parallel reads and single or parallel writes. Data is spread across many servers or database files (e.g., SQLite), or across distributed files containing serialized information. Minimises bottlenecks. Can be implemented in the cloud or on-premises.
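The idea of spreading data over several database files can be sketched in a few lines. This is only an illustration under stated assumptions: three in-memory SQLite databases stand in for separate servers or files, and the `reading` table, `shard_for`, `write`, and `read` names are hypothetical.

```python
import sqlite3
import zlib

# Three in-memory SQLite databases stand in for separate servers or files.
shards = [sqlite3.connect(":memory:") for _ in range(3)]
for db in shards:
    db.execute("CREATE TABLE reading (sensor_id TEXT, value REAL)")


def shard_for(sensor_id):
    """Pick a shard from a stable hash of the key (crc32 is the same on every run)."""
    return shards[zlib.crc32(sensor_id.encode("utf-8")) % len(shards)]


def write(sensor_id, value):
    """Store one reading on the shard that owns this sensor."""
    db = shard_for(sensor_id)
    db.execute("INSERT INTO reading VALUES (?, ?)", (sensor_id, value))
    db.commit()


def read(sensor_id):
    """Return all readings for a sensor, in insertion order, from its shard only."""
    rows = shard_for(sensor_id).execute(
        "SELECT value FROM reading WHERE sensor_id = ? ORDER BY rowid",
        (sensor_id,),
    ).fetchall()
    return [value for (value,) in rows]


for i in range(10):
    write(f"sensor-{i}", float(i))
write("sensor-3", 3.5)
```

Because each key always maps to the same shard, reads and writes for different sensors can be served by different files without a single bottleneck.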
Schema Constraints vs Evolution
- Schema describes data structure.
- Data may evolve or remain similar.
- Schema constraints are more or less helpful depending on the application's needs.
- Critical applications often need more constraints.
- Tolerating schema changes between data records is acceptable for some applications.
Serialisation Approaches
- Relational databases/tables: Data stored across one or more tables. Data must match table definitions.
- Object-based storage:
  - Uses JavaScript Object Notation (JSON).
  - Allows custom binary serialisation.
  - Can enforce a schema or allow schema evolution.
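A minimal sketch of JSON serialisation with Python's standard `json` module follows. The sensor record and its field names are invented for illustration; the point is the round trip from object to text and back, and that a later record may carry an extra field.

```python
import json

# Hypothetical sensor reading used only for illustration.
reading = {"sensor_id": "turbine-07", "rpm": 14.2, "tags": ["wind", "north"]}

# Serialise to JSON text, then to bytes suitable for a file or network message.
text = json.dumps(reading, sort_keys=True)
payload = text.encode("utf-8")

# Deserialise back into Python objects.
restored = json.loads(payload.decode("utf-8"))

# A later record may carry an extra field. JSON itself does not object,
# which is what lets the schema evolve between records.
newer = dict(reading, firmware="2.1")
newer_restored = json.loads(json.dumps(newer))
```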
Relational Model
- Describes data as relations.
- Defines high-level operations; users do not need to know the implementation.
- No physical pointers.
- Different database servers may use different implementations behind the same SQL interface.
Relational Databases
- Hold data in tables (relations).
- Example tables shown in figures.
- Relation (table): Rows (tuples) and columns (attributes).
- Cardinality (number of tuples), key, degree (number of attributes), and domain (the set of allowed values for an attribute) are aspects of relations (tables).
- Schema: Table names, attributes, and type definitions. A database schema consists of several relations plus the named constraints that apply to those tables.
- Properties of a relation include: unique name, distinct tuples, order is irrelevant, and atomic values per field with a matching attribute type.
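These properties can be illustrated by modelling a relation as a Python set of tuples. This is a sketch for intuition only, with invented attribute names and rows; a real database stores relations quite differently.

```python
# Attribute names of a hypothetical "student" relation.
attributes = ("id", "name", "course")

# A set holds distinct tuples and has no defined order,
# mirroring two of the properties listed above.
relation = {
    (1, "Ada", "MSc Software Development"),
    (2, "Bob", "MSc Data Analytics"),
}

# Adding an identical tuple has no effect: tuples stay distinct.
relation.add((1, "Ada", "MSc Software Development"))

degree = len(attributes)      # number of columns
cardinality = len(relation)   # number of rows
```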
Keys
- A key comprises one or more attributes used to identify tuples (rows).
- Primary keys: Must be unique; can be automatically incremented.
- Foreign keys: Refer to other tables by holding a primary key value from the referenced table; used to create constraints between tables.
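The behaviour of primary and foreign keys can be tried out with Python's built-in `sqlite3` module. The `course` and `student` tables below are hypothetical examples; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default

db.execute(
    "CREATE TABLE course (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
)
db.execute(
    """
    CREATE TABLE student (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        course_id INTEGER NOT NULL REFERENCES course(id)
    )
    """
)

# The primary key is generated automatically; lastrowid reports it.
course_id = db.execute(
    "INSERT INTO course (name) VALUES ('MSc Software Development')"
).lastrowid
db.execute("INSERT INTO student (name, course_id) VALUES (?, ?)", ("Ada", course_id))

# Re-using an existing primary key value is rejected.
try:
    db.execute("INSERT INTO course (id, name) VALUES (?, 'Duplicate')", (course_id,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A foreign key must refer to a value present in the other table.
try:
    db.execute("INSERT INTO student (name, course_id) VALUES ('Bob', 999)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True
```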
Relational schema constraints
- Field type: Integer, floating point, text, etc.
- Field length: Limits maximum value or length of text.
- Unique values: Primary keys are unique and can use auto-increment. Other unique values can use numbers or text.
- Foreign keys: A row cannot be deleted while rows in other tables still refer to it, unless the deletion is set to cascade to those rows.
- Schema constraints restrict data values, which matters for critical data (e.g., bank accounts, personal information).
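The constraints above can be sketched with `sqlite3`. The `customer`, `account`, and `note` tables and the helper `rejected` are hypothetical; the sketch contrasts a foreign key that blocks deletion with one declared `ON DELETE CASCADE`, and adds a `CHECK` constraint on a value.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
db.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),  -- blocks deletion
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    CREATE TABLE note (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id) ON DELETE CASCADE,
        body TEXT
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO account VALUES (1, 1, 100);
    INSERT INTO note VALUES (1, 2, 'call back');
    """
)


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        db.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Ada still has an account, so her customer row cannot be deleted.
blocked_delete = rejected("DELETE FROM customer WHERE id = 1")
# The CHECK constraint restricts the allowed values.
negative_balance = rejected("UPDATE account SET balance = -5 WHERE id = 1")
# Bob only has a note, and notes cascade, so this deletion succeeds.
db.execute("DELETE FROM customer WHERE id = 2")
notes_left = db.execute("SELECT COUNT(*) FROM note").fetchone()[0]
```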
Relational databases benefits
- Data Independence: Applications are not tightly coupled to the underlying implementation and do not need to know the physical storage format. Applications access data through SQL, using the same logic whether retrieving few or many fields.
- Data Security: The database server authenticates users by username and password over TLS/SSL; permissions control access to tables/views.
- Integrity: Prevents duplicate rows and enforces constraints. Efficient for concurrent access, where transactions do not interfere with each other; resources can be scaled to meet access needs.
- Analytics and Administration: Server logs actions, table/index size, operations, query times, and database size. Database administrator (DBA) controls design, usage, and performance tweaks.
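The integrity point can be illustrated with a transaction that either applies fully or not at all. This is a minimal `sqlite3` sketch with a hypothetical `account` table and `transfer` function, not a recipe for production banking code.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
db.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
db.commit()


def transfer(src, dst, amount):
    """Move funds atomically: both updates commit, or neither does."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            db.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return {account id: balance} for all accounts."""
    return dict(db.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(1, 2, 30)       # succeeds
failed = transfer(1, 2, 500)  # would overdraw account 1, so it is rolled back
```

In the failed transfer the credit to account 2 is executed first, yet it does not survive: the rollback undoes it when the debit violates the `CHECK` constraint.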
Relational databases disadvantages
- The system is complex and needs maintenance.
- Server and the database can increase considerably in size.
- Enhanced performance incurs hardware costs.
- Changing the database schema or migrating to a new schema is expensive.
- Parallel writes to a distributed database should be avoided.
NoSQL Approaches
- Not only SQL (NoSQL).
- Uses different commands compared to SQL.
- Takes some inspiration from SQL (read, write, and query operations).
- Typically data is not stored in tables.
- NoSQL Types:
  - Document databases (JSON, BSON), e.g. MongoDB.
  - Key-value pair storage, e.g. Couchbase.
  - Column-oriented databases, e.g. Cassandra.
  - Graph databases, e.g. Neo4j.
NoSQL: Features
- Flexible schema definition—allows differing records/documents.
- Schema definition possible, but limited.
- No foreign keys.
- Easier to distribute the database and scale to address loading.
- Data structure similar to hash tables.
- Use of cheap storage instead of special servers.
NoSQL Types: Document Stores
- Stored as key (text) and document; similar to key-value store.
- Documents hold serialized data (JSON, XML, BSON, or other serialization formats).
- Schema constraints possible within document schema.
- Index definition for collections.
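A document store can be approximated in plain Python to show how the pieces fit together. Everything here is a hypothetical stand-in: `collection` maps a text key to a serialized document, `name_index` plays the role of an index on one field, and `put` applies a single minimal schema constraint.

```python
import json

collection = {}   # key (text) -> serialized JSON document
name_index = {}   # value of the "name" field -> set of keys (a simple index)


def put(key, document):
    """Store a document under a key, enforcing one minimal schema constraint."""
    if "name" not in document:
        raise ValueError("document needs a 'name' field")
    old = collection.get(key)
    if old is not None:
        # Keep the index consistent when a document is overwritten.
        name_index[json.loads(old)["name"]].discard(key)
    collection[key] = json.dumps(document)
    name_index.setdefault(document["name"], set()).add(key)


def find_by_name(name):
    """Look up documents through the index instead of scanning the collection."""
    return [json.loads(collection[k]) for k in sorted(name_index.get(name, ()))]


put("s1", {"name": "Ada", "course": "MSc Software Development"})
put("s2", {"name": "Bob", "languages": ["Python", "SQL"]})  # differing fields are accepted
```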
NoSQL Types: Object Stores
- Key-value pair storage. Driven by object-oriented programming.
- Serialization and deserialization of objects to/from the NoSQL database.
- Serialized data stored as text.
- Intuitive access.
- Implemented in C++ or Java.
- ACID (atomicity, consistency, isolation, durability) compliance.
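Serializing objects to and from a key-value store can be sketched as follows. The `Student` class, the `store` dictionary standing in for the database, and the `save`/`load` helpers are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Student:
    name: str
    course: str


store = {}  # stand-in for the key-value database


def save(key, obj):
    """Serialize the object's fields to JSON text and store it under the key."""
    store[key] = json.dumps(asdict(obj))


def load(key):
    """Deserialize the stored text back into a Student object."""
    return Student(**json.loads(store[key]))


save("student:1", Student("Ada", "MSc Software Development"))
loaded = load("student:1")
```

The calling code works with `Student` objects throughout, which is the intuitive access the notes refer to; the text form only exists inside the store.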
NoSQL Types: Graph Databases
- Each data unit (node/entity) has pointers (relationships) to other entities, forming a graph of linked entities.
- Optimized querying across links, often faster than equivalent SQL queries that need repeated joins.
- Examples of fast link processing applications include shortest-path calculations and network/electricity analysis.
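A shortest-path calculation over linked entities can be sketched with a breadth-first search. The `links` network below is invented, and this plain-Python version only illustrates the idea of following pointers between nodes; it is not how any particular graph database implements it.

```python
from collections import deque

# Hypothetical network: each node keeps direct links to its neighbours.
links = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}


def shortest_path(start, goal):
    """Breadth-first search: return one path with the fewest links, or None."""
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the chain of predecessors back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in links[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None
```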
MongoDB
- MongoDB (the name derives from "humongous") is a document-based database using a JSON interface and BSON storage.
- Schema definition is possible.
- Implemented in C++, JavaScript, and Python.
- Several server types, including cloud instances and community editions.
MongoDB: Features
- Flexible schema.
- Fast creation and updates.
- Server-side JavaScript execution.
- Flexible horizontal scaling—sharding.
- Indexing—for rapid searches.
- Aggregation using a pipeline similar to MapReduce.
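The pipeline idea can be emulated in plain Python: documents flow through a filter stage and then a grouping stage, roughly the roles that `$match` and `$group` play in MongoDB. This sketch does not use MongoDB or its driver; the documents and the `match` and `group_sum` functions are hypothetical.

```python
from collections import defaultdict

documents = [
    {"student": "Ada", "course": "MSc Software Development", "credits": 20},
    {"student": "Bob", "course": "MSc Software Development", "credits": 10},
    {"student": "Cy", "course": "MSc Data Analytics", "credits": 20},
    {"student": "Ada", "course": "MSc Software Development", "credits": 5},
]


def match(docs, field, minimum):
    """Filter stage: keep documents whose field is at least `minimum`."""
    return (d for d in docs if d[field] >= minimum)


def group_sum(docs, key, field):
    """Grouping stage: total `field` for each distinct value of `key`."""
    totals = defaultdict(int)
    for d in docs:
        totals[d[key]] += d[field]
    return dict(totals)


# Stages are chained: the output of one stage is the input of the next.
result = group_sum(match(documents, "credits", 10), "course", "credits")
```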
CAP Theorem
- Choose a mix of properties based on the application.
- Relational databases: Ensure ACID compliance (Atomicity, Consistency, Isolation, Durability).
- NoSQL: Trade-offs between:
  - Consistency: Every read receives the most recent write or an error.
  - Availability: Every request receives a response (not necessarily the latest write).
  - Partition tolerance: The system continues to operate despite dropped network messages.
- Distributed systems can fail due to temporary network partitioning.
- When a partition occurs, choose between consistency and availability.
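The choice between consistency and availability during a partition can be shown with a toy model of two replicas. The `Cluster` class and its `"CP"`/`"AP"` mode labels are assumptions for this sketch, which ignores everything a real distributed system has to handle.

```python
class Replica:
    """One copy of a single stored value."""

    def __init__(self):
        self.value = None


class Cluster:
    """Two replicas. `mode` is "CP" (prefer consistency) or "AP" (prefer availability)."""

    def __init__(self, mode):
        self.mode = mode
        self.primary = Replica()
        self.secondary = Replica()
        self.partitioned = False  # True while the replicas cannot exchange messages

    def write(self, value):
        """Writes reach the primary; replication only works without a partition."""
        self.primary.value = value
        if not self.partitioned:
            self.secondary.value = value

    def read_from_secondary(self):
        """CP answers with an error rather than stale data; AP always answers."""
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("cannot confirm the latest write during a partition")
        return self.secondary.value


def demo(mode):
    """Write, partition, write again, then read from the cut-off replica."""
    cluster = Cluster(mode)
    cluster.write("v1")
    cluster.partitioned = True
    cluster.write("v2")
    try:
        return cluster.read_from_secondary()
    except RuntimeError:
        return "error"
```

Calling `demo("AP")` returns the stale value `"v1"`, while `demo("CP")` returns `"error"`: the same partition, handled with opposite priorities.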
Polyglot Persistence
- Use different data storage technologies together to leverage the strengths of each.
- Example: Relational DB for sensitive data, NoSQL for changeable data.
- Use one microservice per data type and combine them in one application.