Questions and Answers
Which of the following describes a foreign key in relational databases?
What is a primary key in a relational database?
Which of the following is a benefit of data independence in relational databases?
What is one of the main advantages of having a database administrator (DBA)?
What does data integrity ensure in a relational database?
Which of the following is a disadvantage of relational databases?
What component is essential for ensuring data security in a relational database?
Which task is typically performed by the database server in a relational database?
What does the query 'MATCH (s:Student)-[:FOLLOWING]->(c:Course) WHERE c.name = "MSc Software Development" RETURN s.name' retrieve?
Which feature of MongoDB allows for flexible document structures with no enforced schema by default?
In the context of the CAP theorem, what does 'Consistency' ensure?
What is one key benefit of using graph databases in applications?
What language is MongoDB primarily interfaced with?
Which statement accurately describes MongoDB's document structure?
Which of the following best describes the role of the aggregation pipeline in MongoDB?
What is a notable difference between relational databases and NoSQL databases regarding data handling?
Which of the following statements about a relation (table) is correct?
What best describes a schema in the context of relational databases?
In a relational model, what is the cardinality?
What is a foreign key used for in a relational database?
Which statement is true about attributes in a relation?
Which characteristic about relational databases is incorrect?
What does the term 'tuple' refer to in a relational database?
What is a potential application of data processing from mobile phone data?
What are schema constraints primarily used for?
Which of the following describes a distributed database?
What challenge is associated with processing large volumes of data?
In terms of data storage, what is one advantage of using cloud solutions?
Which of the following is NOT a source of data mentioned?
What is a key requirement for a robust database server?
What does the term 'resilient to failure' refer to in data processing?
What is a characteristic of NoSQL databases compared to traditional SQL databases?
Which of the following data serialization formats is used internally by MongoDB?
Which type of NoSQL database is optimized for queries across linked entities?
Which of the following formats is known for minimal markup in data formatting?
What advantage do NoSQL databases have concerning scalability?
In NoSQL document stores, how is data typically organized?
Which of the following is NOT a type of NoSQL database mentioned?
What is a key benefit of using object-oriented programming in NoSQL object stores?
Study Notes
Data and Serialisation
- The module assumes basic Python knowledge.
- The cohort has varying abilities and experience.
- Students should contribute to class discussions and support each other, except during exams.
- Raise questions quickly, as confusing issues might also confuse others.
- Test knowledge via lab exercises and revision quizzes.
- Utilise discussion forums effectively.
- Resources include lectures, lab sessions, programming examples, and supporting reading material. Experimentation is needed to understand concepts.
- Assessment is individual and collaboration is prohibited.
- The exam date is yet to be determined.
- The exam is 50% of the grade for CS982 and 100% for CS988.
- Revision quizzes and previous exam papers should be reviewed to prepare for the exam.
- The exam will be a closed-book exam in the exam hall.
Data Processing
- Data sources include distributed sensors (wind turbines, power grid, rail network, infrastructure, white goods), mobile phones, CCTV cameras, internet documents, and social media uploads.
- Frequent sampling leads to large datasets.
- Possible applications include preempting maintenance, traffic modelling, finding criminals, face recognition, analysing behaviour, recommending products and services.
Challenges
- Large volumes of data.
- Many different data formats.
- Rapid analysis.
- Distributed data analysis.
- Processing bottlenecks.
- Processing data in parallel.
- Data resilience to failure.
Considerations
- Size of the data sample.
- Data type (sensitive or replaceable).
- Existing infrastructure (cluster or database server).
- Speed or performance requirements.
- Scalability.
Storage Approaches
- Database server: A single computer hosting a database and accepting connections. It is robust but can be overwhelmed by requests.
- Distributed database: Many servers or database files (e.g., SQLite), or distributed files containing serialized information. Supports parallel reads and single or parallel writes, which minimises bottlenecks. Can be implemented in the cloud or on-premises.
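One way to picture the "many database files" approach is to route each record to one of several SQLite databases by hashing its key. The sketch below is a minimal illustration only: the three in-memory databases, the `reading` table, and the `turbine-N` sensor names are assumptions made for the example, not part of the module material.

```python
import sqlite3
import zlib

# Assumption: three in-memory SQLite databases stand in for three servers or files.
shards = [sqlite3.connect(":memory:") for _ in range(3)]
for db in shards:
    db.execute("CREATE TABLE reading (sensor_id TEXT, value REAL)")


def shard_for(sensor_id: str) -> sqlite3.Connection:
    """Pick a shard from a stable hash of the key, so a key always maps to the same shard."""
    return shards[zlib.crc32(sensor_id.encode("utf-8")) % len(shards)]


def write_reading(sensor_id: str, value: float) -> None:
    """Write a reading to the single shard that owns this sensor."""
    db = shard_for(sensor_id)
    db.execute("INSERT INTO reading VALUES (?, ?)", (sensor_id, value))
    db.commit()


def read_all() -> list:
    """Read from every shard and merge the rows (these reads could run in parallel)."""
    rows = []
    for db in shards:
        rows.extend(db.execute("SELECT sensor_id, value FROM reading").fetchall())
    return sorted(rows)


for i in range(10):
    write_reading(f"turbine-{i}", 0.5 * i)
all_rows = read_all()
```

Each write touches only one shard, so no single database has to absorb every request, while a full read has to visit all shards and merge the results.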
Schema Constraints vs Evolution
- Schema describes data structure.
- Data may evolve or remain similar.
- Schema constraints are more or less helpful depending on the application's needs.
- Critical applications often need more constraints.
- Tolerating schema changes between data records is acceptable for some applications.
Serialisation Approaches
- Relational databases/tables: Data stored across one or more tables. Data must match table definitions.
- Object-based storage: Uses JavaScript Object Notation (JSON) or a custom binary serialisation. Can enforce a schema or allow schema evolution.
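The two object-based options can be compared with Python's standard library. This is a small sketch, assuming a made-up sensor record: `json` gives a text serialisation, and `struct` shows what a hand-written binary layout might look like.

```python
import json
import struct

# Hypothetical record used only for illustration.
record = {"sensor": "turbine-7", "readings": [3.2, 3.4], "ok": True}

# Text serialisation: the whole record becomes a JSON string and back again.
text = json.dumps(record)
restored = json.loads(text)

# Custom binary serialisation: pack the two readings as little-endian doubles.
packed = struct.pack("<2d", *record["readings"])
unpacked = list(struct.unpack("<2d", packed))
```

The JSON form is self-describing and tolerates extra or missing fields, whereas the binary form is compact (16 bytes here) but only readable by code that already knows the layout.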
Relational Model
- Describes data as relations.
- Defines high-level operations, so users do not need to know the implementation.
- No physical pointers.
- Different database servers may use different implementations of SQL.
Relational Databases
- Hold data in tables (relations).
- Example tables shown in figures.
- Relation (table): Rows (tuples) and columns (attributes).
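- Aspects of a relation (table): cardinality (number of tuples), degree (number of attributes), key (attributes that identify a tuple), and domain (the set of permitted values for an attribute).
- Schema: Table names, attributes, and type definitions. A database schema consists of several relations and the named constraints that apply to them.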
- Properties of a relation include: unique name, distinct tuples, order is irrelevant, and atomic values per field with a matching attribute type.
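As a quick check of the terminology, the sketch below builds a small table with Python's built-in `sqlite3` module and reads off its degree (number of attributes) and cardinality (number of tuples). The `student` table and its rows are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical relation with three attributes.
db.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")
db.executemany(
    "INSERT INTO student (name, course) VALUES (?, ?)",
    [("Ada", "CS982"), ("Grace", "CS988")],
)

cursor = db.execute("SELECT * FROM student")
degree = len(cursor.description)      # one entry per column (attribute)
cardinality = len(cursor.fetchall())  # one entry per row (tuple)
```

Here the degree is 3 and the cardinality is 2; adding a row changes the cardinality, while adding a column changes the degree.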
Keys
- A key comprises one or more attributes used for identifying tuples (rows).
- Primary keys: Required to be unique; can be automatically incremented.
- Foreign keys: Refer to other tables by holding a primary key value from the other table; used to create constraints between tables.
Relational schema constraints
- Field type: Integer, floating point, text, etc.
- Field length: Limits maximum value or length of text.
- Unique values: Primary keys are unique and can use auto-increment. Other unique values can use numbers or text.
- Foreign keys: A value that is still required by another table cannot be deleted, unless the deletion is set to cascade to the dependent rows.
- Schema constraints restrict the values that data can take (e.g., bank accounts, personal information).
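The constraints listed above can be tried out with `sqlite3`. The following is a sketch under assumptions: the `course` and `student` tables are hypothetical, and SQLite is used only because it ships with Python (note that SQLite needs foreign key checks switched on explicitly).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default
db.execute(
    "CREATE TABLE course ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT UNIQUE NOT NULL)"
)
db.execute(
    "CREATE TABLE student ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL,"
    " course_id INTEGER NOT NULL REFERENCES course(id))"
)
db.execute("INSERT INTO course (name) VALUES ('MSc Software Development')")
db.execute("INSERT INTO student (name, course_id) VALUES ('Ada', 1)")


def attempt(sql: str) -> str:
    """Run a statement and report whether a constraint rejected it."""
    try:
        db.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    # Violates the UNIQUE constraint on course.name.
    attempt("INSERT INTO course (name) VALUES ('MSc Software Development')"),
    # Violates the foreign key: there is no course with id 99.
    attempt("INSERT INTO student (name, course_id) VALUES ('Bob', 99)"),
    # Violates the foreign key: the course is still required by a student row.
    attempt("DELETE FROM course WHERE id = 1"),
]
```

All three statements are rejected. Declaring the reference with `ON DELETE CASCADE` would instead let the last statement succeed and delete the dependent student rows as well.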
Relational databases benefits
- Data Independence: Applications are not tightly coupled to the underlying implementation and do not need to know the physical storage format. They access data through SQL, using the same logic whether retrieving few or many fields.
- Data Security: The database server authenticates users by username and password (over TLS/SSL connections) and applies permissions for access to tables and views.
- Integrity: Prevents duplicate rows and enforces constraints. Efficient for concurrent access because transactions do not interfere with each other, and resources can be scaled to meet access needs.
- Analytics and Administration: Server logs actions, table/index size, operations, query times, and database size. Database administrator (DBA) controls design, usage, and performance tweaks.
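The integrity benefit can be illustrated with a transaction that either completes fully or leaves the data untouched. This is a minimal sketch assuming a hypothetical `account` table with a `CHECK` constraint; the `transfer` helper is invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
db.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
db.commit()


def transfer(src: int, dst: int, amount: int) -> bool:
    """Move money between accounts; both updates apply, or neither does."""
    try:
        with db:  # commits on success, rolls back if an exception is raised
            db.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            db.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(1, 2, 30)       # succeeds
failed = transfer(1, 2, 500)  # the debit breaks the CHECK, so the credit is rolled back
balances = dict(db.execute("SELECT id, balance FROM account"))
```

After both calls the balances are 70 and 80: the failed transfer leaves no partial credit behind, which is the atomicity part of ACID.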
Relational databases disadvantages
- The system is complex and needs maintenance.
- Server and the database can increase considerably in size.
- Enhanced performance requires more hardware, which adds cost.
- Changing the database schema or moving to a new schema is expensive.
- Parallel writes to a distributed database should be avoided.
NoSQL Approaches
- Not only SQL (NoSQL).
- Uses different commands compared to SQL.
- Some inspiration from SQL (SQL - read, write and queries).
- Typically data is not stored in tables.
- NoSQL types include:
- Document databases (JSON, BSON), e.g. MongoDB.
- Key-value pair storage, e.g. Couchbase.
- Column-oriented databases, e.g. Cassandra.
- Graph databases, e.g. Neo4j.
NoSQL: Features
- Flexible schema definition—allows differing records/documents.
- Schema definition possible, but limited.
- No foreign keys.
- Easier to distribute the database and scale to address loading.
- Data structure similar to hash tables.
- Uses cheap storage instead of special servers.
NoSQL Types: Document Stores
- Stored as key (text) and document; similar to key-value store.
- Documents hold serialized data (JSON, XML, BSON, or other serialization formats).
- Schema constraints possible within document schema.
- Index definition for collections.
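To make the document-store idea concrete, here is a toy in-memory version in plain Python. It is a sketch only: the `DocumentStore` class and its methods are invented for illustration and do not correspond to any real database API. It stores each document as JSON text under a text key, accepts documents with differing fields, and keeps an optional index for fast lookups.

```python
import json


class DocumentStore:
    """Toy document store: text key -> serialized JSON document."""

    def __init__(self):
        self._docs = {}
        self._index = {}  # field name -> {field value -> set of keys}

    def create_index(self, field):
        """Build an index over one field (values must be hashable)."""
        idx = {}
        for key, text in self._docs.items():
            doc = json.loads(text)
            if field in doc:
                idx.setdefault(doc[field], set()).add(key)
        self._index[field] = idx

    def put(self, key, doc):
        """Store a document; no schema is enforced, so fields may differ."""
        # Drop stale index entries in case the key is being overwritten.
        for idx in self._index.values():
            for keys in idx.values():
                keys.discard(key)
        self._docs[key] = json.dumps(doc)
        for field, idx in self._index.items():
            if field in doc:
                idx.setdefault(doc[field], set()).add(key)

    def get(self, key):
        return json.loads(self._docs[key])

    def find(self, field, value):
        """Return matching keys, using the index when one exists."""
        if field in self._index:
            keys = self._index[field].get(value, set())
        else:
            keys = [k for k, t in self._docs.items() if json.loads(t).get(field) == value]
        return sorted(keys)


store = DocumentStore()
store.create_index("course")
store.put("s1", {"name": "Ada", "course": "CS982"})
store.put("s2", {"name": "Grace", "course": "CS982", "email": "grace@example.org"})
store.put("s3", {"name": "Alan"})  # no course field at all
```

`store.find("course", "CS982")` answers from the index, while `store.find("name", "Alan")` falls back to scanning and deserializing every document, which is why index definitions matter for larger collections.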
NoSQL Types: Object Stores
- Key-value pair storage. Driven by object-oriented programming.
- Serialization and deserialization of objects to/from the NoSQL database.
- Serialized data stored as text.
- Intuitive access.
- Implemented in C++ or Java.
- ACID (atomicity, consistency, isolation, durability) compliance.
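The object-store pattern (serialize an object, store it as text under a key, deserialize it on the way back) can be sketched with a dataclass and a dictionary. The `Student` class, the `save` and `load` helpers, and the key format are all assumptions made for this example; a real object store would persist the text rather than hold it in memory.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Student:
    name: str
    course: str


store = {}  # stand-in for the key-value storage: key -> serialized text


def save(key: str, obj: Student) -> None:
    """Serialize the object to JSON text and store it under the key."""
    store[key] = json.dumps(asdict(obj))


def load(key: str) -> Student:
    """Fetch the text for the key and rebuild the object from it."""
    return Student(**json.loads(store[key]))


save("student:1", Student("Ada", "CS982"))
loaded = load("student:1")
```

The calling code only ever deals with `Student` objects, which is the "intuitive access" the notes refer to.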
NoSQL Types: Graph Databases
- Each data unit (node/entity) has pointers (relationships) to other entities, forming a graph of linked entities.
- Optimized for querying across links, which can be faster than equivalent SQL queries.
- Examples of fast link processing applications include shortest-path calculations and network/electricity analysis.
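A shortest-path calculation shows why following links directly is convenient. The sketch below uses breadth-first search over an adjacency dictionary; the five-node graph is made up, and a graph database would run this kind of traversal over stored relationships rather than a Python dict.

```python
from collections import deque

# Hypothetical directed graph: node -> nodes it links to.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}


def shortest_path(graph, start, goal):
    """Breadth-first search: returns a path with the fewest links, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = shortest_path(graph, "A", "E")
```

Each step simply follows the pointers from the current node, whereas a relational design would typically need a join per hop.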
MongoDB
- MongoDB (the name comes from "humongous") is a document-based database with a JSON interface and BSON storage.
- Schema definition is possible but not enforced by default.
- Implemented in C++, JavaScript, and Python.
- Several server types, including cloud instances and community editions.
MongoDB: Features
- Flexible schema.
- Fast creation and updates.
- Server-side JavaScript execution.
- Flexible horizontal scaling—sharding.
- Indexing—for rapid searches.
- Aggregation using a pipeline similar to MapReduce.
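The pipeline idea can be sketched in plain Python: documents flow through a sequence of stages, each taking a list of documents and returning a new one. This is an illustration only and not MongoDB's API; the `match`, `group_avg`, and `sort_by` helpers are invented here and only loosely mirror MongoDB's `$match`, `$group`, and `$sort` stages.

```python
from collections import defaultdict

# Hypothetical documents; note that one has no "mark" field.
docs = [
    {"course": "CS982", "mark": 70},
    {"course": "CS982", "mark": 50},
    {"course": "CS988", "mark": 65},
    {"course": "CS988"},
]


def match(docs, predicate):
    """Filter stage: keep documents that satisfy the predicate."""
    return [d for d in docs if predicate(d)]


def group_avg(docs, key, field):
    """Group stage: average `field` per distinct value of `key`."""
    buckets = defaultdict(list)
    for d in docs:
        buckets[d[key]].append(d[field])
    return [{"_id": k, "avg": sum(v) / len(v)} for k, v in buckets.items()]


def sort_by(docs, field):
    """Sort stage: order documents by one field."""
    return sorted(docs, key=lambda d: d[field])


def run_pipeline(docs, stages):
    """Feed the output of each stage into the next."""
    for stage in stages:
        docs = stage(docs)
    return docs


result = run_pipeline(docs, [
    lambda ds: match(ds, lambda d: "mark" in d),
    lambda ds: group_avg(ds, "course", "mark"),
    lambda ds: sort_by(ds, "_id"),
])
```

The filter and group steps play roughly the roles of the map and reduce phases, which is the similarity to MapReduce that the notes mention.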
CAP Theorem
- Choose a mix of properties based on the application.
- Relational databases: Ensure ACID compliance (Atomicity, Consistency, Isolation, Durability).
- NoSQL: Trade-offs between consistency, availability, and partition tolerance:
- Consistency: Every read gets the latest write or an error.
- Availability: Every request gets a response (not necessarily the latest write).
- Partition tolerance: The system keeps operating despite dropped network messages.
- Distributed systems can fail due to temporary network partitioning.
- When a partition occurs, choose between consistency and availability.
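The choice between consistency and availability can be seen in a toy two-replica model. Everything here is an assumption made for illustration (the `Replica` and `Cluster` classes, the "CP" and "AP" mode labels, and the `demo` helper); real systems use far more elaborate replication protocols.

```python
class Replica:
    def __init__(self):
        self.value = None


class Cluster:
    """Toy model: a primary that replicates writes to one secondary."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.primary = Replica()
        self.secondary = Replica()
        self.partitioned = False

    def write(self, value):
        self.primary.value = value
        if not self.partitioned:
            # Replication only succeeds while the network link is up.
            self.secondary.value = value

    def read_from_secondary(self):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # Availability first: always answer, even if the value may be stale.
        return self.secondary.value


def demo(mode):
    cluster = Cluster(mode)
    cluster.write("v1")
    cluster.partitioned = True  # the link between the replicas drops
    cluster.write("v2")         # reaches the primary only
    try:
        return cluster.read_from_secondary()
    except RuntimeError as exc:
        return f"error: {exc}"


cp_result = demo("CP")
ap_result = demo("AP")
```

In "CP" mode the read returns an error instead of the outdated value, while in "AP" mode it returns the stale `"v1"`: the same partition, handled with opposite priorities.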
Polyglot Persistence
- Use different data storage technologies together to leverage the strengths of each.
- Example: Relational DB for sensitive data, NoSQL for changeable data.
- Use microservices per data type to combine in one application.