Big Data Analytics Fundamentals

Questions and Answers

Which characteristic is NOT typically associated with structured data?

  • Defined in fixed fields.
  • Conforms to a data model.
  • Variable schema. (correct)
  • Organized in rows and columns.

Which statement accurately describes semi-structured data?

  • Contains tags or metadata to define hierarchy. (correct)
  • Easily queried using SQL.
  • Cannot be stored in rows and columns. (correct)
  • Strictly adheres to a relational database model.

Which is a characteristic of unstructured data?

  • Easy integration with relational databases.
  • Well-defined schema.
  • Limited data analysis possibilities.
  • Non-relational model without a specific schema. (correct)

Which storage solution is most suitable for unstructured data requiring high scalability and availability?

  • Distributed file system with data sharding and replication. (correct)

How does a distributed file system (DFS) enhance data accessibility for programmers?

  • By allowing file access from any network or computer. (correct)

Which of the following is a main benefit of using a Distributed File System (DFS)?

  • Automatic data backup and recovery, ensuring data reliability. (correct)

With which type of Distributed File System (DFS) are Google FS and Hadoop FS associated?

  • Cloud File System. (correct)

What inspired the creation of cloud file systems like Hadoop, Grid, Amazon, and Azure file systems?

  • The Google File System (GFS). (correct)

What is the average file size optimized for in the Google File System (GFS)?

  • Gigabytes. (correct)

In Google File System (GFS), what is the size of each chunk into which files are divided?

  • 64 MB. (correct)

What happens to updates in metadata within the Google File System (GFS) architecture?

  • They are stored in a log at the master storage. (correct)

What is the role of chunk-servers in the Google File System (GFS)?

  • To store the actual file data in chunks. (correct)

What happens if a chunk-server in Google File System (GFS) fails to send a heartbeat signal to the master server?

  • The master server marks the chunk-server as potentially unavailable. (correct)

Which of the following describes the 'consistent' state in the Google File System (GFS) consistency model?

  • All redundant chunks contain the same data after a write. (correct)

In Google File System (GFS), what is the role of the primary replica during a write operation?

  • To determine the order of write operations and ensure consistency across all secondary replicas. (correct)

What happens in Google File System (GFS) if a chunk misses a serial number during write operations?

  • The chunk is marked as an orphan, and the master completes the missing data in the next heartbeat. (correct)

How does distributed computing make a network of computers appear to end-users?

  • As a single, powerful computer. (correct)

Which of the following is a key advantage of distributed computing?

  • Improved scalability. (correct)

What characterizes the client-server architecture in distributed computing?

  • Servers provide services and manage their own databases. (correct)

What is a key limitation of client-server architecture in distributed computing?

  • Server bottlenecks under heavy request loads. (correct)

In a three-tier architecture, what is the primary responsibility of the database tier?

  • Retrieving and ensuring the consistency of data. (correct)

What characterizes N-tier architecture in distributed systems?

  • Client-server systems communicating to solve a problem. (correct)

What is a defining characteristic of peer-to-peer architecture?

  • Each node has equal responsibilities. (correct)

What is an example of content sharing that commonly uses peer-to-peer architecture?

  • File streaming services. (correct)

How does parallel computing differ in memory access compared to typical distributed computing?

  • Parallel computing provides shared memory access for all processors. (correct)

Which type of computing emphasizes performance and coordination across multiple networks?

  • Grid computing. (correct)

What is a core difference in coupling between grid computing and other distributed systems?

  • Grids are loosely coupled externally while tightly coupled internally. (correct)

What is the defining characteristic of cloud computing?

  • Delivering hosted services over the internet. (correct)

Which programming paradigm inspired the creation of MapReduce?

  • Lisp. (correct)

Which services have adopted MapReduce as a key technology?

  • Hadoop, Mongo, AWS, and Azure. (correct)

In the MapReduce programming model, what is the role of the 'reduce' function?

  • To merge intermediate values associated with the same key. (correct)

If using MapReduce to count words in a set of distributed documents, what is the responsibility of the map function?

  • Produce word-occurrence pairs to intermediate storage. (correct)

How does the MapReduce library initially divide the input files when starting the processing on a cluster of machines?

  • Into segments of 64 megabytes each. (correct)

In MapReduce, what step follows after a worker is assigned a map task and reads the corresponding input split?

  • The worker passes each key/value pair to the user-defined Map function. (correct)

What action must a MapReduce worker complete when it has read all intermediate data?

  • Sort the data by the intermediate keys. (correct)

How does MapReduce handle a worker failure during a computation task?

  • The master redistributes the failed worker's task to another worker. (correct)

Why is a combiner function used in MapReduce?

  • To reduce the amount of intermediate data by processing similar keys at map workers. (correct)

Which default partitioning function is used in MapReduce?

  • HASH(key) mod R. (correct)

What happens if the master task dies in MapReduce?

  • A new master is started from the last checkpointed state. (correct)

Flashcards

Data Model Types

A database system can be divided according to data model into three types: Structured, Semi-structured, and Unstructured.

Structured Data

Data with an identifiable structure, presented in rows and columns, and organized so the definition, format, and meaning are explicitly understood.

SQL

A query model for structured data using SQL, which is efficient at handling complex joins.

Structured Data Tools

Technologies such as MySQL, ORACLE, SQL Server, and PostgreSQL.

Semi-structured Data

Data that does not reside in a relational database but has organizational properties, using tags or markers to enforce hierarchies of records and fields within the data.

Semi-structured Data Formats

Specific data formats such as XML and JSON.

Semi-Structured Data Tools

Technologies such as MongoDB, Cassandra, BigTable, and HBase.

Unstructured Data

Data that does not conform to any data model and has no easily identifiable structure.

Unstructured Data Tools

Hadoop File System (HFS), Google File System (GFS), GridFS.

Distributed File System (DFS)

A file system distributed across multiple file servers or locations, allowing access to files from any network or computer as if they were local.

DFS Transparency

Users are unaware of the physical location of files.

DFS Scalability

Ability to grow without limit.

DFS Reliability

No fear of losing data; backups are easy.

DFS Availability

Data remains available even with hardware failure.

DFS Accessibility

Data is relocated to serve users at their locations.

DFS Integrity

File integrity is maintained, even with concurrent mutations.

DFS Remote Access

Users can access files remotely.

DFS Performance

Improved access time and network efficiency.

RAID DFS

A distributed file system across multiple storage devices in the same computer, e.g. RAID.

Network File System DFS

A file system allowing users to access multiple remote systems via a network interface, e.g. NFS, Netware, SMB.

Cloud File System DFS

Distributed file systems across multiple nodes, e.g. Google FS, Hadoop FS, Grid FS.

Google File System (GFS)

A distributed file system created by Google that has successfully met its storage needs.

GFS: Commodity Hardware

The system is built of many inexpensive commodity machines that often fail.

GFS: Large Files

The system is optimized for large files, multi-gigabyte on average.

GFS Chunk

Files are divided into 64 MB chunks, distributed across multiple chunk-servers; this is also known as sharding.

GFS Metadata

Metadata (files, chunk-servers, and chunks) is stored in the master server.

GFS Heartbeat

Chunk-servers must send a heartbeat signal to the master to confirm liveness and to update chunk information.

GFS Consistent Writes

Write attempts must be designed to keep data consistent and defined.

GFS Write Mechanism

Writes are performed with data consistency in mind; each write is uniquely serially identified to prevent data from being orphaned.

GFS Orphans

Orphaned data occurs if writes arrive out of order; the master server maintains the correct replication level.

Distributed Computing

A method of making multiple computers work together to solve a common problem, appearing as a single computer that provides large-scale resources.

Distributed Computing: Scalability

Add more nodes to increase computing capability.

Distributed Computing: Availability

The system continues operating even with hardware failure.

Distributed Computing: Consistency

Data is shared or duplicated across nodes and remains consistent throughout the system.

Client-Server Architecture

Consists of multiple clients and servers. Can cause communication bottlenecks when several machines make requests simultaneously.

Three-Tier Architecture

Consists of a client tier, an application tier, and a database tier.

N-Tier Architecture

N-tier models include several different client-server systems communicating with each other to solve the same problem.

Peer-to-Peer Architecture

Assigns equal responsibilities to all networked computers, with no separation between client and server.

Parallel Computing

A tightly coupled form of distributed computing.

Grid Computing

Emphasizes performance and coordination between several networks.

Cloud Computing

A general term for anything that involves delivering hosted services over the internet: infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS).

Study Notes

Lecture 1: Fundamentals of Big-Data Analytics

  • This lecture covers big data concepts.
  • It covers the data model, distributed file system, and distributed computing.

Data Models

  • Database systems are divided into structured, semi-structured, and unstructured types.

Structured Data

  • Structured data has an identifiable structure conforming to a specific data model.
  • Presented in rows and columns, is well-organized for definition, format, and meaning.
  • Data resides in fixed fields within files or records.
  • Elements can be efficiently analyzed and processed.
  • RDBMS (relational database management systems) are an example.
  • Query Model uses Structured Query Language (SQL) and is efficient with complex joins.
  • Data Analysis and Visualization is easy using any programming language, but limited to available relations.
  • Technologies used include MySQL, ORACLE, SQL Server, PostgreSQL, MySQL Cluster, and Oracle Clusterware.

Semi-Structured Data

  • Does not conform to a relational database, but contains elements of structure, though less rigid.
  • Contains metadata and tags for grouping and describing storage.
  • Organized hierarchically.
  • Automating, managing, and accessing using programs is difficult.
  • XML, JSON, emails, zipped/web files, and binary executables all are forms of semi-structured data.
  • The Data Model type is semi-relational with a variable schema, using specific data formats like XML and JSON.
  • Query Model supports specific search mechanisms.
  • Data Analysis and Visualization requires intermediate data processing, but allows for more analysis options.
  • It uses distributed file systems that employ data sharding and replication for scaling data and ensuring high availability.
  • Technologies and tools include MongoDB, Cassandra, BigTable, and HBase.

Unstructured Data

  • Unstructured data does not conform to any model and lacks easily identifiable structure or organization.
  • This data type is incompatible with traditional database structures, having no predefined rules or formats.
  • Videos, reports, surveys, Word documents, images, and memos all fall under this type.
  • Its Data Model is non-relational, lacking any fixed schema; examples include Word documents, video, and images.
  • Query Model is limited, such as only full-text search for documents.
  • Data Analysis and Visualization also needs advanced processing, like natural language processing or image and video processing.
  • Hadoop File System (HFS), Google File System (GFS), and GridFS are technologies and tools for scaling data and ensuring high availability.

Distributed File System (DFS)

  • DFS spans multiple file servers or locations.
  • Allows programs to access files as if they were local, from any network or computer.

DFS Benefits:

  • Transparency: Users do not need to know the physical location of the files
  • Scalability: Ability to grow without limit
  • Reliability: No fear of losing data, backup is not a concern
  • Availability: Data that remains accessible even with partial hardware failure
  • Accessibility: Data can be relocated to users at their locations
  • Integrity: Files integrity is maintained, even with mutations.
  • Remote Access: Allows users to access files from remote locations.
  • Performance: Improved access time and network efficiency.

DFS Types include:

  • RAID: Distributed file system across multiple storage devices within the same computer.
  • Network File System: Enables users to access remote file systems through a network interface, e.g. NFS, Netware, SMB.
  • Cloud File System: Distributed file systems across multiple nodes, e.g. Google FS, Hadoop FS, Grid FS.

Google File System (GFS)

  • A research paper, published by Google in 2003, describes its distributed file system.
  • GFS met Google's storage needs, providing the infrastructure for generating and processing the large datasets required by its research and development efforts.
  • The largest cluster provided hundreds of terabytes of storage across thousands of disks on over a thousand machines accessed by hundreds of clients.
  • GFS has inspired other cloud file systems, including Hadoop, Grid, Amazon, and Azure file systems.

GFS Assumptions:

  • Is built of inexpensive commodity hardware prone to failure.
  • Optimized for large, multi-gigabyte files on average.
  • Uses large streaming reads more often than small random reads.
  • Performs writes as continuous appends rather than small updates at random locations.
  • Supports concurrent read/write access while keeping data integrity.
  • Values sustained bandwidth over low latency.

GFS Architecture:

  • Files are divided into 64 MB chunks.
  • Chunks are distributed along multiple chunk-servers through sharding (chunk=shard).
  • Replication is used according to a replication factor (RF), giving more redundancy and durability.
  • The master server stores metadata.
  • Clients get file chunks from the master server, then work directly with chunk-servers.
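The chunk layout can be illustrated with a small sketch (the helper names, file name, and server IDs are hypothetical; the real GFS client library differs): a byte offset maps to a chunk index, and the master's metadata maps each (file, chunk index) pair to its replica locations.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as in GFS

def chunk_index(offset: int) -> int:
    """Map a byte offset within a file to its chunk index."""
    return offset // CHUNK_SIZE

# Hypothetical master metadata: (filename, chunk index) -> replica locations.
# With replication factor RF = 3, each chunk lives on three chunk-servers.
metadata = {
    ("logs/web.log", 0): ["cs-1", "cs-4", "cs-7"],
    ("logs/web.log", 1): ["cs-2", "cs-5", "cs-8"],
}

def locate(filename: str, offset: int) -> list:
    """What the master returns: the chunk-servers holding the chunk at `offset`."""
    return metadata[(filename, chunk_index(offset))]

print(chunk_index(70 * 1024 * 1024))              # offset 70 MB falls in chunk 1
print(locate("logs/web.log", 70 * 1024 * 1024))   # its three replica locations
```

After this lookup the client talks to the chunk-servers directly, keeping the master off the data path.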

GFS: Master Server

  • Stores metadata in memory for quick lookup and update.
  • Logs all metadata updates at master storage.
  • Regularly stores metadata snapshots using B-Tree indices for faster reload.
  • Uses the logs to complete the metadata after the last snapshot is reloaded, in case of disaster.
  • Shadow masters are used for disaster recovery, and old logs and snapshots can be stored in other locations.

GFS: Chunk-Servers

  • Replicas are present for every chunk.
  • When a client requests a file at a certain offset, the master replies with the set of redundant chunk locations and specifies a primary chunk.
  • The nearest chunk-server is chosen for fast access.
  • Chunk-servers must follow a mechanism that keeps data consistent and defined, even under concurrency.
  • Liveness is ensured and chunk information is updated through heartbeat signals sent to the master.

GFS: Consistency Model

  • Writes should keep data consistent and defined.
  • Consistent: Redundant chunks must have the same data after writes.
  • Defined: Chunks must contain the last written data; after concurrent updates of multiple chunks, all redundant chunks should be consistent.

GFS: Write Mechanism

  • The primary replica assigns a serial number to each write attempt.
  • All secondary replicas perform the write attempts in the same serial order to preserve consistency.
  • If a chunk misses a serial number, it becomes an orphan and is completed in the next heartbeat to maintain the desired replication level.
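The serial-numbering idea can be sketched as follows (a simplified illustration with hypothetical class names, not GFS's actual protocol): the primary stamps each write with a serial number, and every replica applies writes strictly in serial order, so all replicas converge to the same state even when writes arrive out of order.

```python
import itertools

class Primary:
    """Assigns a serial number to each write attempt."""
    def __init__(self):
        self._serial = itertools.count()
    def stamp(self, data: bytes):
        return next(self._serial), data

class Replica:
    """Applies writes strictly in serial order, buffering out-of-order arrivals."""
    def __init__(self):
        self.log = []        # applied writes, in order
        self._pending = {}   # serial -> data that arrived early
        self._next = 0       # next serial expected
    def receive(self, serial: int, data: bytes):
        self._pending[serial] = data
        while self._next in self._pending:   # apply any contiguous run
            self.log.append(self._pending.pop(self._next))
            self._next += 1

primary = Primary()
w0, w1, w2 = (primary.stamp(d) for d in (b"a", b"b", b"c"))
r = Replica()
for write in (w2, w0, w1):   # writes may arrive out of order...
    r.receive(*write)
print(r.log)                 # ...but are applied in serial order: [b'a', b'b', b'c']
```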

Distributed Computing

  • Involves multiple computers working together to solve a common problem, so that the network appears as a single powerful computer.
  • It can encrypt large data volumes or render high-quality 3D video animation.
  • Distributed systems, programming, and algorithms relate to it.

Advantages of Distributed Computing:

  • Scalability: Add nodes to increase computing capability
  • Availability: Continued operation even if some nodes fail
  • Consistency: Data shared across nodes stays consistent
  • Transparency: Logical separation between the user and the physical devices
  • Efficiency: Faster performance with optimal resource use

Distributed Computing Architectures:

  • Client-Server Architecture
  • Three-Tier Architecture
  • N-Tier architecture
  • Peer-to-Peer Architecture

Client-Server Architecture

  • It causes communication bottlenecks when several machines make requests simultaneously.
  • Consists of a set of clients, with limited capabilities, and servers, which perform specific services and manage databases.

Three-Tier Architecture

  • Workload distribution is generally better than in client-server architecture; however, each computation task is still performed at a single server.
  • Three tiers are client, application which contains logic, and database which is in charge of data retrieval and consistency.

N-Tier Architecture

  • Modern systems use an n-tier architecture, integrating enterprise applications.
  • Client-server systems communicate with each other in order to solve the same problem.

Peer-to-Peer Architecture

  • There is no separation between client and server, and any computer can perform all responsibilities.
  • Peer-to-peer distributed systems give equal responsibilities to all networked computers.
  • Its use has increased for content sharing, file streaming, and blockchain networks.

Distributed Computing Types include:

  • Parallel Computing
  • Grid Computing
  • Cloud Computing

Parallel Computing

  • A tightly coupled form of distributed computing.
  • Processors use shared memory to exchange information.
  • In typical distributed computing, by contrast, each node has private memory and nodes exchange information via message passing.

Grid Computing

  • A highly scaled form of distributed computing that emphasizes performance and coordination between several networks.
  • Acts like a tightly coupled computing system internally.
  • Is more loosely coupled externally with each grid network performing individual functions.

Cloud Computing

  • Involves delivering hosted services over the internet.
  • Divided into infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS).

Distributed Computing Technologies:

  • Map Reduce
  • Spark

Map Reduce

  • Inspired by the map and reduce primitives present in Lisp.
  • Introduced by Google in 2004.
  • Is a programming model used for processing and generating large data sets through distributed file system infrastructure.
  • Hundreds of MapReduce jobs run daily on Google's clusters.
  • Hadoop, Mongo, AWS, and Azure use MR.

Map Reduce, Idea

  • Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs.
  • A reduce function merges all intermediate values associated with the same intermediate key.
  • Example: counting each word across a set of distributed documents.
  • The map produces word-occurrence pairs to intermediate storage; the reduce calculates each word's total count.
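The word-count idea above can be sketched in a few lines of Python (a single-process simulation of the model, not an actual distributed run):

```python
from collections import defaultdict

def map_fn(doc_name, text):
    """Map: emit a <word, 1> pair for every word occurrence."""
    for word in text.split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: merge all intermediate values for the same key."""
    return word, sum(counts)

documents = {"d1": "big data big ideas", "d2": "big data"}

# Shuffle/group intermediate pairs by key, then reduce each group.
groups = defaultdict(list)
for name, text in documents.items():
    for word, one in map_fn(name, text):
        groups[word].append(one)

result = dict(reduce_fn(w, c) for w, c in groups.items())
print(result)  # {'big': 3, 'data': 2, 'ideas': 1}
```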

Map Reduce Processing

  • The MapReduce library in the user program splits the input files into M pieces of 64 megabytes (MB) per piece.
  • One copy of the program serves as the master; the rest are workers.
  • The master assigns the M map tasks and R reduce tasks to idle workers on a cluster of machines.
  • A worker assigned a map task parses key/value pairs from its input split.
  • The user-defined Map function generates intermediate key/value pairs, buffered in memory.
  • These buffered pairs are written to local disk, partitioned into R regions.
  • Their locations are passed back to the master, who forwards them to the reduce workers.
  • Reduce workers read the buffered data using remote procedure calls, then sort it by intermediate keys for grouping.
  • A reduce worker iterates over the sorted data, applying the Reduce function to each intermediate key and its values and appending the result to its output file.
  • Once all tasks complete, the master wakes up the user program.
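The processing steps above can be condensed into a single-process sketch (illustrative only: Python's built-in `hash` stands in for the partitioning hash, the input strings stand in for 64 MB splits, and the real library distributes these steps across machines):

```python
from collections import defaultdict

R = 2  # number of reduce tasks / intermediate regions

def run_mapreduce(splits, map_fn, reduce_fn):
    # Map phase: each of the M splits is processed independently and its
    # intermediate pairs are partitioned into R regions by hashing the key.
    regions = [defaultdict(list) for _ in range(R)]
    for split in splits:
        for key, value in map_fn(split):
            regions[hash(key) % R][key].append(value)
    # Reduce phase: each reduce worker sorts its region by intermediate key
    # and applies the Reduce function to each key's grouped values.
    output = {}
    for region in regions:
        for key in sorted(region):
            output[key] = reduce_fn(key, region[key])
    return output

splits = ["big data big", "ideas big", "data"]   # M = 3 stand-in splits
word_map = lambda text: ((w, 1) for w in text.split())
word_reduce = lambda key, values: sum(values)
print(run_mapreduce(splits, word_map, word_reduce))  # big=3, data=2, ideas=1
```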

Map Reduce Techniques include:

  • Partitioning Function.
  • Fault Tolerance.
  • Combiner Function.

Map Reduce: Partitioning Function

  • The default function uses hashing, HASH(key) mod R, which usually results in balanced partitions.
  • Custom partitioning is sometimes needed; for example, URL keys can use HASH(hostname(key)) mod R so that pages from the same host land in the same partition.
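Both partitioning schemes can be sketched directly; here Python's built-in `hash` and `urllib.parse.urlparse` stand in for HASH and hostname():

```python
from urllib.parse import urlparse

R = 4  # number of reduce tasks / partitions

def default_partition(key: str) -> int:
    """Default MapReduce partitioning: HASH(key) mod R."""
    return hash(key) % R

def url_partition(key: str) -> int:
    """Custom partitioning: HASH(hostname(key)) mod R, so every page
    from the same host lands in the same partition."""
    return hash(urlparse(key).hostname) % R

a = url_partition("http://example.com/page1")
b = url_partition("http://example.com/page2")
print(a == b)  # True: same host, same partition
```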

Map Reduce: Fault Tolerance

  • If no response is received from a worker after a certain time, the master marks it as failed and resets its associated map or reduce tasks.
  • The master writes periodic checkpoints; if the master task dies, a new copy takes over from the last checkpoint.
  • Alternatively, the client restarts the MapReduce job if the master fails.

Map Reduce: Combiner Function

  • An optional combiner function is used to save storage and time.
  • In the word-counting example, map workers would otherwise emit huge numbers of <k, 1> pairs to be processed by reduce workers.
  • A combiner instead emits <k, x>, replacing x records with one.
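The saving can be illustrated concretely (a minimal sketch of the combiner's effect on the word-count example; the real combiner runs inside each map worker before the shuffle):

```python
from collections import Counter

words = "big data big big data analytics".split()

# Without a combiner: one <word, 1> record per occurrence leaves the map worker.
without = [(w, 1) for w in words]

# With a combiner: occurrences of the same key are pre-summed locally, so a
# single <word, x> record per distinct word leaves the map worker instead.
with_combiner = list(Counter(words).items())

print(len(without), len(with_combiner))  # 6 records shrink to 3
```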
