Questions and Answers
Which of the following roles is primarily focused on building and maintaining the data infrastructure?
- Machine Learning Practitioner
- Data Engineer (correct)
- Data Scientist
- Data Analyst
A large greenhouse tomato grower wants to optimize their harvesting schedule based on predicted yields and market prices. Which category of analytics would be most suited to address this?
- Diagnostic
- Prescriptive (correct)
- Descriptive
- Predictive
A retail company is analyzing why many customers are leaving their service. Examining historical sales data, they identified a correlation between a recent change in the loyalty program and customer churn. Which type of analytics is exemplified?
- Prescriptive
- Descriptive
- Diagnostic (correct)
- Predictive
A hospital is using machine learning to predict which patients are most at risk of readmission within 30 days of discharge. Which type of analytics does this represent?
A company is implementing a data strategy that involves moving from on-premises servers to cloud-based infrastructure. Which of the following aspects of modern data strategies does this align with?
An organization has a large amount of customer data stored in various databases and wants to create a single, accessible repository for analysis. Which component of modern data strategies would be most applicable?
A company wants to improve customer service. They decide to implement a system that uses AI to analyze real-time customer feedback and provide suggestions to service representatives. This change falls under which tenet of modern data strategies?
In a data pipeline, what is the primary goal of the 'Ingestion' layer?
In the context of data engineering, which of the following describes data wrangling?
Which of the following best describes the importance of iterating the data pipeline?
Which question is a data engineer most likely to ask when designing a data pipeline?
Which of the following questions is a data scientist most likely to consider when working with a data pipeline?
In designing a data pipeline, which of the following is the MOST important initial step?
When designing a data pipeline from the start, what describes starting with the business problem to be solved and working backwards to the data?
What does the data pipeline include?
Which of the following scenarios exemplifies the use of data science in everyday life?
What are the considerations in deciding what type of analysis to use?
A company is deciding whether to invest in a more accurate prediction model. What factor should they consider?
In the context of the Five V's of data, what does 'Veracity' refer to?
A company wants to improve decision-making and needs to determine the relevance of its available data. Which of the five V's of data should they focus on?
A company is struggling to analyze customer data due to inconsistencies in formatting and data types across different sources. Which characteristic of the five V's of data presents the greatest challenge?
A data pipeline processes clickstream data from a high-traffic website. Which of the following considerations related to data characteristics is MOST critical for designing the pipeline's ingestion layer?
Why could there be a choice between streaming ingestion and batch ingestion?
Why could there be a choice between long-term reporting access and short-term, very fast access?
A financial company needs to analyze transaction data and generate alerts for potential fraud in real-time. Which type of processing is most suitable?
What is the best description of unstructured data?
Which of the following is an example of unstructured data?
A company analyzes comments from an online chat application. How is this data stored?
What is the best description of time-series data?
A weather monitoring system collects temperature, humidity, and wind speed measurements from various sensors every minute. Which type of data source does this represent?
A data team discovers inconsistencies and outdated information in a customer database obtained from a third-party vendor. Which data quality issue does this primarily relate to?
What happens if value rests on veracity and the veracity is not strong?
Which of the following is a method of evaluating the veracity of data?
A data analyst identifies a pattern of missing values in a specific field of a customer dataset. According to the best practices for cleaning data, what should be their NEXT step?
What describes immutable data?
A company is deciding to analyze only aggregated data to save space. What is the BEST reason to avoid this practice?
A company needs to store user activity data for compliance purposes and ensure that any changes to the data are fully traceable. Which approach is MOST suitable?
What is a good way to maintain data integrity and consistency?
Flashcards
Data-Driven Decisions
Using data science to make informed decisions in an organization.
Data Analytics
Systematic analysis of large datasets to find patterns and produce actionable insights.
AI/ML
Mathematical models that make predictions from data at a scale impossible for humans.
Descriptive Analytics
Describes past performance, history, and trends; answers "What happened?"
Diagnostic Analytics
Describes the reasons for trends; answers "Why did it happen?"
Predictive Analytics
Forecasts future trends; answers "What will happen?"
Prescriptive Analytics
Recommends a decision or course of action; answers "How can we make it happen?"
Data Pipeline
The infrastructure for data-driven decisions: ingesting, storing, processing, and analyzing/visualizing data.
Data Wrangling
Discovering, cleaning, normalizing, transforming, and enriching data.
Data Engineering
Building and maintaining the data infrastructure that the pipeline runs on.
Modern Data Strategies
A three-pronged approach to becoming data driven: modernize, unify, and innovate.
Innovate
Apply AI and ML to find new insights in data.
Modernize
Move to cloud-based and purpose-built services to reduce effort.
Unify
Create a single source of truth for data across the organization.
Volume
How big the dataset is and how much new data is generated.
Velocity
How frequently new data is generated and ingested.
Variety
The types of data and the sources the data comes from.
Veracity
How accurate, precise, and trusted the data is.
Value
The insights that can be pulled from the data.
Unstructured Data
Files with no defined structure; flexible, and the majority of available data.
Semi-structured data
Data organized as elements with self-describing attributes; easy to use.
Structured Data
Rows and columns with a well-defined schema; easy to use.
Public Datasets
Data aggregated about a topic, such as census, health, and population data.
On-Premises Databases
Application data owned and managed by the organization.
Events, IoT Devices, Sensors
Data generated continually by events, with a time-based component.
Veracity and Value
Value rests on veracity; insights are only as good as the accuracy and trustworthiness of the data behind them.
Study Notes
- The lecture covers IT Service Management, specifically focusing on Data Engineering concepts.
- A new lab link, AWS Academy Data Engineering [105501], is available in the Blackboard Information section.
- Job roles related to data engineering include data engineer, data analyst, data scientist, extract, transform, and load (ETL) developer, and machine learning (ML) practitioner.
Data-Driven Decisions
- Throughout 2026, organizations are expected to increase investments in data and analytics services by 45% to become more data driven and digital.
- Data-driven decisions are relevant in various contexts.
- Examples include deciding on a restaurant, finding a used bicycle, playing fantasy football, searching for a job, or pricing a home.
- Organizations make data-driven decisions for flagging fraudulent transactions, optimizing webpage design, predicting patient relapse, identifying security issues, and determining the optimum time for harvesting crops.
- Data-driven decisions are useful for large greenhouse tomato growers to optimize operations.
Data Analytics vs AI/ML
- Data analytics systematically analyzes large datasets (big data) to find patterns and trends and produce actionable insights; it uses programming logic to answer questions from structured data with a limited range of variables.
- AI/ML uses mathematical models that learn from examples in large amounts of data to make predictions at a scale difficult for humans; it is suitable for unstructured data with complex variables.
Business example: Customer relationship management
- Data analytics analyzes total revenue per customer to segment customers into categories, enabling higher customer service for high-spending customers.
- The AI/ML approach analyzes customer churn and its causes, helping businesses improve customer retention strategies; a sketch of the data-analytics segmentation side follows below.
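As a rough illustration of the data-analytics side of this example, the sketch below segments customers by total revenue using fixed thresholds. The DataFrame, column names, and threshold values are all hypothetical, chosen only to show the idea.

```python
import pandas as pd

# Hypothetical transaction data; column names are assumptions for illustration.
transactions = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2", "C3", "C3", "C3"],
    "amount": [120.0, 80.0, 35.0, 300.0, 250.0, 410.0],
})

# Data-analytics approach: aggregate total revenue per customer,
# then segment with simple programming logic (fixed thresholds).
revenue = transactions.groupby("customer_id")["amount"].sum()
segments = pd.cut(
    revenue,
    bins=[0, 100, 500, float("inf")],
    labels=["low", "medium", "high"],
)
print(segments)
```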
Categories of Analytics
- Descriptive analytics describes past performance, history, and trends and answers the question "What happened?"
- Diagnostic analytics describes the reasons for trends and answers the question "Why did it happen?"
- Predictive analytics forecasts future trends and answers the question "What will happen?"
- Prescriptive analytics recommends a decision or course of action and answers the question "How can we make it happen?"
- Insights become more valuable but also more difficult to derive as you move from descriptive to prescriptive analytics.
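A minimal sketch of how two of these categories might look in code, using made-up monthly sales figures: descriptive analytics summarizes what happened, and predictive analytics fits a simple linear trend to forecast the next month. Diagnostic and prescriptive analytics are only noted in comments, since they would need explanatory variables and an optimization step, respectively.

```python
import numpy as np

# Twelve months of hypothetical sales figures.
monthly_sales = np.array([210, 225, 240, 238, 255, 270, 268, 290, 305, 300, 320, 335])

# Descriptive analytics -- "What happened?": summarize past performance.
print("total:", monthly_sales.sum(), "mean:", monthly_sales.mean())

# Predictive analytics -- "What will happen?": fit a simple linear trend
# and forecast the next month. (Diagnostic analytics would look for causes,
# and prescriptive analytics would recommend an action based on the forecast.)
months = np.arange(len(monthly_sales))
slope, intercept = np.polyfit(months, monthly_sales, deg=1)
forecast = slope * len(monthly_sales) + intercept
print("next month forecast:", round(forecast, 1))
```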
Key points about decisions
- More data combined with fewer barriers enables more data-driven decisions.
- Data science is apparent when shopping for shoes online, and the advertisements become shoe-related.
- Data science is relevant on a streaming platform where there are recommendations for movies or music.
- Data science is also relevant when ordering pizza online and getting continuous updates about the preparation and delivery.
- Data science appears with credit card fraud detection and navigation apps that help identify traffic jams.
- More data does not automatically result in more value.
- More data means higher costs, more unstructured data, greater security risks, and slower query processing.
- Data becomes less valuable for decision-making over time.
- The most valuable data is preventive/predictive used in near real time.
- The least valuable data is historical used within days to months of the initial data capture.
Trade-offs of data-driven Decisions
- The tradeoffs of data-driven decisions include cost (how much to invest), speed (how quickly an answer is needed), and accuracy (how accurate the prediction needs to be).
Key Takeaways: Data-driven decisions
- Data-driven organizations use data science to make informed decisions.
- Data analytics uses programming logic and tools for predictions from structured data.
- AI/ML uses examples from data for unstructured data predictions.
- Increased data and technology advancements are expanding opportunities for data-driven decisions.
Key Takeaways: Modern data strategies
- Organizations that want to become data driven should modernize, unify, and innovate with their data infrastructures.
- A modern data strategy involves a three-pronged approach: modernize, unify, innovate.
- Modernizing involves moving to cloud-based services and purpose-built services to reduce effort.
- Unifying involves creating a single source of truth for data across the organization.
- Innovating involves applying AI and ML to find new data insights.
- Being a data-driven organization means culturally treating data as a strategic asset.
Data Pipeline for Data-Driven Decisions
- The data pipeline involves collecting data, processing it, and building something useful with it.
- It is important to work backward to design infrastructure.
- One must determine what decision to make first, and then identify what data is needed to support the decision and weigh the trade-offs of cost, speed, and accuracy.
- The layers of a pipeline include the data sources, ingestion, storage, processing, analysis/visualization, and predictions and decisions.
- These actions include data wrangling (discovering, cleaning, normalizing, and enriching) and transformation.
- Iterative processing is used to evaluate and improve the data pipeline.
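The sketch below walks through those layers as plain Python functions operating on an in-memory list. The function names, record fields, and the list standing in for "storage" are illustrative assumptions, not any particular service or framework.

```python
# A minimal sketch of the pipeline layers as plain functions.

def ingest(source_records):
    """Ingestion layer: pull raw records from a data source."""
    return list(source_records)

def store(records, storage):
    """Storage layer: persist raw records (here, an in-memory list)."""
    storage.extend(records)
    return storage

def process(storage):
    """Processing layer: wrangle the data -- clean, normalize, enrich."""
    return [
        {**r, "amount": float(r["amount"])}
        for r in storage
        if r.get("amount") is not None  # drop incomplete records
    ]

def analyze(clean_records):
    """Analysis/visualization layer: produce an insight for a decision."""
    return sum(r["amount"] for r in clean_records)

raw = [{"order": 1, "amount": "19.99"}, {"order": 2, "amount": None}]
total = analyze(process(store(ingest(raw), storage=[])))
print("total revenue:", total)
```

Working backward, the decision (here, "what is total revenue?") determines what the analysis layer must produce, which in turn determines what the earlier layers must ingest, store, and clean.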
Key points about Data Pipeline
- A data pipeline provides the infrastructure for data-driven decision-making.
- Pipeline design should begin with the business problem.
- A pipeline includes ingesting, storing, processing, and analyzing and visualizing data.
- Data wrangling includes discovery, cleaning, normalization, transformation, and augmentation.
- Data is processed iteratively to refine results.
Data Engineer Role
- Both data scientists and data engineers work with the data pipeline.
- Tasks are sometimes interchangeable between data engineers and data scientists.
- Data engineering focuses on the infrastructure, while data science emphasizes data utilization.
- It is crucial to ask questions about the desired outcomes and the data when building a pipeline, so that the most effective pipeline gets built.
Common data pipeline questions
- Some include whether the organization owns data that addresses the need and whether it needs to combine data from multiple sources.
- Others include the type and quality of the data, the source of truth, and the security requirements for the data.
- Others ask what mechanisms should be implemented for transferring data, how much data exists, how frequently it is updated, and how important the requested data speed is.
- Others ask what the data can teach, how to assess results, what type of visualization is required, and what formats and tools analysts are familiar with.
- Others ask whether a big data framework is needed, whether AI/ML models fit the problem, and what the easiest way to implement them is.
Module objectives
- This module prepares one to list the five Vs of data.
- Describe the impact of volume and velocity on the data pipeline.
- Compare and contrast structured, semistructured, and unstructured data types.
- Identify data sources that feed a data pipeline, and the questions to ask when assessing data veracity.
- Suggest methods to improve the veracity of data in the pipeline.
Five V's of Data
- Volume, velocity, variety, veracity, and value are the five V's of data.
- Volume refers to how big the dataset is and how much new data is generated.
- Velocity refers to the frequency of new data generation and ingestion.
- Variety is about the types of data and the sources where the data comes from.
- Veracity relates to how accurate, precise, and trusted the data is. Value relates to the insights that are pulled from the data.
- Confirm that the data meets the need and evaluate the feasibility of acquiring it.
- Match the pipeline design to the data, balancing throughput and cost.
- Let users focus on the business, catalog data and metadata, and implement governance.
Volume and Velocity
Data and Pipeline design and configuration
- Volume and velocity impact all layers of the pipeline, and each layer must be evaluated against its own requirements.
- It is also important to consider the amount and pace of the data when designing the pipeline.
- Balance costs for throughput and storage against the required time to answer and the accuracy of the answer.
Ingestion consideration
- Suit ingestion method to the amount of data to be ingested and the frequency with which new data must be ingested and processed.
- Two examples of handling data:
- Streaming ingestion deals with continuous but small bits of data and requires that analysis be performed immediately; an example is clickstream data from a retail website.
- Batch ingestion involves the same sales transaction data but from all retailers worldwide; it is sent to a central location periodically and analyzed or processed at intervals (overnight in this example). A sketch contrasting the two approaches follows.
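The sketch below contrasts the two styles with hypothetical handler functions standing in for real ingestion services: streaming processes each small event as it arrives, while batch accumulates records and loads them at an interval.

```python
import time
from datetime import datetime, timezone

def handle_click(event):
    """Streaming ingestion: each small event is processed as it arrives."""
    print("ingested at", datetime.now(timezone.utc).isoformat(), event)

def nightly_batch_load(events):
    """Batch ingestion: accumulated records are loaded at an interval."""
    print(f"loading {len(events)} records in one batch")

# Streaming: clickstream events trickle in continuously.
for click in [{"page": "/cart"}, {"page": "/checkout"}]:
    handle_click(click)
    time.sleep(0.1)  # stand-in for events arriving over time

# Batch: worldwide sales transactions collected during the day,
# sent to a central location and processed overnight.
days_transactions = [{"sale_id": i, "amount": 10.0 * i} for i in range(1000)]
nightly_batch_load(days_transactions)
```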
Storage Consideration
- Which storage types can scale to the volume of data to be ingested and make it accessible to processing and analysis as quickly as required?
- Two examples of this:
- Long term: five years of data stored for reporting access.
- Short term: incoming ecommerce sales data used for the current session, requiring very fast access.
Processing considerations
- How much data must be processed in a single iteration?
- Big data processing: Performing analytics on all US-based transactions from the last week.
- Streaming analytics: Real-time alerts produced on log data.
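The streaming-analytics case might look something like the following sketch, which raises an alert when the error rate in a sliding window of log events crosses a threshold. The window size and threshold are arbitrary choices for illustration.

```python
from collections import deque

WINDOW = 20          # most recent events to consider
THRESHOLD = 0.25     # alert if more than 25% of the window are errors

recent = deque(maxlen=WINDOW)

def on_log_event(level):
    """Process one log event as it streams in and alert on high error rates."""
    recent.append(level == "ERROR")
    if len(recent) == WINDOW and sum(recent) / WINDOW > THRESHOLD:
        print("ALERT: error rate exceeded", THRESHOLD)

# Simulated stream of log levels arriving one at a time.
for level in ["INFO"] * 14 + ["ERROR"] * 6:
    on_log_event(level)
```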
Analysis/Visualization
- Analysis and visualization considerations include how much data must be used in a visualization, at what level of detail, and how quickly consumers need to see and act on the data.
Examples for Analysis/Visualization
- A year's worth of sales data is visualized and the users can drill down by the region and salesperson.
- The real-time error rates of sensors in a plant are visualized.
- Volume is the amount of data to process.
- Velocity is about how quickly the data moves through the pipeline.
- Evaluate the volume and velocity for each layer, and balance the cost and throughput requirements.
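A small sketch of the drill-down idea, assuming a hypothetical pandas DataFrame of sales rows: the top-level view aggregates by region, and the drill-down view aggregates by region and salesperson.

```python
import pandas as pd

# A year of hypothetical sales rows; column names are assumptions.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "salesperson": ["Ana", "Ben", "Ana", "Cy"],
    "amount": [1200, 800, 950, 400],
})

# Top level: totals by region (what a dashboard shows first).
print(sales.groupby("region")["amount"].sum())

# Drill-down: totals by region and salesperson.
print(sales.groupby(["region", "salesperson"])["amount"].sum())
```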
Variety - Data Types
- Pipeline design decisions are influenced by both data types and data sources.
- Certain data types lend themselves to certain kinds of processing and analysis.
- Data source types might require different amounts of discovery and transformation work.
- Unstructured data has greater potential for discovering untapped insights.
- The data source type will also drive the type and scope of the ingestion layer.
Data categorization and benefits
- Structured: Rows and columns and well-defined schema. Easy to use.
- Semi-structured: Elements and self-describing attributes. Easy to use.
- Unstructured: Files with no defined structure. Flexible.
- 80% or more of available data is unstructured.
- Examples include querying a relational database to report on customer cases, analyzing customer comments from an online chat application, and performing sentiment analysis on customer service emails.
- The general data types are structured, semi-structured, and unstructured, and the type determines how readily the data can be queried.
- Unstructured data holds the most promise and is the most flexible; it is also the form most available data takes (the three forms are sketched below).
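To make the three categories concrete, the sketch below represents customer-service data in each form: a relational table (structured), a JSON chat comment (semi-structured), and a free-text email (unstructured). The table schema, JSON fields, and naive keyword check are illustrative assumptions.

```python
import json
import sqlite3

# Structured: rows and columns with a well-defined schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cases (id INTEGER, customer TEXT, status TEXT)")
db.execute("INSERT INTO cases VALUES (1, 'C42', 'open')")
print(db.execute("SELECT COUNT(*) FROM cases WHERE status = 'open'").fetchone())

# Semi-structured: elements with self-describing attributes (JSON here).
chat_comment = json.loads(
    '{"user": "C42", "ts": "2024-05-01T10:15:00Z", "text": "App keeps crashing"}'
)
print(chat_comment["text"])

# Unstructured: free text with no defined structure; needs techniques such as
# sentiment analysis to extract meaning (a naive keyword check stands in here).
email_body = "Thanks for the quick fix, the new release works great!"
print("positive" if "great" in email_body.lower() else "unknown")
```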
Variety - Data Sources
- Data source types that can feed a pipeline include organizational data stores, public datasets, and time-series sources such as events, IoT devices, and sensors.
- Combining these sources into one dataset can enrich the data and deepen the analysis, but it can also complicate data processing.
Common data source types
- On-premises databases or file stores: Application data is owned and managed by the organization.
- Public datasets: Data is aggregated about a topic, such as census data, health data, and population data.
- Events, IoT devices, sensors: Data is generated continually by events and includes a time-based component.
- Sources like these differ in who controls them, whether they contain data that may not be needed, and whether they are time series.
- They also differ in privacy and structure characteristics that need to be managed (a sample time-series record is sketched below).
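A time-series record from the weather-sensor example might look like the sketch below. The field names and values are hypothetical; the defining feature is the timestamp carried by every reading.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# A minimal sketch of one time-series reading from a weather sensor.
@dataclass
class SensorReading:
    sensor_id: str
    timestamp: datetime       # the time-based component that makes it time series
    temperature_c: float
    humidity_pct: float
    wind_speed_ms: float

reading = SensorReading(
    sensor_id="greenhouse-7",
    timestamp=datetime.now(timezone.utc),
    temperature_c=21.4,
    humidity_pct=63.0,
    wind_speed_ms=1.8,
)
print(reading)
```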
Veracity and value
- Making a data-driven decision with bad data is worse than making no decision at all.
- It is important to maintain the integrity of data when extracting, combining and transforming it.
- Value rests on veracity: the insights that drive good decisions depend on the quality and reliability of the data.
Data Evaluation
- Evaluate how the source data was entered and how it is managed, and maintain the integrity of audit trails.
- Be wary of the reliability of datasets and the types of bias that can be built into them.
- Clean and transform the data for consistency, and verify that sources are reliable.
Maintaining and improving data
- Errors are often repetitive, so tracing a data error can improve data in the future.
- Define clean data states.
- Discover the data: understand what it contains from the beginning.
- Clean and transform: remove duplicates, incorrect data points, and so on (see the sketch after this list).
- Prevent errors from software bugs, tampering, and human error.
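A minimal cleaning sketch along those lines, assuming hypothetical customer records pulled from two sources: values are normalized for consistency, duplicates are removed, and missing or incorrect points are flagged so the error can be traced rather than silently dropped.

```python
import pandas as pd

# Hypothetical customer records from two sources, with inconsistent
# formatting, a duplicate, and missing/incorrect values.
customers = pd.DataFrame({
    "email": ["a@x.com", "A@X.COM", "b@y.com", None],
    "country": ["US", "us", "DE", "DE"],
    "age": [34, 34, -1, 28],   # -1 is an obviously incorrect point
})

# Normalize for consistency before deduplicating.
customers["email"] = customers["email"].str.lower()
customers["country"] = customers["country"].str.upper()

# Remove duplicates, then flag missing or incorrect values rather than
# silently dropping them, so the error can be traced back to its source.
customers = customers.drop_duplicates()
issues = customers[customers["email"].isna() | (customers["age"] < 0)]
print(issues)
```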
Data Integrity and Consistency
- Secure all layers of the pipeline and grant access by least privilege. A best practice is to maintain data integrity with an audit trail that records how the data is used.
- Implement compliance in a single secure location.
- Use simple processes from the start to capture and timestamp details.
- Data transformations can be simple or complex, and each transformation carries its own data integrity considerations.
- Organizations should implement compliance and protection strategies for their data; one traceability approach is sketched below.
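One way to keep changes fully traceable is an append-only audit log, sketched below with a hypothetical hash chain so tampering is detectable. This is an illustration of the idea, not a prescribed implementation.

```python
import hashlib
import json
from datetime import datetime, timezone

# Append-only audit log: records are never updated in place. Every change is
# written as a new, timestamped entry whose hash is chained to the previous
# entry, so any later tampering breaks the chain and can be detected.
audit_log = []

def append_event(action, detail):
    prev_hash = audit_log[-1]["hash"] if audit_log else ""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "detail": detail,
        "prev_hash": prev_hash,
    }
    entry["hash"] = hashlib.sha256(
        (prev_hash + json.dumps(entry, sort_keys=True)).encode()
    ).hexdigest()
    audit_log.append(entry)

append_event("profile_update", {"user": "C42", "field": "email"})
append_event("profile_update", {"user": "C42", "field": "address"})
print(json.dumps(audit_log, indent=2))
```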