Data Science Concepts Quiz
41 Questions

Questions and Answers

What is a new mantra in data collection?

  • Store data only when necessary.
  • Gather whatever data you can whenever and wherever possible. (correct)
  • Limit data collection to relevant sources.
  • Data should be collected only for future analysis.

Which of the following is a reason for the growth in data collection?

  • Increased power and affordability of computers. (correct)
  • Decrease in data generation technologies.
  • Reduction in the number of data sources.
  • Increased cost of computing resources.

What type of data does Amazon handle millions of each day?

  • E-commerce visits and transactions. (correct)
  • Scientific research information.
  • Web browsing data.
  • Social media interactions.

Why is competitive pressure significant in data science?

    Answer: To improve customer relationship management with customized services.

    Which of the following statements is correct regarding large-scale data?

    Answer: It is experiencing phenomenal growth due to data generation technologies.

    What is the primary goal of market segmentation in clustering?

    Answer: To subdivide a market into distinct subsets of customers.

    In document clustering, what is primarily used to measure the similarity between documents?

    Answer: The frequencies of important terms appearing in the documents.

    Which application is NOT related to deviation or anomaly detection?

    Answer: Market Segmentation

    How can clustering quality be assessed in market segmentation?

    Answer: By observing buying patterns within the same cluster and between different clusters.

    What is an example of a deviation detection application?

    Answer: Identifying anomalous behavior in sensor networks.

    What is a primary responsibility of a data scientist working in healthcare?

    Answer: Creating models to predict effective treatments for patients

    In which of the following industries can a data scientist work?

    Answer: Any industry that utilizes data

    Which challenge is associated with data science that involves managing varying types of data?

    Answer: Heterogeneous and Complex Data

    What does scalability refer to in the context of data science?

    Answer: The capacity to handle growing amounts of data efficiently

    What type of analysis might non-traditional data scientists employ?

    Answer: Advanced machine learning algorithms

    What is the primary goal of churn prediction for telephone customers?

    Answer: To predict customer loss to competitors

    What attributes are analyzed to classify credit card transactions?

    Answer: All past transactions labeled as fraud or fair

    What is a significant success identified in the sky survey cataloging application?

    Answer: Finding new high red-shift quasars

    Which method is NOT typically used in classifying galaxies?

    Answer: Analyzing financial records

    In regression analysis, what is the main purpose?

    Answer: To predict continuous variable values

    How many images are used in the sky survey cataloging?

    Answer: 3000 images with high pixel resolution

    Which attribute is used in modeling customer loyalty?

    Answer: Customer call frequency

    What is the data size of the object catalog in the galaxy classification?

    Answer: All of the above

    What is the primary goal of classification in machine learning?

    Answer: To predict the class of a new observation based on training data

    In the context of NYC Taxi Cab Data, which of the following tasks would most likely involve classification?

    Answer: Determining whether a ride is flagged as a fraudulent trip

    Which of the following is NOT a common classification task mentioned in the content?

    Answer: Predicting stock market trends

    What type of data is typically used in fraud detection classification tasks?

    Answer: Transactions and personal information of account-holders

    When using predictive modeling for classification, what is the term for the portion of data used to evaluate the model?

    Answer: Test set

    Which of the following best describes the training process in predictive modeling?

    Answer: Developing predictive rules from labeled data

    In classification tasks, what kind of model would be used to predict whether a taxi ride is a good or bad fare?

    Answer: Classification model

    In the context of animal or environmental classification, which method is similar to that of detecting fraudulent credit card transactions?

    Answer: Classifying tumor cells as benign or malignant

    What is one of the responsibilities of a data scientist when predicting crime locations?

    Answer: Build a model based on previous crime data

    Which task do data scientists perform before constructing their models?

    Answer: Clean and normalize data

    What kind of positions can graduates of a data science program pursue?

    Answer: Software development engineers and business analysts

    In the software development field, what issues can individuals face if they lack business knowledge?

    Answer: Difficulty creating software that meets business requirements

    What advantage does a CIS graduate have when working in a software development team for healthcare?

    Answer: They understand the healthcare domain's technicalities

    What is one of the roles of a business analyst?

    Answer: To help define customer requirements

    How do software developers benefit from understanding business operations?

    Answer: They choose architectures that support future changes

    What does a system implementer do in their role?

    Answer: Ensure users know how to effectively utilize the software

    Which of the following best describes the CIS program?

    Answer: It combines technology and business knowledge

    Why is data compliance necessary in data collection?

    Answer: To handle data collection from a legal perspective

    Study Notes

    Data Science Overview

    • Commercial and scientific databases are growing enormously due to advances in data generation and collection technologies.
    • A key mantra is to gather whatever data you can, whenever and wherever possible.
    • Gathered data will have value, either for the original purpose or for a purpose not anticipated beforehand.

    Why Data Science? (Commercial Viewpoint)

    • Large amounts of data are being collected and warehoused, including web data (e.g., Google).
    • Social media platforms (e.g., Facebook) have billions of active users.
    • E-commerce involves millions of daily visits and transactions (e.g., Amazon).
    • Computing power has become more accessible and affordable.
    • Competition requires companies to provide better, customized services.

    Why Data Science? (Scientific Viewpoint)

    • Data is collected and stored at enormous speeds.
    • Remote sensors on satellites store petabytes of earth science data annually (e.g., NASA EOSDIS archives).
    • Telescopes scanning the skies capture large volumes of data (e.g., sky survey data).
    • High-throughput biological experiments and scientific simulations generate data rapidly (e.g., terabytes of data in a short time).
    • Data science helps automate analysis of massive datasets and facilitates hypothesis formation.

    Opportunities to Solve Society's Problems

    • Data science can improve healthcare and reduce costs.
    • Data science can predict the impact of climate change.
    • Data science enables the discovery of alternative green energy sources.
    • Data science can address hunger and poverty issues by increasing agricultural production.

    What is Data Science?

    • Data science is an emerging field, not yet fully defined.
    • Key elements of data science include exploratory data analysis and visualization, machine learning, and high-performance computing techniques for dealing with large-scale data.

    Skill Sets for Data Science

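    • Data science combines hacking (computer science) skills, math and statistics knowledge, and substantive expertise in a domain science. Hacking skills plus math and statistics form machine learning; math and statistics plus substantive expertise form traditional research; data science sits where all three overlap.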

    Appreciating Data

    • Computer scientists may not naturally appreciate the significance of data.
    • Data can be used to test and validate algorithms, but obtaining useful data sets requires effort and innovation.

    Computer Scientists vs. Real Scientists

    • Scientists study the complexity of the natural world, whereas computer scientists create organized, clean virtual worlds.
    • Scientific truths are multifaceted, whereas computer science deals in definite, "true" or "false" statements.

    Computer Scientists vs. Real Scientists (continued)

    • Scientists are data-driven, while computer scientists are algorithm-driven.
    • Scientists focus on exploring and discovering things, whereas computer scientists create or invent.
    • Scientists readily acknowledge the limitations and errors in data.

    Genius vs. Wisdom

    • Data science depends more on wisdom (knowing what to avoid) than on genius (knowing the right answer).
    • Software developers focus on code production.
    • Data scientists focus on creating insights.

    Developing Wisdom

    • Wisdom comes from experience, general knowledge, listening to others, and humility: acknowledging mistakes and recognizing what caused them.
    • Data scientists often struggle to achieve accurate predictions, which makes experience crucial to their practice.

    Developing Curiosity

    • Good data scientists develop curiosity about their domain/application.
    • Engage in discussions with those working with the data.
    • Staying informed about the world through daily reading is beneficial.

    Asking Good Questions

    • Data scientists should ask questions to extract meaningful insights from data sets.
    • Evaluate what questions the users and stakeholders need answered.
    • Consider which datasets can provide answers to those questions.

    Let's Practice Asking Questions!

    • For each of the three datasets, practice asking who, what, where, when, and why questions.
    • The three datasets are Baseball-reference.com, Google Ngrams, and NYC taxi cab records.

    Statistical Record of Play

    • Baseball-reference.com provides detailed records of each year's batting, pitching, and fielding data for baseball players.
    • Includes teams, awards, and other statistics.

    Baseball Questions

    • Focus on measuring player skill, evaluating trade fairness, analyzing career trajectories, and correlating batting performance with positions.

    Demographic Questions

    • Explore whether left-handed people have shorter lifespans than right-handers, how often people return to their birthplaces, how salaries relate to performance, and whether human height and weight have changed over time.

    Google Ngrams

    • Google Ngrams is a resource tracking word and phrase frequency over time.
    • Includes 1 to 5 word phrases, providing an annual time series of their use.
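
A minimal sketch of how an annual n-gram series could be built from raw text, assuming a tiny hypothetical corpus of (year, text) pairs. This is not how Google produces the Ngrams data, which is distributed as precomputed counts; it only illustrates counting 1- to 5-word phrases per year.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: (publication year, text) pairs.
corpus = [
    (1990, "the big data is big"),
    (1990, "big data is here"),
    (2000, "big data is everywhere"),
]

def ngrams(tokens, n):
    """Return all contiguous n-word phrases in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def yearly_counts(corpus, max_n=5):
    """Count every 1- to max_n-word phrase, grouped by year."""
    counts = defaultdict(Counter)  # year -> Counter of phrases
    for year, text in corpus:
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            counts[year].update(ngrams(tokens, n))
    return counts

counts = yearly_counts(corpus)
# Annual time series of raw counts for one phrase.
series = {year: counts[year]["big data"] for year in sorted(counts)}
print(series)  # {1990: 2, 2000: 1}
```

These are raw counts; a relative-frequency series would additionally divide each count by the total number of n-grams of the same length in that year.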

    Ngram Questions

    • Questions relate to changes in cursing over time, lifespans of fame and technology trends, the emergence and persistence of new words, and association patterns in language.

    NYC Taxi Cab Data

    • Provides detailed records of every NYC taxi trip, including the driver/owner, pickup and dropoff locations, and fares, obtained through a Freedom of Information Act request.

    Taxicab Questions

    • Focus on drivers' earnings, travel distances, rush-hour traffic patterns, where passengers travel at different times, how much drivers are tipped, and optimal pick-up strategies.
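
As an illustration of turning one of these questions ("how much are drivers tipped?") into a computation, here is a small sketch over hypothetical trip records. The `(driver_id, fare, tip)` tuple layout is an assumption for this example, not the actual column layout of the NYC dataset.

```python
from collections import defaultdict

# Hypothetical trip records: (driver_id, fare, tip), in dollars.
trips = [
    ("D1", 10.0, 2.0),
    ("D1", 20.0, 2.0),
    ("D2", 15.0, 0.0),
    ("D2", 5.0, 1.0),
]

def tip_rate_by_driver(trips):
    """Return each driver's total tips divided by total fares."""
    fares = defaultdict(float)
    tips = defaultdict(float)
    for driver, fare, tip in trips:
        fares[driver] += fare
        tips[driver] += tip
    return {driver: tips[driver] / fares[driver] for driver in fares}

rates = tip_rate_by_driver(trips)
print(rates)
```

Aggregating tips against total fares, rather than averaging per-trip percentages, keeps a few very short trips from dominating a driver's rate.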

    Machine Learning Tasks

    • Tasks include clustering, predictive modeling, and anomaly detection.

    Predictive Modeling: Classification

    • Predictive modeling predicts the value of a specified target attribute from the values of other attributes.
    • For example, modeling creditworthiness or predicting specific patient treatments.
    • Classification techniques are crucial in many applications (e.g., fraud detection).
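
The train-then-evaluate workflow behind predictive modeling can be sketched as follows, assuming hypothetical creditworthiness records with a single income attribute. The "model" here is a decision stump (one income cutoff), chosen only because it is easy to read; real applications would use richer classifiers and many attributes.

```python
import random

# Hypothetical labeled records: (income in $1000s, target label).
records = [(20, "bad"), (25, "bad"), (30, "bad"), (35, "good"), (40, "bad"),
           (45, "good"), (50, "good"), (55, "good"), (60, "good"), (28, "bad")]

def train_test_split(data, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data and split it into training and test sets."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

def accuracy(cut, rows):
    """Fraction of rows matching the rule: income >= cut means 'good'."""
    return sum((x >= cut) == (y == "good") for x, y in rows) / len(rows)

def fit_threshold(train):
    """Pick the income cutoff (from the training incomes) with best training accuracy."""
    best_cut, best_acc = None, -1.0
    for cut, _ in train:
        acc = accuracy(cut, train)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut

train, test = train_test_split(records)
cut = fit_threshold(train)              # learn the rule from labeled data
print(cut, accuracy(cut, test))         # evaluate on the held-out test set
```

The test set is never seen during fitting, so its accuracy estimates how the rule would perform on new observations.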

    Applications of Classification Tasks

    • Classifying credit card transactions as valid or fraudulent
    • Identifying land cover types using satellite data
    • Determining the category of news stories
    • Identifying intruders within cyberspace and predicting outcomes
    • Classifying protein secondary structures

    Classification: Application 1 (Fraud Detection)

    • Goal is to predict cases of fraud from credit card transactions.
    • Credit card transactions and account details become important attributes.
    • Transactions categorized as fraudulent or legitimate form a class variable for training a model.
    • The trained model is then applied to new transactions to detect fraud.
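
A minimal sketch of the fraud-detection workflow, assuming hypothetical transactions described by two attributes (amount and hour of day). It uses k-nearest neighbors, one simple classifier among many; the source does not prescribe a particular algorithm.

```python
import math
from collections import Counter

# Hypothetical labeled training transactions: ((amount, hour_of_day), class).
training = [
    ((12.0, 14), "legitimate"),
    ((40.0, 10), "legitimate"),
    ((25.0, 19), "legitimate"),
    ((950.0, 3), "fraud"),
    ((700.0, 2), "fraud"),
]

def knn_predict(training, x, k=3):
    """Label x by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(training, (30.0, 12)))   # legitimate
print(knn_predict(training, (800.0, 4)))   # fraud
```

The attributes are left unscaled here, so the amount dominates the distance; in practice they would usually be normalized first.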

    Classification: Application 2 (Churn Prediction)

    • Goal is predicting whether a telephone customer will leave for a competitor.
    • Customer behaviors, transaction data, financial profiles, and other factors are key attributes.

    Classification: Application 3 (Sky Survey Cataloging)

    • Goal is to classify stars and galaxies from survey images, specifically focusing on visually faint objects from the Palomar Observatory.
    • Image segmentation, measuring attributes like light characteristics, and classification models are key components.

    Classifying Galaxies

    • The data contains a large number of images of stars and galaxies, used for modeling and classification.
    • Image data is characterized by attributes (e.g., image features, characteristics of received light).

    Regression

    • Regression models use continuous-valued attributes to predict the value of a continuous dependent variable, assuming a linear or nonlinear dependency.
    • For example, predicting sales of a new product from advertising expenditure, or predicting wind speed from temperature and other environmental measurements.
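
The linear case can be sketched with ordinary least squares on hypothetical advertising and sales figures. This covers only a linear dependency; a nonlinear one would need a different model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    return a, b

# Hypothetical data: advertising spend vs. units sold (both in thousands).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(ad_spend, sales)
print(a, b)           # 1.0 2.0
print(a + b * 5.0)    # 11.0, the predicted sales for a spend of 5.0
```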

    Clustering

    • Clustering groups data points so that distances within a cluster are minimized and distances between clusters are maximized.
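
One widely used clustering algorithm is k-means, sketched below on hypothetical 2-D points. It directly minimizes within-cluster squared distances; separation between clusters follows indirectly. The naive initialization (first k points) is a simplification for readability.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    centroids = points[:k]  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (2.0, 1.0), (9.5, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```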

    Applications of Cluster Analysis

    • Understanding and targeting customer demographics for improved marketing campaigns
    • Clustering related documents in groups for user access
    • Grouping genes/proteins based on performance, function, and similarities
    • Categorizing price fluctuations for stocks

    Clustering: Application 1 (Market Segmentation)

    • Goal is dividing a market into customer segments with similar characteristics for improved/targeted marketing.
    • Identifying customer attributes (e.g., demographics, purchasing behaviors) to segment them effectively.
    • Measuring segment similarity by examining buying patterns within or across the segments.
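
One simple way to check segment quality along these lines is to compare average distances between customers in the same segment with those in different segments. The sketch assumes hypothetical spending vectors and segment labels already produced by a clustering step.

```python
import math
from itertools import combinations

# Hypothetical customers: ((monthly grocery spend, electronics spend), segment).
customers = {
    "c1": ((200.0, 20.0), "A"),
    "c2": ((220.0, 30.0), "A"),
    "c3": ((50.0, 300.0), "B"),
    "c4": ((60.0, 280.0), "B"),
}

def mean_distances(customers):
    """Average pairwise distance within segments and across segments."""
    within, between = [], []
    for (_, (p, s)), (_, (q, t)) in combinations(customers.items(), 2):
        (within if s == t else between).append(math.dist(p, q))
    return sum(within) / len(within), sum(between) / len(between)

w, b = mean_distances(customers)
print(w < b)  # True: segments are internally similar and mutually distinct
```

A good segmentation shows a small within-segment average and a much larger between-segment average.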

    Clustering: Application 2 (Document Clustering)

    • Goal is grouping documents into clusters with similar contents or themes.
    • Identify frequently occurring terms in each document and build a similarity measure from their frequencies.
    • Use that similarity measure to cluster the documents.
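
A term-frequency similarity measure can be sketched as follows. Cosine similarity is one common choice, assumed here rather than taken from the source, and the stopword list is a small hypothetical one.

```python
import math
from collections import Counter

def term_frequencies(text, stopwords=frozenset({"the", "a", "of", "and", "in"})):
    """Count terms in a document, skipping a small hypothetical stopword list."""
    return Counter(w for w in text.lower().split() if w not in stopwords)

def cosine_similarity(tf1, tf2):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(tf1[t] * tf2[t] for t in tf1.keys() & tf2.keys())
    norm1 = math.sqrt(sum(v * v for v in tf1.values()))
    norm2 = math.sqrt(sum(v * v for v in tf2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

docs = {
    "d1": "the stock market fell and the market recovered",
    "d2": "stock market prices rose in the market",
    "d3": "the team won the championship game",
}
tfs = {name: term_frequencies(text) for name, text in docs.items()}
print(round(cosine_similarity(tfs["d1"], tfs["d2"]), 3))  # 0.714
print(cosine_similarity(tfs["d1"], tfs["d3"]))            # 0.0: no shared terms
```

The resulting pairwise similarities could then drive any clustering algorithm that accepts a similarity or distance measure.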

    Deviation/Anomaly/Change Detection

    • Detecting significant deviations from normal patterns.
    • Applications include credit card fraud detection, network intrusion detection, changes in sensor networks, and monitoring/tracking changes in global forest cover.
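
A minimal deviation-detection sketch: summarize "normal" behavior from historical readings, then flag new readings that fall far from it. The 3-standard-deviation threshold and the sensor values are illustrative assumptions, not a rule from the source.

```python
import statistics

def fit_baseline(normal_readings):
    """Summarize normal behavior by its mean and sample standard deviation."""
    return statistics.mean(normal_readings), statistics.stdev(normal_readings)

def detect_deviations(readings, baseline, threshold=3.0):
    """Flag readings more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    return [(i, x) for i, x in enumerate(readings)
            if abs(x - mean) > threshold * stdev]

# Hypothetical sensor temperatures (deg C) recorded during normal operation.
history = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.0, 19.7]
baseline = fit_baseline(history)
new_readings = [20.1, 19.9, 27.5, 20.0]
print(detect_deviations(new_readings, baseline))  # [(2, 27.5)]
```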

    Motivating Challenges

    • Data science faces challenges relating to scalability (handling large datasets), high dimensionality (extensive attributes), heterogeneity and complexity of data formats, ownership and distribution issues relating to various data sources, and non-traditional analysis methods.

    DS Career Path

    • Data Science (DS) graduates can find diverse career paths.

    Introduction

    • Most graduates of data science programs take data scientist positions.
    • Data scientists can work in organizations like private companies, government agencies, and non-profit organizations.

    Industries

    • Data science is relevant to a wide range of industries (e.g., finance, government, healthcare, online platforms, large retailers, agriculture).

    Data Scientist Responsibilities

    • Data Scientists build and validate data models, used by their employers to predict, recommend, and evaluate future business decisions.
    • Data Scientists are responsible for preparing/cleaning data for these models.
    • Data collection also involves data management procedures that ensure the data complies with rules and legal standards.
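
Preparing data before modeling can be sketched with two common steps: dropping incomplete records and min-max normalizing each attribute to [0, 1]. The records are hypothetical, and dropping rows is only one option; imputing missing values is another.

```python
def clean(rows):
    """Drop records with a missing (None) value in any field."""
    return [r for r in rows if all(v is not None for v in r)]

def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical records: (age, annual income in $).
raw = [(25, 40000), (None, 52000), (40, 90000), (33, None), (55, 65000)]
rows = clean(raw)
ages = min_max_normalize([age for age, _ in rows])
incomes = min_max_normalize([inc for _, inc in rows])
print(rows)     # [(25, 40000), (40, 90000), (55, 65000)]
print(ages)     # [0.0, 0.5, 1.0]
print(incomes)  # [0.0, 1.0, 0.5]
```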

    More Opportunities

    • Graduates may also pursue software development roles or specialized roles building business intelligence dashboards that present results to users through charts and reports.

    CIS Career Path

    • CIS graduates often pursue careers in software development, business analysis, and system implementation.

    Introduction (CIS)

    • CIS (Computer Information Systems) programs generally combine technology and business aspects, which equip graduates with a broad set of skills valuable in diverse areas.
    • Graduates of these programs are usually adept at adapting technology, together with their knowledge of existing business systems, to achieve greater efficiency.

    Introduction (continued)

    • Technology knowledge alone may fail to account for business standards (including international ones) or organizational constraints in software development.

    Example

    • Exposure to healthcare systems gives CIS graduates an understanding of electronic health record (EHR) features and functionality.

    You as a Business Analyst

    • Business analysts usually define requirements for software systems that cater to customer needs.
    • Familiarity with existing systems is often helpful in understanding customer needs.

    You as a Software Developer

    • Software developers create software systems based on functional requirements and business objectives.
    • Business awareness allows developers to choose proper architectures that can support future business standards & requirements.

    You as a System Implementer

    • System implementers guide users in how to use software systems properly.
    • Experience and understanding of how businesses operate will lead to useful guidance for system usage.


    Related Documents

    Module 4.1 - Data Science PDF

    Description

    Test your knowledge on key concepts in data science, covering topics like data collection, clustering, market segmentation, and anomaly detection. This quiz explores various aspects of the data science field, including the responsibilities of data scientists and the challenges they face. Put your understanding to the test and see how well you know data science!
