10 Questions
What are the steps involved in solving problems with data according to the lecture?
collect and understand data, clean and format data, use data to create a solution
Where does internal data come from, as mentioned in the lecture?
business-centric data in organizational databases recording day-to-day operations, scientific or experimental data
What are the sources of existing external data mentioned in the lecture?
public government databases, stock market data, Yelp reviews
What are the cautionary notes mentioned about using online data, as per the lecture?
not all data that is accessible is appropriate to use
What are the two methods mentioned for obtaining online data in the lecture?
APIs (e.g. Google Maps API, Facebook API, Twitter API) and web scraping, i.e. using software, scripts, or by-hand extraction to pull data from what is displayed on a page or what is contained in the HTML file
From what sources can internal data be obtained, as mentioned in the lecture?
Internal data comes from organizational databases that record day-to-day operations (business-centric data) and from scientific or experimental data.
What caution is mentioned about using online data, as per the lecture?
The caution is that not all data that is accessible is appropriate to use.
What are the two methods mentioned for obtaining online data in the lecture?
The two methods mentioned are obtaining data from APIs (e.g. Google Maps API, Facebook API, Twitter API) and web scraping, which involves extracting data from what is displayed on a page or what is contained in the HTML file.
What are the sources of existing external data mentioned in the lecture?
Existing external data sources mentioned are public government databases, stock market data, and Yelp reviews, which are usually (somewhat) pre-processed.
What are the steps involved in solving problems with data according to the lecture?
The steps involved are collecting and understanding data, cleaning and formatting data, and using data to create a solution through data analysis and/or machine learning.
Study Notes
Data Science Overview
- Data science involves solving problems with data, which can be related to scientific, social, or business issues.
- The process of data science includes:
  - Collecting and understanding data
  - Cleaning and formatting data
  - Using data to create a solution through data analysis and/or machine learning
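The three steps above can be sketched end to end on a toy dataset. This is only an illustration, not the lecture's own code: the CSV contents, the column names, and the function names `collect`, `clean`, and `analyze` are all hypothetical, and the "solution" here is just a mean rather than a real analysis or machine learning model.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice this would come from a file, database, or API.
RAW_CSV = """city,temperature_c
St. Louis,21.5
Chicago,
Boston,18.0
Denver,not recorded
Seattle,16.5
"""


def collect(raw_text):
    """Step 1: read the raw rows into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(rows):
    """Step 2: keep only rows whose temperature parses as a number."""
    cleaned = []
    for row in rows:
        try:
            temperature = float(row["temperature_c"])
        except (TypeError, ValueError):
            continue  # skip missing or malformed values
        cleaned.append({"city": row["city"].strip(), "temperature_c": temperature})
    return cleaned


def analyze(rows):
    """Step 3: a deliberately simple analysis, the mean temperature."""
    return statistics.mean(row["temperature_c"] for row in rows)


rows = collect(RAW_CSV)
clean_rows = clean(rows)
mean_temperature = analyze(clean_rows)
print(f"{len(rows)} raw rows, {len(clean_rows)} clean rows, mean = {mean_temperature:.2f} C")
```

Two of the five raw rows are dropped during cleaning because their temperature is missing or not numeric, which is why the cleaning step has to come before any analysis.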
Data Sources
Internal Sources
- Data from organizational databases recording day-to-day operations
- Scientific or experimental data
Existing External Sources
- Data available for free or a fee
- Examples include:
  - Public government databases
  - Stock market data
  - Yelp reviews
- Typically, this data is somewhat pre-processed
Collecting Your Own Data
- Beyond the scope of this course
Online Data
- Typically raw data from APIs (e.g. Google Maps API, Facebook API, Twitter API)
- Web scraping:
  - Extracting data from what is displayed on a page or what is contained in the HTML file
- Caution: not all accessible data is appropriate to use
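As a rough illustration of the two routes, the sketch below reads the same information once from a JSON document, the format many APIs return, and once from an HTML page source. Everything specific here is an assumption made for the example: the response body, the page markup, the `place` class, and the names `names_from_api`, `PlaceParser`, and `names_from_html` are hypothetical and do not correspond to any real API or site. Both inputs are hard-coded strings, so nothing is downloaded.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response body; a real API would return this over HTTP.
API_RESPONSE = '{"results": [{"name": "Cafe A", "rating": 4.5}, {"name": "Diner B", "rating": 3.8}]}'

# Hypothetical page source; a real scraper would download this first.
PAGE_HTML = """
<html><body>
  <h1>Local restaurants</h1>
  <ul>
    <li class="place">Cafe A</li>
    <li class="place">Diner B</li>
    <li class="ad">Sponsored listing</li>
  </ul>
</body></html>
"""


def names_from_api(response_text):
    """API route: the data is already structured, so parsing is direct."""
    data = json.loads(response_text)
    return [item["name"] for item in data["results"]]


class PlaceParser(HTMLParser):
    """Scraping route: collect the text of every <li class="place"> element."""

    def __init__(self):
        super().__init__()
        self.in_place = False
        self.places = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and dict(attrs).get("class") == "place":
            self.in_place = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_place = False

    def handle_data(self, data):
        if self.in_place and data.strip():
            self.places.append(data.strip())


def names_from_html(html_text):
    parser = PlaceParser()
    parser.feed(html_text)
    return parser.places


api_names = names_from_api(API_RESPONSE)
scraped_names = names_from_html(PAGE_HTML)
print(api_names)
print(scraped_names)
```

The API route needs only a JSON parse, while the scraping route has to pick the wanted elements out of the page markup and ignore the rest (here, the sponsored listing), which is why scraped data usually needs more cleaning.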
Test your knowledge of exploratory data analysis with this quiz on CSE217 Introduction to Data Science Lecture