Questions and Answers
Which of the following is NOT a characteristic of NumPy arrays that contributes to their efficiency?
- Data is stored in a contiguous block of memory.
- They perform complex computations on entire arrays without explicit loops.
- They are the lingua franca for data exchange between Python libraries.
- They can dynamically increase in size after creation. (correct)
NumPy's ndarray is designed to handle vectorized operations which apply functions to each element in the array.
True (A)
What function is used to convert a NumPy array back into a Python list?
tolist()
The ______ attribute of a NumPy array specifies the data type of the elements stored in the array.
Match the NumPy related term to the correct description.
Which of the following is NOT a valid way to create a Pandas DataFrame?
In Pandas, each column within a DataFrame is essentially a Series object.
What is one primary difference between a Pandas Series and a Python list that makes a Pandas Series behave like a dictionary?
Use ______ to read a CSV file in Pandas.
Match the Pandas related term with its description.
Which of the following is NOT a step in the machine learning life cycle?
Data-driven techniques are one of the key features of machine learning.
Who coined the term Machine Learning?
______ involves choosing analytical techniques, building models, and reviewing results.
Match the stage in the Machine Learning Lifecycle to the relevant description.
What is the 'lingua franca' for data exchange used by many Python libraries like pandas and scikit-learn?
Once a NumPy array is created, its size can be increased dynamically without creating a new array.
Besides specifying data type during array creation, what method allows you to convert a NumPy array to a different data type after it's created?
In Pandas, if an index is not explicitly provided when creating a Series, Pandas automatically creates a ______ index.
Match the given description to the component of data read in machine learning.
Why is NumPy considered important for numerical computations?
Arrays in NumPy can store elements of different data types.
What attribute of NumPy arrays allows inspecting the number of dimensions in the array?
The Pandas function ______ is used to write a DataFrame to a CSV file.
Match the Pandas DataFrame operation to the method used:
Which feature of machine learning allows it to improve automatically?
Rapid growth in data creates a need to find hidden patterns and extract useful information from it.
Which of the following functions reads a CSV file?
______ includes the tasks of identifying various data sources and collecting the data.
Match the given characteristics with the process.
What advantages does NumPy's ndarray provide over standard Python lists for numerical computations?
Pandas is a low-level library written in pure C, providing a foundation for libraries like NumPy.
In Pandas DataFrames, what is the purpose of the .drop() function?
The machine learning process involves ______, where a model is provided with a test dataset to check for accuracy.
Match the following library with its functions.
Which of the following is a characteristic of Pandas Series?
A Boolean index array has the same shape as the array to be filtered.
How can we improve the Machine Learning model?
Collected data has various issues, including the ______.
Match each NumPy attribute name to its function.
Flashcards
What is NumPy?
A foundational Python package for numerical computing.
NumPy ndarray
A fast, flexible container for large datasets in Python.
Arrays vs. Lists
Arrays are designed for vectorised operations; lists are not.
np.array()
dtype Argument
astype Method
tolist() Function
Inspecting NumPy Array
Array Indexing
Boolean Indexing
What is Pandas?
Pandas Series
Pandas DataFrame
DataFrame Creation
DataFrame Access
.loc[] and .iloc[]
.drop() function
pd.read_csv()
df.to_csv()
Machine Learning
How Machine Learning Works
Why Use Machine Learning?
ML lifecycle: data gathering
ML lifecycle: data preparation
ML lifecycle: data wrangling
ML lifecycle: data analysis
ML lifecycle: Model training
ML lifecycle: data test
ML lifecycle: data deployment
Study Notes
Data Analysis and Visualization with Python
- The topic centers on using Python for data analysis and visualization
Numerical Python (NumPy)
- NumPy is the foundational package for numerical computing in Python
- A solid understanding of NumPy is mandatory for data analysis or machine learning projects
- Libraries like Pandas and Scikit-learn use NumPy's array objects for data exchange
- NumPy is designed for efficiency with large arrays of data
NumPy Efficiency
- Stores data internally in a contiguous block of memory, independently of other Python objects
- Performs complex computations on entire arrays without the need for 'for' loops
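The two efficiency points above can be seen in a small comparison. This is only a sketch: the array contents and the variable names are made up for illustration, and no timing is claimed.

```python
import numpy as np

# Hypothetical data: 100,000 float values.
values = np.arange(100_000, dtype="float64")

# Loop version: Python visits each element one at a time.
squared_loop = np.empty_like(values)
for i in range(values.size):
    squared_loop[i] = values[i] ** 2

# Vectorised version: one expression applied to the whole array.
squared_vec = values ** 2

# The array reports whether its data sits in one contiguous block.
is_contiguous = bool(values.flags["C_CONTIGUOUS"])
same_result = bool(np.array_equal(squared_loop, squared_vec))
print(is_contiguous, same_result)  # True True
```

Both versions produce the same numbers; the vectorised form simply leaves the looping to NumPy's compiled code.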
What You'll Find in NumPy
- ndarray: An efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities
- Mathematical functions for fast operations on entire arrays of data without loops
- Tools for reading and writing array data to disk and for working with memory-mapped files
- Linear algebra, random number generation, and Fourier transform capabilities
- A C API for connecting NumPy with libraries written in C, C++, and Fortran, which makes Python a popular choice for wrapping legacy codebases
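A brief tour of several items from this list, written as a minimal sketch. The input values are invented for illustration; the calls used (`np.linalg.solve`, `np.random.default_rng`, `np.fft.fft`) are standard NumPy functions.

```python
import numpy as np

# Broadcasting: a (3, 1) column combined with a (3,) row gives a (3, 3) grid.
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3])
grid = col + row

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

# Random number generation with a fixed seed for repeatability.
rng = np.random.default_rng(0)
sample = rng.normal(size=5)

# Fourier transform of a constant signal: everything lands in the first bin.
spectrum = np.fft.fft(np.ones(4))

print(grid.tolist())  # [[1, 2, 3], [11, 12, 13], [21, 22, 23]]
print(x.tolist())     # [1.0, 2.0]
```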
NumPy ndarray: Multi-Dimensional Array Object
- The NumPy ndarray object serves as a fast, flexible container for large datasets in Python
- NumPy arrays differ from Python lists
- An array stores multiple items of the same data type
- The facilities NumPy builds around the array object make math and data manipulation convenient
Ndarray vs. Lists
- Numbers and other objects stored in a Python list can be processed with list comprehensions and for-loops
- NumPy arrays provide significant advantages over lists
Creating a NumPy Array
- The advantages are easiest to see by creating an array and working with it
- A NumPy array can be created from a list with the np.array() function
import numpy as np
list1 = [0, 1, 2, 3, 4]
arr = np.array(list1)
print(type(arr)) # <class 'numpy.ndarray'>
print(arr) # [0 1 2 3 4]
Differences Between Lists and ndarrays
- Arrays are designed to handle vectorised operations
- Python lists are not designed to handle vectorised operations
- A vectorised operation applies a function to every item in the array, rather than to the array object as a whole
Differences Between Lists and ndarrays Example
- Adding the number 2 to a list with the + operator raises a TypeError, because + on lists means concatenation
import numpy as np
list1 = [0, 1, 2, 3, 4]
list1 = list1+2 # TypeError: can only concatenate list (not "int") to list
- NumPy can perform the addition of 2 to every item in the array without error
import numpy as np
list1 = [0, 1, 2, 3, 4]
arr = np.array(list1)
print(arr) # [0 1 2 3 4]
arr = arr+2
print(arr) # [2 3 4 5 6]
NumPy Array Note
- Once a NumPy array is created, its size cannot be increased
- To increase the size, a new array must be created
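A small sketch of what this means in practice, using np.append as one possible way to get a larger array. The variable names are illustrative only.

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4])

# np.append does not grow arr in place; it returns a new, larger array.
bigger = np.append(arr, [5, 6])

print(arr.size)          # 5
print(bigger.size)       # 7
print(bigger.tolist())   # [0, 1, 2, 3, 4, 5, 6]
```

Because every append copies the data, building a large array one element at a time is usually slower than collecting values in a list and converting once.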
Create a 2D Array From a List of Lists
- Passing a list of lists to np.array() creates a matrix-like 2D array
import numpy as np
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr2 = np.array(list2)
print(arr2) # [[0 1 2] [3 4 5] [6 7 8]]
The dtype Argument
- The data type can be specified by setting the dtype argument
- Commonly used NumPy dtypes include: float, int, bool, str, and object
import numpy as np
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr3 = np.array(list2, dtype='float')
print(arr3) # [[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]]
The astype Method
- An existing array can be converted to a different data type with the astype method
import numpy as np
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr3 = np.array(list2, dtype='float')
print(arr3) # [[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]]
arr3_s = arr3.astype('int').astype('str')
print(arr3_s) # [['0' '1' '2'] ['3' '4' '5'] ['6' '7' '8']]
- Arrays require that all items be of the same type
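One detail of astype worth knowing is that it returns a new array and, when going from float to int, drops the fractional part rather than rounding. The sketch below uses made-up values to show this.

```python
import numpy as np

prices = np.array([1.7, -1.7, 2.0])  # hypothetical values

as_int = prices.astype('int')   # fractional part is discarded, not rounded
as_str = as_int.astype('str')   # conversions can be chained

print(as_int.tolist())  # [1, -1, 2]
print(as_str.tolist())  # ['1', '-1', '2']
print(prices.dtype)     # float64 (the original array is unchanged)
```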
dtype='object'
- The dtype can be set as 'object' to hold different types of data in the array
arr_obj = np.array([1, 'a'], dtype='object')
print(arr_obj) # [1 'a']
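For comparison, here is a sketch of what happens to the same mixed list when no dtype is given: NumPy picks a single common type, which for a number and a string is a string type.

```python
import numpy as np

mixed = np.array([1, 'a'])                       # no dtype given
mixed_obj = np.array([1, 'a'], dtype='object')   # keep the original objects

print(mixed.dtype.kind)     # U (a Unicode string dtype)
print(repr(mixed[0]))       # the integer 1 has become the string '1'
print(type(mixed_obj[0]))   # <class 'int'>
```

Object arrays keep each element as an ordinary Python object, so they give up much of the speed that fixed-type arrays provide.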
The tolist() Function
- An array can be converted into a Python list using the tolist() method
arr_list = arr_obj.tolist()
print(arr_list) # [1, 'a']
Inspecting a NumPy Array
- Array attributes report the shape, data type, size, and number of dimensions:
import numpy as np
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr3 = np.array(list2, dtype='float')
print('Shape:', arr3.shape) # Shape: (3, 3)
print('Data type:', arr3.dtype) # Data type: float64
print('Size:', arr3.size) # Size: 9
print('Num dimensions:', arr3.ndim) # Num dimensions: 2
Extracting Specific Items From an Array
- Array portions can be extracted using indices
- An array accepts as many index arguments inside the square brackets as it has dimensions
import numpy as np
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr3 = np.array(list2, dtype='float')
print("whole:", arr3)
## whole: [[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]]
print("Part:", arr3[:2, :2])
## Part: [[0. 1.] [3. 4.]]
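A few more indexing patterns on the same 3x3 array, as a short sketch: a single element, a whole row, and a whole column.

```python
import numpy as np

arr3 = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]], dtype='float')

element = arr3[1, 2]     # row 1, column 2
first_row = arr3[0]      # a single index selects a whole row
last_col = arr3[:, -1]   # every row, last column

print(element)             # 5.0
print(first_row.tolist())  # [0.0, 1.0, 2.0]
print(last_col.tolist())   # [2.0, 5.0, 8.0]
```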
Boolean Indexing
- A boolean index array has the same shape as the array to be filtered
- It contains only True and False values
import numpy as np
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr3=np.array(list2, dtype='float')
boo = arr3>2
print(boo) # [[False False False] [ True True True] [ True True True]]
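The boolean array becomes useful once it is passed back as an index. The sketch below shows filtering, combining two conditions with &, and assigning through a mask; the names mask, selected, in_range, and capped are illustrative.

```python
import numpy as np

arr3 = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]], dtype='float')

mask = arr3 > 2
selected = arr3[mask]                     # 1-D array of the matching values
in_range = arr3[(arr3 > 2) & (arr3 < 7)]  # conditions combined with &

capped = arr3.copy()
capped[mask] = 2.0                        # assign to every matching position

print(selected.tolist())  # [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(in_range.tolist())  # [3.0, 4.0, 5.0, 6.0]
print(capped.tolist())    # [[0.0, 1.0, 2.0], [2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
```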
Pandas
- One of the most popular Python libraries for data analysis
- A high-level abstraction over the lower-level NumPy, whose core is implemented in C
- Provides high-performance, easy-to-use data structures and data analysis tools
- Data frames and series are the two main structures used by Pandas
Indices in a Pandas Series
- A Pandas series is similar to a list, but it associates a label (the index) with each element, which makes it behave like a dictionary
- If an index is not provided, Pandas creates a RangeIndex ranging from 0 to N-1
- Each series object has a data type
import pandas as pd
new_series = pd.Series([5, 6, 7, 8, 9, 10])
print(new_series)
## 0 5
## 1 6
## 2 7
## 3 8
## 4 9
## 5 10
## dtype: int64
Pandas Series Index and Values
- The values attribute returns all values in the series, and individual elements can be accessed by index
import pandas as pd
new_series = pd.Series([5, 6, 7, 8, 9, 10])
print(new_series.values) # [5 6 7 8 9 10]
print(new_series[4]) # 9
- An index can also be provided manually
import pandas as pd
new_series = pd.Series([5, 6, 7, 8, 9, 10], index=['a', 'b', 'c', 'd', 'e', 'f'])
print(new_series.values) # [5 6 7 8 9 10]
print(new_series['f']) # 10
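The dictionary-like behaviour of a labelled series can be sketched as follows. The names and ages are hypothetical sample data.

```python
import pandas as pd

# A series can be built directly from a dictionary: keys become the index.
ages = pd.Series({'ann': 31, 'bob': 27, 'cy': 45})

print('bob' in ages)         # True  (membership tests check the index)
print('zed' in ages)         # False
print(ages['cy'])            # 45
print(ages.get('zed', -1))   # -1    (default when the label is missing)
print(list(ages.index))      # ['ann', 'bob', 'cy']
```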
Pandas Series Example
- Several elements can be selected by passing a list of index labels, and one value can be assigned to all of them at once
import pandas as pd
new_series = pd.Series([5, 6, 7, 8, 9, 10], index=['a', 'b', 'c', 'd', 'e', 'f'])
print(new_series)
new_series[['a', 'b', 'f']] = 0
print(new_series)
## a 0
## b 0
## c 7
## d 8
## e 9
## f 0
## dtype: int64
Filtering and Math Operations with Pandas
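- Boolean filtering and element-wise math work directly on a series
import pandas as pd
new_series = pd.Series([5, 6, 7, 8, 9, 10], index=['a', 'b', 'c', 'd', 'e', 'f'])
new_series2 = new_series[new_series>0]
print(new_series2)
## a 5
## b 6
## c 7
## d 8
## e 9
## f 10
## dtype: int64
# Multiplication returns a new series, so the result must be stored to be kept
new_series3 = new_series2[new_series2>0]*2
print(new_series3)
## a 10
## b 12
## c 14
## d 16
## e 18
## f 20
## dtype: int64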
Pandas Data Frame
- A data frame is a table with rows and columns
- Each column in a data frame is a series object
- Each row is made up of the elements at the same position in every series
## Example Data Frame Table
Case ID | Variable one | Variable two | Variable three
1       | 123          | ABC          | 10
2       | 456          | DEF          | 20
3       | 789          | XYZ          | 30
Creating a Pandas Data Frame
- A Pandas data frame can be constructed from a Python dictionary, with one key per column
import pandas as pd
df = pd.DataFrame({
'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
'population': [17.04, 143.5, 9.5, 45.5],
'square': [2724902, 17125191, 207600, 603628]})
print(df)
## country population square
## 0 Kazakhstan 17.04 2724902
## 1 Russia 143.50 17125191
## 2 Belarus 9.50 207600
## 3 Ukraine 45.50 603628
Data Frames from a List
- A data frame can also be created from a list of lists
import pandas as pd
list2 = [[0,1,2], [3,4,5], [6,7,8]]
df = pd.DataFrame(list2)
print(df)
df.columns = ['V1', 'V2', 'V3']
print(df)
## V1 V2 V3
## 0 0 1 2
## 1 3 4 5
## 2 6 7 8
Column Types
- The type() function shows that a column is a Series (here df is the country data frame created earlier):
print(type(df['country']))
## <class 'pandas.core.series.Series'>
Pandas Data Frames and Indices
- A Pandas data frame has two indices: a column index and a row index
- If a row index is not provided, Pandas creates a RangeIndex from 0 to N-1.
import pandas as pd
df = pd.DataFrame({
'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
'population': [17.04, 143.5, 9.5, 45.5],
'square': [2724902, 17125191, 207600, 603628]})
print(df.columns)
print(df.index)
## Index(['country', 'population', 'square'], dtype='object')
## RangeIndex(start=0, stop=4, step=1)
Row Indices
- Row indices can be provided explicitly
- An index can be passed when creating a data frame
import pandas as pd
df = pd.DataFrame({
'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
'population': [17.04, 143.5, 9.5, 45.5],
'square': [2724902, 17125191, 207600, 603628]
}, index=['KZ', 'RU', 'BY', 'UA'])
print(df)
- An index can also be assigned after the data frame has been created
import pandas as pd
df = pd.DataFrame({
'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
'population': [17.04, 143.5, 9.5, 45.5],
'square': [2724902, 17125191, 207600, 603628]
})
print(df)
df.index = ['KZ', 'RU', 'BY', 'UA']
df.index.name = 'Country Code'
print(df)
Row Access Using Index
- A row can be accessed by label with .loc[] or by integer position with .iloc[]
import pandas as pd
df = pd.DataFrame({
'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
'population': [17.04, 143.5, 9.5, 45.5],
'square': [2724902, 17125191, 207600, 603628]
}, index=['KZ', 'RU', 'BY', 'UA'])
print(df.loc['KZ'])
print(df.iloc[0])
## country Kazakhstan
## population 17.04
## square 2724902
## Name: KZ, dtype: object
Selecting Rows and Columns
- Particular rows and columns can be selected:
import pandas as pd
df = pd.DataFrame({
'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
'population': [17.04, 143.5, 9.5, 45.5],
'square': [2724902, 17125191, 207600, 603628]
}, index=['KZ', 'RU', 'BY', 'UA'])
print(df.loc[['KZ', 'RU'], 'population'])
- The .loc[] indexer accepts two arguments: row labels and column labels
- Slicing is supported, and label slices include both endpoints:
print(df.loc['KZ':'BY', :])
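The difference between label-based and position-based slicing is easy to miss, so here is a short sketch on the same country data. The variable names by_label and by_position are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
    'population': [17.04, 143.5, 9.5, 45.5],
    'square': [2724902, 17125191, 207600, 603628]
}, index=['KZ', 'RU', 'BY', 'UA'])

by_label = df.loc['KZ':'BY']   # label slice: the end label is included
by_position = df.iloc[0:2]     # position slice: the end position is excluded

print(list(by_label.index))        # ['KZ', 'RU', 'BY']
print(list(by_position.index))     # ['KZ', 'RU']
print(df.loc['RU', 'population'])  # 143.5
print(df.iloc[1, 1])               # 143.5
```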
Filtering
- Filtering is performed using Boolean arrays
print(df[df.population > 10][['country', 'square']])
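Several conditions can be combined into one Boolean mask. The sketch below assumes the same country data as the other examples; the thresholds are arbitrary and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
    'population': [17.04, 143.5, 9.5, 45.5],
    'square': [2724902, 17125191, 207600, 603628]
}, index=['KZ', 'RU', 'BY', 'UA'])

# Each comparison yields a Boolean series; & requires both to be True.
# The parentheses are needed because & binds more tightly than > and <.
mask = (df['population'] > 10) & (df['square'] < 1_000_000)

matches = df.loc[mask, ['country', 'square']]
print(matches['country'].tolist())  # ['Ukraine']
```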
Deleting Columns
- The drop() method removes rows or columns and returns a new data frame; with axis='columns' it deletes a column
df = df.drop(['population'], axis='columns')
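As a sketch of both uses, the example below drops a column and then a row from the country data using the columns= and index= keywords, which are an alternative to passing axis. The original data frame is left untouched in both cases.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
    'population': [17.04, 143.5, 9.5, 45.5],
    'square': [2724902, 17125191, 207600, 603628]
}, index=['KZ', 'RU', 'BY', 'UA'])

without_population = df.drop(columns=['population'])  # remove a column
without_russia = df.drop(index=['RU'])                # remove a row by label

print(list(without_population.columns))  # ['country', 'square']
print(list(without_russia.index))        # ['KZ', 'BY', 'UA']
print(list(df.columns))                  # ['country', 'population', 'square']
```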
Reading and Writing to a File
- Pandas supports many popular file formats, including CSV, XML, HTML, Excel, SQL, and JSON
- CSV is the file format used in the examples here
- Data can be read from a CSV file using the read_csv() function
df = pd.read_csv('filename.csv', sep=',')
- A data frame can be written to a CSV file with the to_csv() method
df.to_csv('filename.csv')
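A round trip can be tried without touching the disk by writing to an in-memory text buffer. This is a sketch only: io.StringIO stands in for a real file, and the two-row data frame is sample data.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'country': ['Kazakhstan', 'Russia'],
    'population': [17.04, 143.5],
})

# Write to an in-memory buffer instead of 'filename.csv'.
# index=False leaves out the row index, which would otherwise
# be written as an extra unnamed first column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the same text back into a new data frame.
restored = pd.read_csv(io.StringIO(csv_text), sep=',')

print(csv_text)
print(restored.shape)          # (2, 2)
print(list(restored.columns))  # ['country', 'population']
```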
Pandas Capabilities
- Pandas can do more than what has been covered, such as grouping data and data visualization
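Grouping, for instance, can be sketched in a few lines. The sales table below is hypothetical sample data, not part of the notes.

```python
import pandas as pd

sales = pd.DataFrame({
    'region': ['east', 'west', 'east', 'west'],
    'amount': [10, 20, 30, 40],
})

# Split the rows by region, then aggregate the amount column per group.
totals = sales.groupby('region')['amount'].sum()
means = sales.groupby('region')['amount'].mean()

print(totals['east'])  # 40
print(totals['west'])  # 60
print(means['east'])   # 20.0
```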
Machine Learning
- Machine learning is a subset of artificial intelligence
- It is concerned with the development of algorithms that allow a computer to learn from data and past experiences
- Arthur Samuel introduced the term in 1959
How Machine Learning Works
- Machine learning algorithms learn from data
- Logical models are built from that data
- Given new input, the model produces an output
Features of Machine Learning
- Machine learning uses data to detect various patterns in a given dataset
- It can learn from past data and improve automatically
- It is a data-driven technology, similar to data mining in that it also deals with large amounts of data
Need for Machine Learning
- Rapid increase in the production of data
- Solving complex problems difficult for a human
- Decision making in various sectors, including finance
- Finding hidden patterns and extracting useful information from data
Supervised Learning
- The machine learning model makes predictions or decisions based on labeled data
- Task Driven (Predict Next Value)
Unsupervised Learning
- The machine learning algorithm finds patterns & structure in unlabeled data
- Data Driven (Identify Clusters)
Reinforcement Learning
- The machine learning algorithm learns to make a sequence of decisions
- Learn from Mistakes
Machine Learning Classifications
- Supervised learning uses continuous/categorical target variables for Regression and Classification tasks
- Unsupervised learning does not use a target variable for Clustering and Association tasks
- Reinforcement learning uses categorical target variables and makes decisions
Application of Machine Learning
- Machine learning has a variety of applications:
- Meaningful Compression
- Big Data Visualization
- Structure Discovery
- Image Classification
- Customer Retention
- Dimensionality Reduction
- Feature Elicitation
- Identity Fraud Detection
- Classification
- Diagnostics
- Advertising Popularity Prediction
- Weather Forecasting
- Recommender Systems
- Targeted Marketing
- Population Growth Prediction
- Customer Segmentation
- Estimating Life Expectancy
- Game AI
- Skill Acquisition
- Real-Time Decisions
- Robot Navigation
Machine Learning Usage
- Automatic Language Translation
- Virtual Personal Assistant
- Speech/Image Recognition
- E-Mail Spam and Malware Filtering
Data Gathering
- Identify data sources
- Collect data
- Integrate data from different sources
Data Preparation
- Data exploration identifies dataset characteristics and quality
- Data pre-processing readies the data for analysis
Data Wrangling/Cleaning
- Collected data may have missing values
- It may have duplicate or invalid data
- It may contain noise
Data Analysis
- Analytical techniques are selected
- Models are built and their results reviewed
Train Model
- The model is trained on data to improve its performance
Test Model
- Checks model accuracy with a test dataset
Deployment
- The model is deployed in a real-world system
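The lifecycle stages above can be walked through in miniature with NumPy alone. Everything here is an assumption made for illustration: the data is synthetic, the "model" is a straight-line fit with np.polyfit standing in for a real learning algorithm, and predict is a hypothetical helper representing the deployed model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Data gathering: synthetic data standing in for a real source (assumption).
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=200)

# Data preparation and wrangling: simulate, then drop, missing values.
y[::25] = np.nan
keep = ~np.isnan(y)
x, y = x[keep], y[keep]

# Data analysis and training: fit a straight line on the first 80% of rows.
split = int(0.8 * x.size)
slope, intercept = np.polyfit(x[:split], y[:split], deg=1)

# Testing: measure the error on rows the model has not seen.
predictions = slope * x[split:] + intercept
rmse = float(np.sqrt(np.mean((predictions - y[split:]) ** 2)))

# Deployment: expose the trained model as a function (hypothetical helper).
def predict(new_x):
    return slope * np.asarray(new_x) + intercept

print(round(float(slope), 1), rmse < 1.0)
```

A real project would typically use a dedicated library for splitting, training, and evaluation; the point of the sketch is only the order of the steps.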