Questions and Answers
If energy is defined as the ability to do work, what best describes the relationship between force and distance in this context?
- Energy is the inverse of force applied over a distance.
- Energy is unrelated to force and distance within the constraints of doing work.
- Energy is the force applied through a distance. (correct)
- Energy is equivalent to distance divided by force.
How is power defined in relation to energy?
- Power is the rate at which energy is used or transferred. (correct)
- Power is the capacity to store energy.
- Power is energy multiplied by time.
- Power is the total amount of energy available.
Which of the following equations correctly represents the relationship between the joule (J), newton (N), and meter (m)?
- $1 \text{ joule} = 1 \text{ newton} / 1 \text{ meter}$
- $1 \text{ joule} = 1 \text{ newton} + 1 \text{ meter}$
- $1 \text{ joule} = 1 \text{ newton} \cdot 1 \text{ meter}$ (correct)
- $1 \text{ joule} = 1 \text{ newton} - 1 \text{ meter}$
Given that power is measured in watts (W), which of the following equations correctly relates the watt to the joule (J) and second (s)?
How is thermal energy mathematically expressed?
During energy conversion processes, what generally happens to the total amount of energy in a closed system?
In the context of predicting energy consumption, what is the significance of using 'a priori' models, and what is an alternative approach?
In a linear regression model for predicting peak electricity demand based on high temperature, how would you interpret the parameter $\theta_1$ if the model is described as $\text{predicted peak demand} = \theta_1 \cdot \text{high temperature} + \theta_2$?
Given a linear regression model where the predicted peak demand is calculated by $\text{predicted peak demand} = \theta_1 \cdot (\text{high temperature}) + \theta_2$, and the parameters are $\theta_1 = 0.046$ and $\theta_2 = -1.46$, what is the predicted peak demand if the high temperature is $80^\circ\text{F}$?
In the formal problem setting for predicting peak demand, if $x_i \in R^1$ represents the input for the $i$-th day, what does this input typically signify?
In the context of linear regression, what is the purpose of defining a feature vector $\phi(x_i)$?
What does a loss function quantify in the context of machine learning?
What is the key goal when finding model parameters in machine learning?
What does the term 'least-squares' refer to in the context of linear regression?
In the equation $J(\theta) = \|\Phi \theta - y\|^2_2$, what does the term $\| \cdot \|_2$ represent?
What does the gradient of a function, denoted as $\nabla_\theta J(\theta)$, represent?
What is the 'normal equation' in the context of linear regression, and why is it important?
In the normal equations for linear regression, given by $\theta^* = (\Phi^T \Phi)^{-1} \Phi^T y$, what does the term $\Phi$ represent?
What is a key distinction between 'convex' and 'non-convex' optimization problems?
In the context of optimization, what does a 'constrained optimization problem' involve?
What is the purpose of transforming an optimization problem into its 'standard form' when solving Linear Programs (LPs)?
Other than the least-squares loss function, what are some alternative loss functions that can quantify the error between predicted and actual values?
What is a core difference between using a squared loss function versus an absolute loss function in linear regression regarding sensitivity to outliers?
Why might one choose to use a 'deadband loss' function over a squared loss or absolute loss function in the context of linear regression?
When using higher-dimensional inputs (e.g., both temperature and hour of day) in a linear regression model, how does the feature vector $\phi(x)$ change?
Considering a scenario where the input features for predicting electricity demand include both temperature and hour of day, how does increasing the dimensionality of the input potentially improve the model's accuracy?
When expanding a linear regression model to include additional input features, such as both temperature and hour of day, what adjustments are typically required in the normal equations used to solve for the model parameters?
If the absolute loss function is non-differentiable, what approach is typically used to still solve for the parameters that minimize this?
To minimize absolute loss $ \sum_{i=1}^m |\theta^T \phi(x_i) - y_i | $, new variables $v$ are introduced such that $v_i \geq |\theta^T \phi(x_i) - y_i |$. Consequently, linear constraints are added. What form do these constraints take?
In the context of least squares optimization, changing the objective function from squared loss to absolute loss results in the problem being formulated as what?
Flashcards
Define Energy
The ability to do work, applying force through a distance.
Unit of Energy
Joule (J), also BTU or kilowatt hour.
What is Power?
A rate of energy use.
Unit of Power
Linear Model
Model Parameters
Predicted Output
Feature Vectors
What is a Loss Function?
Optimization Goal
Least-Squares Function
Optimizing a Function
What is a Gradient?
Gradient Equation
Constrained Optimization
Convex Optimization
Absolute Loss
Deadband Loss
Linear Constraints
Study Notes
- Linear regression is a machine learning technique.
- The task is to predict how much energy one will consume tomorrow.
- This is difficult to estimate from "a priori" models, but a model can instead be built from large amounts of data.
Energy 101
- Energy is defined as the "ability to do work," which involves applying force through a distance.
- The unit of energy is the joule (J); the BTU and the kilowatt hour are also used.
- 1 joule = 1 newton * 1 meter = (1 kilogram * 1 meter^2) / (1 second^2)
- Power is a rate of energy use.
- The unit of power is the watt (W), with common multiples like kilo/mega/gigawatt.
- 1 watt equals 1 joule / 1 second.
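As a quick check on how the energy and power units fit together, the kilowatt hour mentioned above can be converted to joules using 1 watt = 1 joule / 1 second. This is a worked example added for illustration:

```latex
\begin{align*}
1\ \text{kWh} &= 1000\ \text{W} \times 3600\ \text{s} \\
              &= 1000\ \tfrac{\text{J}}{\text{s}} \times 3600\ \text{s} \\
              &= 3.6 \times 10^{6}\ \text{J} = 3.6\ \text{MJ}
\end{align*}
```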
Forms of Energy
- Mechanical kinetic energy: E = (1/2)mv^2
- Gravitational potential energy: E = mgh
- Thermal energy: E = (3/2)NkT
- Electrical energy: E = VQ
- Electromagnetic energy: E = hf
- Chemical energy exists
- Nuclear energy: E = mc^2
Energy Conversion
- Conversions can occur between Gravitational Potential, Electrical, Electromagnetic, Mechanical Kinetic, Thermal, Chemical, and Nuclear energy.
Electricity Consumption
- Data from Duquesne Light show that electricity consumption varies by month and by hour of day.
- Duquesne Light covers Allegheny and Beaver Counties.
- Service area: 817 square miles.
- Total customers: 584,000 (approximately 90% residential).
- Overhead power lines: 45,000 miles.
- Utility poles: 250,000.
- Transformers: 103,000.
Predict peak demand
- Peak demand can be predicted from temperature
A Simple Model
- A linear model can calculate predicted peak demand from high temperature:
- Predicted peak demand = θ_1 * (high temperature) + θ_2
- Model parameters: θ_1, θ_2 ∈ R (here θ_1 = 0.046, θ_2 = -1.46)
- We can use a model like this to make predictions
- For example, given a high temperature of 80°F, the predicted peak demand is θ_1 * 80 + θ_2 = 0.046 * 80 - 1.46 = 2.22 GW (using the rounded parameter values above)
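The prediction step can be sketched in a few lines of MATLAB. The parameter values are the rounded ones from the notes; the helper name `predict_peak` and the list of temperatures are assumptions made for illustration. With these rounded values, 80°F gives about 2.22 GW.

```matlab
% Sketch: evaluate the linear model at a few high temperatures.
theta1 = 0.046;   % slope, GW per degree F (rounded value from the notes)
theta2 = -1.46;   % intercept, GW (rounded value from the notes)
predict_peak = @(high_temp) theta1 * high_temp + theta2;  % hypothetical helper

temps = [70 80 90];               % illustrative inputs in degrees F
demand = predict_peak(temps);     % predicted peak demand in GW
fprintf('%d F -> %.2f GW\n', [temps; demand]);
```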
Formal Problem Setting
- Input: x_i ∈ R^n, i = 1, ..., m
- E.g.: x_i ∈ R^1 = {high temperature for day i}
- Output: y_i ∈ R (regression task)
- E.g.: y_i ∈ R = {peak demand for day i}
- Model parameters: θ ∈ R^k
- Predicted output: ŷ_i ∈ R
- E.g.: ŷ_i = θ_1 * x_i + θ_2
Feature Vectors
- Define a function that maps inputs to feature vectors: φ: R^n -> R^k
- For the task above:
- φ(x_i) = [x_i, 1] (where n = 1, k = 2)
- ŷ_i = Σ_{j=1}^k θ_j * φ_j(x_i) ≡ θ^T φ(x_i)
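A minimal sketch of the feature map for the scalar-input task, showing that the inner product and the explicit sum over features give the same prediction. The handle name `phi` and the input value are illustrative assumptions; the parameters are the rounded values from the simple model.

```matlab
% Sketch: feature map phi(x) = [x; 1] for a scalar input (n = 1, k = 2).
phi = @(x) [x; 1];            % maps a scalar input to a 2-vector
theta = [0.046; -1.46];       % rounded parameters from the simple model
x_i = 80;                     % illustrative high temperature in degrees F
y_hat = theta' * phi(x_i);    % inner product theta^T phi(x_i)

% The same value, written as the explicit sum over features.
y_hat_sum = sum(theta .* phi(x_i));
```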
Loss functions
- Want a model that performs "well" on the data we have.
- ŷ_i ≈ y_i, ∀i
- Measure "closeness" of ŷ_i and y_i using a loss function l: R × R -> R_+
- Example: squared loss
- l(ŷ_i, y_i) = (ŷ_i - y_i)^2
Finding Model Parameters, and Optimization
- Objective: find model parameters that minimize the sum of losses over all input/output pairs
- J(θ) = Σ_{i=1}^m l(ŷ_i, y_i) = Σ_{i=1}^m (θ^T φ(x_i) - y_i)^2
- Minimize J(θ) over θ
Matrix Notation
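- Φ ∈ R^{m×k}: the matrix whose i-th row is φ(x_i)^T
- y ∈ R^m: the vector of outputs y_1, ..., y_m
- Then J(θ) = Σ_{i=1}^m (θ^T φ(x_i) - y_i)^2 = ||Φθ - y||_2^2
- (||z||_2 is the l2 norm of a vector: ||z||_2 = sqrt(Σ_i z_i^2) = sqrt(z^T z))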
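The equivalence between the summed squared loss and the squared l2 norm can be checked numerically. The sketch below uses made-up data (three hypothetical days), not the Duquesne Light measurements, and the rounded parameters from the simple model.

```matlab
% Sketch with made-up data: the summed squared loss equals ||Phi*theta - y||_2^2.
x = [60; 75; 90];                 % hypothetical high temperatures (degrees F)
y = [1.4; 2.0; 2.7];              % hypothetical peak demands (GW)
m = numel(x);
Phi = [x ones(m, 1)];             % row i is phi(x_i)' = [x_i, 1]
theta = [0.046; -1.46];

J_sum = 0;
for i = 1:m
    J_sum = J_sum + (Phi(i, :) * theta - y(i))^2;   % per-example squared loss
end
J_norm = norm(Phi * theta - y, 2)^2;                % matrix form
```

Both `J_sum` and `J_norm` hold the same objective value, which is why the rest of the derivation can work with the compact matrix form.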
Least Squares Objective Function
- Least-squares objective function: J(θ) = ||Φθ - y||_2^2
- How do we optimize a function?
1-D Case (θ ∈ R)
- J(θ) = θ^2 – 2θ – 1
- dJ/dθ = 2θ - 2
- θ* is the minimum <=> dJ/dθ|_{θ*} = 0
- <=> 2θ* - 2 = 0
- <=> θ* = 1
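The 1-D result can be sanity-checked by evaluating J on a grid and comparing the grid minimizer with the stationary point. This is only a numerical illustration; the grid range and spacing are arbitrary choices.

```matlab
% Sketch: confirm the 1-D minimizer of J(theta) = theta^2 - 2*theta - 1.
J  = @(t) t.^2 - 2*t - 1;
dJ = @(t) 2*t - 2;                  % derivative from the notes

theta_star = 1;                     % solution of dJ/dtheta = 0
ts = linspace(-2, 4, 601);          % spacing 0.01, so the grid contains 1
[J_min, idx] = min(J(ts));          % smallest value of J on the grid
fprintf('grid minimizer %.2f, J = %.2f\n', ts(idx), J_min);
```

Because the second derivative is 2 > 0 everywhere, the stationary point is a minimum rather than a maximum.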
Multi-Variate Case
- θ ∈ R^k, J: R^k -> R
- Generalized condition: ∇_θ J(θ)|_{θ*} = 0
- ∇_θ J(θ) denotes the gradient of J with respect to θ
- Some important rules and common gradients:
- ∇_θ(a f(θ) + b g(θ)) = a ∇_θ f(θ) + b ∇_θ g(θ), (a, b ∈ R)
- ∇_θ(θ^T A θ) = (A + A^T) θ, (A ∈ R^{k×k})
- ∇_θ(b^T θ) = b, (b ∈ R^k)
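The quadratic-form rule can be checked against a central finite difference. The matrix `A` and the test point below are arbitrary made-up values; the check is a sketch, not a proof.

```matlab
% Sketch: finite-difference check of grad(theta' * A * theta) = (A + A') * theta.
A = [2 1; 0 3];                 % arbitrary non-symmetric matrix (made up)
theta = [1; -2];                % arbitrary test point
f = @(t) t' * A * t;

h = 1e-6;
g_num = zeros(2, 1);
for j = 1:2
    e = zeros(2, 1); e(j) = 1;                              % j-th unit vector
    g_num(j) = (f(theta + h*e) - f(theta - h*e)) / (2*h);   % central difference
end
g_rule = (A + A') * theta;      % closed form from the rule above
```

Using a non-symmetric `A` matters here: for symmetric `A` the rule reduces to 2Aθ, which would hide the role of the transpose.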
Optimizing Least-Squares Objective
- J(θ) = ||Φθ - y||_2^2
- = (Φθ - y)^T (Φθ - y)
- = θ^T Φ^T Φ θ - 2 y^T Φ θ + y^T y
- Using the previous gradient rules:
- ∇_θ J(θ) = ∇_θ(θ^T Φ^T Φ θ - 2 y^T Φ θ + y^T y)
- = ∇_θ(θ^T Φ^T Φ θ) - 2 ∇_θ(y^T Φ θ) + ∇_θ(y^T y)
- = 2 Φ^T Φ θ - 2 Φ^T y
- Setting the gradient equal to zero:
- 2 Φ^T Φ θ* - 2 Φ^T y = 0 <=> θ* = (Φ^T Φ)^{-1} Φ^T y
- These are known as the normal equations.
MATLAB Code
- X = load('high_temperature.txt');
- y = load('peak_demand.txt');
- n = size(X,2);
- m = size(X,1);
- Phi = [X ones(m,1)];
- theta = inv(Phi' * Phi) * Phi' * y;
- theta =
- 0.0466
- -1.4600
- Least-squares problems are so common that MATLAB has a dedicated operator for them, the backslash:
- % same solution as inv(Phi' * Phi) * Phi' * y, computed more stably
- theta = Phi \ y;
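The data files loaded above are not included here, so the following self-contained sketch uses four made-up data points instead. It solves the normal equations as a linear system and confirms that the gradient 2Φ^TΦθ - 2Φ^Ty vanishes at the solution.

```matlab
% Sketch with made-up data: solve the normal equations and check the gradient.
x = [60; 70; 80; 90];                    % hypothetical high temperatures
y = [1.3; 1.8; 2.2; 2.7];                % hypothetical peak demands (GW)
Phi = [x ones(numel(x), 1)];

theta_star = (Phi' * Phi) \ (Phi' * y);  % solves Phi'*Phi*theta = Phi'*y
grad = 2 * (Phi' * Phi) * theta_star - 2 * Phi' * y;   % should be ~0
theta_backslash = Phi \ y;               % direct least-squares solve
```

Solving the linear system with the backslash avoids forming an explicit inverse, which is generally the safer numerical habit.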
General Optimization Problems
- minimize J(θ) over θ
- subject to g_i(θ) ≤ 0, i = 1, ..., N_i
- h_i(θ) = 0, i = 1, ..., N_e
- This is a constrained optimization problem: the g_i terms are the inequality constraints and the h_i terms are the equality constraints.
Types of Optimization Problems
- Linear programming.
- Quadratic programming.
- Semidefinite programming.
- Integer programming.
- The distinction depends on the form of J, the g_i, and the h_i.
Convex and Non-Convex Problems
- An important distinction in optimization is between convex problems (where J and the g_i are convex and the h_i are linear) and non-convex problems.
- f is convex <=> f(aθ + (1 - a)θ') ≤ a f(θ) + (1 - a) f(θ') for all θ, θ' and 0 ≤ a ≤ 1
- Informally speaking, one can usually find global solutions of convex problems efficiently. For non-convex problems one must settle for local solutions or time-consuming optimization.
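The convexity inequality can be sampled numerically. The sketch below tests it for the 1-D objective J(θ) = θ^2 - 2θ - 1 used earlier, and shows a violation for the concave function -θ^2. The two end points are arbitrary, and checking sample points illustrates the definition without proving convexity.

```matlab
% Sketch: sample the convexity inequality between two arbitrary points.
J = @(t) t.^2 - 2*t - 1;             % convex example from the 1-D case
g = @(t) -t.^2;                      % concave counterexample (made up)
a = linspace(0, 1, 11);              % weights 0, 0.1, ..., 1
t1 = -3; t2 = 5;                     % two arbitrary points

lhs = J(a*t1 + (1 - a)*t2);          % J at the convex combination
rhs = a*J(t1) + (1 - a)*J(t2);       % chord between the two values
holds = all(lhs <= rhs + 1e-12);     % true for the convex J

violated = any(g(a*t1 + (1 - a)*t2) > a*g(t1) + (1 - a)*g(t2) + 1e-12);
```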
Solving Optimization Problems
- Many generic optimization libraries exist
- YALMIP (Yet Another Linear Matrix Inequality Parser) can be used. URL: http://users.isy.liu.se/johanl/yalmip/
YALMIP Code for Least Squares Optimization:
- theta = sdpvar(size(Phi,2), 1); % one parameter per column of Phi
- solvesdp([], sum((Phi*theta - y).^2));
- double(theta)
- ans =
- 0.0466
- -1.4600
Alternative Loss Functions
- There is nothing special about the least-squares loss function
- l(ŷ, y) = (ŷ – y)^2.
- Some alternatives include:
- Absolute loss: l(ŷ, y) = |ŷ - y|
- Deadband loss: l(ŷ, y) = max{0, |ŷ – y| – ε}, ε ∈ R+
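To compare how the three losses treat small and large errors, they can be written as anonymous functions and evaluated on a few residuals. The residual values and the deadband width `epsilon` are illustrative assumptions.

```matlab
% Sketch: the three losses as anonymous functions of prediction and target.
squared_loss  = @(y_hat, y) (y_hat - y).^2;
absolute_loss = @(y_hat, y) abs(y_hat - y);
deadband_loss = @(y_hat, y, epsilon) max(0, abs(y_hat - y) - epsilon);

r = [0.05 0.5 3];                    % illustrative residuals y_hat - y
epsilon = 0.1;                       % hypothetical deadband width
sq = squared_loss(r, 0);             % [0.0025 0.25 9]
ab = absolute_loss(r, 0);            % [0.05 0.5 3]
db = deadband_loss(r, 0, epsilon);   % [0 0.4 2.9]
```

The largest residual dominates the squared loss far more than the absolute loss, which is the usual reason squared loss is more sensitive to outliers; the deadband loss ignores errors smaller than `epsilon` entirely.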
Minimizing Absolute Loss
- Minimize Σ_{i=1}^m |θ^T φ(x_i) - y_i|
- The objective is not differentiable
- Frame it as a constrained optimization problem:
- Minimize Σ_{i=1}^m v_i over θ and v ∈ R^m
- Subject to -v_i ≤ θ^T φ(x_i) - y_i ≤ v_i, i = 1, ..., m
- This is a linear program (LP): linear objective and linear constraints
Standard Form for Solving LPs
- minimize c^T z
- subject to Az ≤ b
- z ∈ R^n, A ∈ R^{N_i×n}, b ∈ R^{N_i}
- For the absolute loss LP (semicolons separate stacked blocks):
- z = [θ; v], c = [0; 1], A = [Φ, -I; -Φ, -I], b = [y; -y]
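The block structure of A and b follows from splitting each two-sided constraint into two one-sided inequalities and moving the variables to the left-hand side. A short derivation, written out here as a reading aid:

```latex
\begin{align*}
\theta^T \phi(x_i) - y_i \le v_i
  &\iff \phi(x_i)^T \theta - v_i \le y_i, \\
-v_i \le \theta^T \phi(x_i) - y_i
  &\iff -\phi(x_i)^T \theta - v_i \le -y_i .
\end{align*}
Stacking over $i = 1, \dots, m$ with $z = \begin{bmatrix} \theta \\ v \end{bmatrix}$:
\[
\begin{bmatrix} \Phi & -I \\ -\Phi & -I \end{bmatrix}
\begin{bmatrix} \theta \\ v \end{bmatrix}
\le
\begin{bmatrix} y \\ -y \end{bmatrix},
\qquad
c^T z = 0^T \theta + 1^T v = \sum_{i=1}^m v_i .
\]
```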
MATLAB code
- k = size(Phi,2); % number of model parameters
- c = [zeros(k,1); ones(m,1)];
- A = [Phi -eye(m); -Phi -eye(m)];
- b = [y; -y];
- z = linprog(c,A,b);
- theta = z(1:k)
- theta =
- 0.0477
- -1.5978
- The same solution in YALMIP:
- theta = sdpvar(size(Phi,2), 1);
- solvesdp([], sum(abs(Phi*theta - y)));
- double(theta)
- ans =
- 0.0477
- -1.5978
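The LP construction can be checked without a solver: for any θ, choosing v_i = |θ^T φ(x_i) - y_i| gives a feasible point whose LP objective equals the absolute loss. The sketch below uses made-up data and an arbitrary θ, and it does not call `linprog` or YALMIP.

```matlab
% Sketch with made-up data: the LP objective at v = |residual| is the absolute loss.
x = [60; 70; 80; 90];                    % hypothetical high temperatures
y = [1.3; 1.8; 2.2; 2.7];                % hypothetical peak demands (GW)
m = numel(x);
Phi = [x ones(m, 1)];
k = size(Phi, 2);                        % number of model parameters

c = [zeros(k, 1); ones(m, 1)];
A = [Phi -eye(m); -Phi -eye(m)];
b = [y; -y];

theta = [0.046; -1.45];                  % arbitrary parameter vector to test
v = abs(Phi * theta - y);                % tightest feasible slack variables
z = [theta; v];
feasible = all(A * z <= b + 1e-12);      % constraints hold up to rounding
objective = c' * z;                      % sum of the slack variables
```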
Which Loss Function to Use?
- Graph of observed data, squared loss, absolute loss, and deadband loss.
Higher Dimensional Inputs
- Input: x ∈ R^2 = [temperature, hour of day]
- Output: y ∈ R = demand
Features
- φ(x) ∈ R^3 = [temperature, hour of day, 1]
- The matrices are defined as before:
- Φ ∈ R^{m×k}
- y ∈ R^m
- Same solution as before:
- θ* = (Φ^T Φ)^{-1} Φ^T y ∈ R^3
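A sketch of the two-input case on synthetic, noise-free data: the outputs are generated from made-up parameters `theta_true`, so the least-squares solve should recover them. All numbers are assumptions for illustration, not measured demand data.

```matlab
% Sketch with synthetic, noise-free data: two inputs plus a constant feature.
temp = [60; 70; 80; 90; 75; 85];         % hypothetical temperatures (degrees F)
hour = [ 8; 20; 12; 16; 10; 18];         % hypothetical hours of day
theta_true = [0.04; 0.02; -1.0];         % made-up parameters used to generate y
Phi = [temp hour ones(numel(temp), 1)];  % row i is phi(x_i)' in R^3
y = Phi * theta_true;

theta = Phi \ y;                         % least-squares solution in R^3
```

Only the construction of `Phi` changes relative to the one-input case; the solve itself is the same.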