Energy Consumption Prediction

Questions and Answers

If energy is defined as the ability to do work, what best describes the relationship between force and distance in this context?

  • Energy is the inverse of force applied over a distance.
  • Energy is unrelated to force and distance within the constraints of doing work.
  • Energy is the force applied through a distance. (correct)
  • Energy is equivalent to distance divided by force.

How is power defined in relation to energy?

  • Power is the rate at which energy is used or transferred. (correct)
  • Power is the capacity to store energy.
  • Power is energy multiplied by time.
  • Power is the total amount of energy available.

Which of the following equations correctly represents the relationship between the joule (J), newton (N), and meter (m)?

  • $1 \text{ joule} = 1 \text{ newton} / 1 \text{ meter}$
  • $1 \text{ joule} = 1 \text{ newton} + 1 \text{ meter}$
  • $1 \text{ joule} = 1 \text{ newton} \cdot 1 \text{ meter}$ (correct)
  • $1 \text{ joule} = 1 \text{ newton} - 1 \text{ meter}$

Given that power is measured in watts (W), which of the following equations correctly relates the watt to the joule (J) and second (s)?

  • $1 \text{ watt} = 1 \text{ joule} / 1 \text{ second}$ (correct)

How is thermal energy mathematically expressed?

  • $E = \frac{3}{2}NkT$ (correct)

During energy conversion processes, what generally happens to the total amount of energy in a closed system?

  • The total energy remains constant, although its form may change. (correct)

In the context of predicting energy consumption, what is the significance of using 'a priori' models, and what is an alternative approach?

  • Energy consumption is difficult to estimate from 'a priori' models, so the alternative is to build a model from data. (correct)

In a linear regression model for predicting peak electricity demand based on high temperature, how would you interpret the parameter $\theta_1$ if the model is described as predicted peak demand $ = \theta_1 \cdot \text{high temperature} + \theta_2$?

  • $\theta_1$ quantifies the rate of change in peak demand for each degree change in temperature. (correct)

Given a linear regression model where the predicted peak demand is calculated by $\text{predicted peak demand} = \theta_1 \cdot (\text{high temperature}) + \theta_2$, and the parameters are $\theta_1 = 0.046$ and $\theta_2 = -1.46$, what is the predicted peak demand if the high temperature is $80^\circ\text{F}$?

  • 2.22 GW, since $0.046 \cdot 80 - 1.46 = 3.68 - 1.46 = 2.22$ (correct)

In the formal problem setting for predicting peak demand, if $x_i \in \mathbb{R}^1$ represents the input for the $i$-th day, what does this input typically signify?

  • The high temperature for day $i$ (correct)

In the context of linear regression, what is the purpose of defining a feature vector $\phi(x_i)$?

  • To map the inputs into a higher-dimensional space where the relationship with the output might be linear. (correct)

What does a loss function quantify in the context of machine learning?

  • The difference between predicted and actual values. (correct)

What is the key goal when finding model parameters in machine learning?

  • To minimize the sum of costs over all input/output pairs. (correct)

What does the term 'least-squares' refer to in the context of linear regression?

  • Minimizing the sum of the squares of the differences between predicted and actual values. (correct)

In the equation $J(\theta) = \|\Phi \theta - y\|^2_2$, what does the term $\| \cdot \|_2$ represent?

  • The $\ell_2$ norm of a vector. (correct)

What does the gradient of a function, denoted as $\nabla_\theta J(\theta)$, represent?

  • The rate of change of the function with respect to $\theta$. (correct)

What is the 'normal equation' in the context of linear regression, and why is it important?

  • It is a closed-form solution that directly computes the optimal parameters for linear regression. (correct)

In the normal equations for linear regression, given by $\theta^* = (\Phi^T \Phi)^{-1} \Phi^T y$, what does the term $\Phi$ represent?

  • The matrix of feature vectors. (correct)

What is a key distinction between 'convex' and 'non-convex' optimization problems?

  • Convex problems guarantee that any local minimum found is also a global minimum. (correct)

In the context of optimization, what does a 'constrained optimization problem' involve?

  • Minimizing or maximizing an objective function subject to certain equality and inequality constraints. (correct)

What is the purpose of transforming an optimization problem into its 'standard form' when solving Linear Programs (LPs)?

  • To make the problem compatible with standard solvers that expect a specific format. (correct)

Other than the least-squares loss function, what are some alternative loss functions that can quantify the error between predicted and actual values?

  • Absolute loss and deadband loss. (correct)

What is a core difference between using a squared loss function versus an absolute loss function in linear regression regarding sensitivity to outliers?

  • Squared loss is more sensitive to outliers than absolute loss. (correct)

Why might one choose to use a 'deadband loss' function over a squared loss or absolute loss function in the context of linear regression?

  • When small errors are tolerable and should not contribute to the loss. (correct)

When using higher-dimensional inputs (e.g., both temperature and hour of day) in a linear regression model, how does the feature vector $\phi(x)$ change?

  • It becomes a vector that includes each input variable, allowing the model to consider their individual effects. (correct)

Considering a scenario where the input features for predicting electricity demand include both temperature and hour of day, how does increasing the dimensionality of the input potentially improve the model's accuracy?

  • By capturing more complex relationships between the predictors and the outcome variable, leading to a more nuanced prediction. (correct)

When expanding a linear regression model to include additional input features, such as both temperature and hour of day, what adjustments are typically required in the normal equations used to solve for the model parameters?

  • The feature matrix $\Phi$ must be redefined to include the additional features. (correct)

If the absolute loss function is non-differentiable, what approach is typically used to still solve for the parameters that minimize this?

  • Frame the problem as a constrained optimization problem. (correct)

To minimize absolute loss $ \sum_{i=1}^m |\theta^T \phi(x_i) - y_i | $, new variables $v$ are introduced such that $v_i \geq |\theta^T \phi(x_i) - y_i |$. Consequently, linear constraints are added. What form do these constraints take?

  • $v_i - \theta^T \phi(x_i) + y_i \geq 0$ and $v_i + \theta^T \phi(x_i) - y_i \geq 0$ (correct)

In the context of least squares optimization, changing the objective function from squared loss to absolute loss results in the problem being formulated as what?

  • Linear programming. (correct)

Flashcards

Define Energy

The ability to do work, applying force through a distance.

Unit of Energy

Joule (J), also BTU or kilowatt hour.

What is Power?

A rate of energy use.

Unit of Power

Watt (W), often kilo/mega/gigawatt.

Linear Model

A linear equation used to predict outcomes.

Model Parameters

Values that define the model's behavior.

Predicted Output

The value the model computes for a given input.

Feature Vectors

The vectors produced by a function φ that maps each raw input to the features the model uses.

What is a Loss Function?

A function measuring the difference between predicted and actual values.

Optimization Goal

Find model parameters minimizing the total cost.

Least-Squares Function

A common objective function, often used due to simple math.

Optimizing a Function

Find the point where the gradient of the function equals zero.

What is a Gradient?

Vector of partial derivatives of a function.

Gradient Equation

Setting the gradient to zero to find the optimal parameters.

Constrained Optimization

Optimization with constraints on the variables.

Convex Optimization

Optimization where the function/constraints are convex.

Absolute Loss

Loss = absolute difference between predicted and actual.

Deadband Loss

Loss = zero if the absolute error is within a margin ε; otherwise the absolute error minus ε.

Constrained Optimization

Solve the optimization by framing it as a linear program.

Linear Constraints

Constraints written as linear equalities or inequalities in the model parameters.

Study Notes

  • Linear regression is a machine learning technique.
  • The task is to predict how much energy one will consume tomorrow.
  • This is difficult to estimate from "a priori" models, but lots of data can be used to build a model.

Energy 101

  • Energy is defined as the "ability to do work," which involves applying force through a distance.
  • The unit of energy is the joule (J), but the BTU and the kilowatt-hour can also be used.
  • 1 joule = 1 newton * 1 meter = (1 kilogram * 1 meter^2) / (1 second^2)
  • Power is a rate of energy use.
  • The unit of power is the watt (W), with common multiples like kilo/mega/gigawatt.
  • 1 watt equals 1 joule / 1 second.
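
The relationship between power, time, and the energy units above can be sketched in a few lines of Python. This is only an illustration: the heater and its numbers are hypothetical, and the BTU factor is the usual approximate value rather than something stated in these notes.

```python
# Conversion factors to joules.
JOULES_PER_KWH = 3.6e6      # 1 kW sustained for 3600 s (exact)
JOULES_PER_BTU = 1055.06    # approximate value, assumed here


def energy_joules(power_watts, seconds):
    """Energy used at constant power: E = P * t (1 W = 1 J / 1 s)."""
    return power_watts * seconds


# Hypothetical example: a 1,500 W space heater running for 2 hours.
heater_joules = energy_joules(1500.0, 2 * 3600)
heater_kwh = heater_joules / JOULES_PER_KWH
heater_btu = heater_joules / JOULES_PER_BTU

print(f"{heater_joules:.0f} J = {heater_kwh:.2f} kWh = {heater_btu:.0f} BTU")
```

Keeping everything in joules internally and converting only for display avoids mixing units in later calculations.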

Forms of Energy

  • Mechanical kinetic energy: E = (1/2)mv^2
  • Gravitational potential energy: E = mgh
  • Thermal energy: E = (3/2)NkT
  • Electrical energy: E = VQ
  • Electromagnetic energy: E = hf
  • Chemical energy exists
  • Nuclear energy: E = mc^2

Energy Conversion

  • Conversions can occur between Gravitational Potential, Electrical, Electromagnetic, Mechanical Kinetic, Thermal, Chemical, and Nuclear energy.

Electricity Consumption

  • Electricity consumption varies by month and by hour of day, as shown by data from Duquesne Light.
  • Duquesne Light covers Allegheny and Beaver Counties.
  • Service area: 817 square miles.
  • Total customers: 584,000 (approximately 90% residential).
  • Overhead power lines: 45,000 miles.
  • Utility poles: 250,000.
  • Transformers: 103,000.

Predict peak demand

  • Peak demand can be predicted from temperature

A Simple Model

  • A linear model can calculate predicted peak demand from high temperature:
  • Predicted peak demand is equal to θ1 * (high temperature) + θ2
  • Model parameters can be specified as: θ1, θ2 ∈ R (θ1 = 0.046, θ2 = -1.46)
  • We can use a model like this to make predictions
  • For example, given a high temperature of 80°F, the predicted peak demand is θ1 * 80 + θ2 = 0.046 * 80 - 1.46 = 3.68 - 1.46 = 2.22 GW
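
As a minimal sketch, the simple model can be written as a one-line function. The parameter values are the ones quoted above; the function name and the sample temperatures are illustrative choices, not part of the original notes.

```python
THETA_1 = 0.046   # slope: GW of peak demand per degree Fahrenheit (from the notes)
THETA_2 = -1.46   # intercept in GW (from the notes)


def predict_peak_demand(high_temp_f, theta_1=THETA_1, theta_2=THETA_2):
    """Predicted peak demand (GW) = theta_1 * high temperature + theta_2."""
    return theta_1 * high_temp_f + theta_2


# Illustrative temperatures in degrees Fahrenheit.
for temp in (60, 80, 95):
    print(f"{temp} F -> {predict_peak_demand(temp):.2f} GW")
```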

Formal Problem Setting

  • Input: xi ∈ R^n, i = 1,..., m
  • E.g.: xi ∈ R^1 = {high temperature for day i}
  • Output: yi ∈ R (regression task)
  • E.g.: yi ∈ R = {peak demand for day i}
  • Model parameters: θ ∈ R^k
  • Predicted output: ŷi ∈ R
  • E.g.: ŷi = θ1 * xi + θ2

Feature Vectors

  • Define a function that maps inputs to feature vectors: φ: R^n -> R^k
  • For the task above:
  • φ(xi) = [xi, 1] (where n = 1, k = 2)
  • ŷi = Σ_{j=1..k} θj * φj(xi) ≡ θ^T φ(xi)

Loss functions

  • We want a model that performs "well" on the data we have.
  • ŷi ≈ yi, ∀i
  • Measure the "closeness" of ŷi and yi using a loss function: l: R x R -> R+
  • Example: squared loss
  • l(ŷi, yi) = (ŷi - yi)^2

Finding Model Parameters, and Optimization

  • Objective: find model parameters that minimize sum of costs over all input/output pairs
  • J(θ) = Σ_{i=1..m} l(ŷi, yi) = Σ_{i=1..m} (θ^T φ(xi) - yi)^2
  • Minimize J(θ)
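
The pieces defined so far (feature map, prediction, squared loss, and objective) fit together as in the sketch below. The four data points are hypothetical values chosen for illustration, not the Duquesne Light data.

```python
def phi(x):
    """Feature map for the one-input task: phi(x) = [x, 1]."""
    return [x, 1.0]


def predict(theta, x):
    """Predicted output as the inner product theta^T phi(x)."""
    return sum(t * f for t, f in zip(theta, phi(x)))


def squared_loss(y_hat, y):
    return (y_hat - y) ** 2


def objective(theta, xs, ys):
    """J(theta): sum of squared losses over all input/output pairs."""
    return sum(squared_loss(predict(theta, x), y) for x, y in zip(xs, ys))


# Hypothetical data: high temperatures (F) and peak demands (GW).
temps = [60.0, 70.0, 80.0, 90.0]
demands = [1.3, 1.8, 2.2, 2.7]

print(objective([0.046, -1.46], temps, demands))  # small: a good fit
print(objective([0.0, 2.0], temps, demands))      # larger: ignores temperature
```

Comparing the two printed values shows what "minimize J(θ)" means in practice: parameters that track the data give a smaller objective.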

Matrix Notation

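  • Φ ∈ R^(m×k): the matrix whose i-th row is φ(xi)^T
  • y ∈ R^m: the vector of outputs y1,..., ym
  • Then J(θ) = Σ_{i=1..m} (θ^T φ(xi) - yi)^2 = ||Φθ - y||_2^2
  • (||z||_2 is the l2 norm of a vector: ||z||_2 = √(Σ zi^2) = √(z^T z))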

Least Squares Objective Function

  • Least-squares objective function
  • How do we optimize a function?

1-D Case (θ ∈ R)

  • J(θ) = θ^2 – 2θ – 1
  • dJ/dθ = 2θ - 2
  • θ* minimum <=> dJ/dθ|θ* = 0
  • <=> 2θ* - 2 = 0
  • <=> θ* = 1
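
The 1-D result can be checked numerically. The sketch below uses plain gradient descent, which is not the method in these notes (they solve dJ/dθ = 0 directly); it is swapped in only to confirm that the minimizer is θ* = 1. The step size, start point, and iteration count are arbitrary choices.

```python
def objective(theta):
    return theta ** 2 - 2 * theta - 1


def derivative(theta):
    return 2 * theta - 2


def gradient_descent(start, step_size=0.1, iterations=200):
    """Repeatedly step against the derivative until it is (nearly) zero."""
    theta = start
    for _ in range(iterations):
        theta -= step_size * derivative(theta)
    return theta


theta_star = gradient_descent(start=5.0)
print(theta_star, objective(theta_star), derivative(theta_star))
```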

Multi-Variate Case

  • θ ∈ R^k, J: R^k -> R
  • Generalized condition: ∇θJ(θ)|θ* = 0
  • ∇θJ(θ) denotes the gradient of J with respect to θ
  • Some important rules and common gradients:
  • ∇θ(a f(θ) + b g(θ)) = a ∇θf(θ) + b ∇θg(θ), (a, b ∈ R)
  • ∇θ(θ^T A θ) = (A + A^T)θ, (A ∈ R^(k×k))
  • ∇θ(b^T θ) = b, (b ∈ R^k)

Optimizing Least-Squares Objective

  • J(θ) = ||Φθ - y||_2^2
  • = (Φθ - y)^T (Φθ - y)
  • = θ^T Φ^T Φ θ - 2 y^T Φ θ + y^T y
  • Using the previous gradient rules:
  • ∇θJ(θ) = ∇θ(θ^T Φ^T Φ θ - 2 y^T Φ θ + y^T y)
  • = ∇θ(θ^T Φ^T Φ θ) - 2 ∇θ(y^T Φ θ) + ∇θ(y^T y)
  • = 2 Φ^T Φ θ - 2 Φ^T y (Φ^T Φ is symmetric, and y^T y does not depend on θ)
  • Setting the gradient equal to zero:
  • 2 Φ^T Φ θ* - 2 Φ^T y = 0 <=> θ* = (Φ^T Φ)^(-1) Φ^T y
  • These are known as the normal equations.
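
For the two-parameter model with φ(x) = [x, 1], the normal equations reduce to a 2×2 linear system that can be solved by hand. The sketch below does exactly that in plain Python. The function name and the data are hypothetical, and the data are constructed to lie exactly on a line so the recovered parameters are easy to check.

```python
def solve_normal_equations_2d(xs, ys):
    """Solve (Phi^T Phi) theta = Phi^T y for phi(x) = [x, 1]."""
    m = len(xs)
    # Entries of Phi^T Phi = [[sxx, sx], [sx, m]] and Phi^T y = [sxy, sy].
    sxx = sum(x * x for x in xs)
    sx = sum(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sy = sum(ys)
    det = sxx * m - sx * sx
    if det == 0:
        raise ValueError("Phi^T Phi is singular (all inputs identical?)")
    # Apply the closed-form inverse of a 2x2 matrix.
    theta_1 = (m * sxy - sx * sy) / det
    theta_2 = (sxx * sy - sx * sxy) / det
    return theta_1, theta_2


# Hypothetical data lying on the line y = 0.05 * x - 1.5.
temps = [60.0, 70.0, 80.0, 90.0]
demands = [1.5, 2.0, 2.5, 3.0]
theta_1, theta_2 = solve_normal_equations_2d(temps, demands)
print(theta_1, theta_2)
```

For more than a handful of features, a linear-algebra library is the practical choice. The explicit formulas here are only meant to make the structure of Φ^T Φ and Φ^T y visible.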

MATLAB Code

  • X = load('high_temperature.txt');
  • y = load('peak_demand.txt');
  • n = size(X,2);
  • m = size(X,1);
  • Phi = [X ones(m,1)];
  • theta = inv(Phi' * Phi) * Phi' * y;
  • theta =
  • 0.0466
  • -1.4600
  • The normal equations are so common that MATLAB has a special operation for them
  • % same as inv(Phi' * Phi) * Phi' * y
  • theta = Phi \ y;

General Optimization Problems

  • minimize J(θ) over θ
  • subject to gi(θ) ≤ 0, i = 1,..., Ni
  • and hi(θ) = 0, i = 1,..., Ne
  • This is a constrained optimization problem; the gi terms are the inequality constraints and the hi terms are the equality constraints.

Types of Optimization Problems

  • Linear programming.
  • Quadratic programming.
  • Semidefinite programming.
  • Integer programming.
  • The distinction depends on the form of J, the gi's, and the hi's.

Convex and Non-Convex Problems

  • An important distinction in optimization is between convex problems (where J and the gi are convex and the hi are linear) and non-convex problems.
  • f convex <=> f(aθ + (1 - a)θ') ≤ a f(θ) + (1 - a) f(θ') for all θ, θ' and 0 ≤ a ≤ 1
  • Informally speaking, one can usually find global solutions of convex problems efficiently. For non-convex problems one must settle for local solutions or time-consuming optimization.
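
The convexity inequality can be probed numerically along a line segment. The sketch below samples values of a between 0 and 1 and reports the first one where the inequality fails. Such a check can expose non-convexity but cannot prove convexity. The helper name and the `double_well` function are illustrative assumptions, not from the notes.

```python
def find_convexity_violation(f, theta_a, theta_b, samples=101, tol=1e-9):
    """Return a weight a in [0, 1] where the convexity inequality fails, else None."""
    for step in range(samples):
        a = step / (samples - 1)
        mixed = a * theta_a + (1 - a) * theta_b
        chord = a * f(theta_a) + (1 - a) * f(theta_b)
        if f(mixed) > chord + tol:
            return a
    return None


def quadratic(theta):
    """The convex 1-D objective used earlier in the notes."""
    return theta ** 2 - 2 * theta - 1


def double_well(theta):
    """A hypothetical non-convex function with two separate minima."""
    return theta ** 4 - 2 * theta ** 2


print(find_convexity_violation(quadratic, -3.0, 5.0))
print(find_convexity_violation(double_well, -1.0, 1.0))
```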

Solving Optimization Problems

YALMIP Code for Least Squares Optimization:

  • theta = sdpvar(size(Phi,2),1);  % one variable per column of Phi
  • solvesdp([], sum((Phi*theta - y).^2));
  • double(theta)
  • ans =
  • 0.0466
  • -1.4600

Alternative Loss Functions

  • There is nothing special about the least-squares loss function
  • l(ŷ, y) = (ŷ - y)^2.
  • Some alternatives include:
  • Absolute loss: l(ŷ, y) = |ŷ - y|
  • Deadband loss: l(ŷ, y) = max{0, |ŷ - y| - ε}, ε ∈ R+
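
The three losses are short enough to write out directly. The sketch below evaluates them on a small error and on a large, outlier-like error to show how differently they respond. The error values and ε = 0.25 are arbitrary illustrative choices.

```python
def squared_loss(y_hat, y):
    return (y_hat - y) ** 2


def absolute_loss(y_hat, y):
    return abs(y_hat - y)


def deadband_loss(y_hat, y, epsilon):
    """Zero inside the band |y_hat - y| <= epsilon, linear outside it."""
    return max(0.0, abs(y_hat - y) - epsilon)


# Compare a small error with a large one, both against a true value of 1.0.
for y_hat in (1.1, 3.0):
    print(
        y_hat,
        squared_loss(y_hat, 1.0),
        absolute_loss(y_hat, 1.0),
        deadband_loss(y_hat, 1.0, epsilon=0.25),
    )
```

The squared loss grows quadratically with the error, which is why a single outlier can dominate a least-squares fit.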

Minimizing Absolute Loss

  • Minimize Σ_{i=1..m} |θ^T φ(xi) - yi|
  • The objective is not differentiable
  • Frame it as a constrained optimization problem:
  • Minimize Σ_{i=1..m} vi over θ and v ∈ R^m
  • Subject to -vi ≤ θ^T φ(xi) - yi ≤ vi
  • This is a linear program (LP): linear objective and linear constraints

Standard Form for Solving LPs

  • minimize c^T z
  • subject to Az ≤ b
  • z ∈ R^n, A ∈ R^(Ni×n), b ∈ R^Ni
  • For the absolute loss LP (stacked blocks, rows separated by semicolons):
  • z = [θ; v], c = [0; 1], A = [Φ -I; -Φ -I], b = [y; -y]
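
To make the block structure concrete, the sketch below assembles c, A, and b in plain Python and checks that a candidate z = [θ; v] with vi = |θ^T φ(xi) - yi| is feasible and has an objective equal to the total absolute loss. It does not solve the LP, which is the solver's job. The helper names and data are hypothetical.

```python
def build_absolute_loss_lp(Phi, y):
    """Return (c, A, b) for: minimize c^T z subject to A z <= b, z = [theta; v]."""
    m, k = len(Phi), len(Phi[0])
    c = [0.0] * k + [1.0] * m
    A, b = [], []
    for i in range(m):   # block [Phi, -I]:  phi_i^T theta - v_i <= y_i
        A.append(list(Phi[i]) + [-1.0 if j == i else 0.0 for j in range(m)])
        b.append(y[i])
    for i in range(m):   # block [-Phi, -I]: -phi_i^T theta - v_i <= -y_i
        A.append([-p for p in Phi[i]] + [-1.0 if j == i else 0.0 for j in range(m)])
        b.append(-y[i])
    return c, A, b


def dot(u, w):
    return sum(a * z for a, z in zip(u, w))


# Hypothetical data: rows of Phi are [temperature, 1]; one point is off the line.
Phi = [[60.0, 1.0], [70.0, 1.0], [80.0, 1.0], [90.0, 1.0]]
y = [1.5, 2.0, 2.6, 3.0]
theta = [0.05, -1.5]

c, A, b = build_absolute_loss_lp(Phi, y)
v = [abs(dot(row, theta) - yi) for row, yi in zip(Phi, y)]
z = theta + v

feasible = all(dot(row, z) <= bi + 1e-9 for row, bi in zip(A, b))
lp_objective = dot(c, z)
print(feasible, lp_objective)
```

The two constraint blocks together encode -vi ≤ θ^T φ(xi) - yi ≤ vi, so at the optimum each vi equals the absolute residual.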

MATLAB code

  • k = size(Phi,2);  % number of parameters in theta
  • c = [zeros(k,1); ones(m,1)];
  • A = [Phi -eye(m); -Phi -eye(m)];
  • b = [y; -y];
  • z = linprog(c,A,b);
  • theta = z(1:k)
  • theta =
  • 0.0477
  • -1.5978
  • The same solution in YALMIP:
  • theta = sdpvar(size(Phi,2),1);  % one variable per column of Phi
  • solvesdp([], sum(abs((Phi*theta - y))));
  • double(theta)
  • ans =
  • 0.0477
  • -1.5978

Which Loss Function to Use?

  • Graph of observed data, squared loss, absolute loss, and deadband loss.

Higher Dimensional Inputs

  • Input: x ∈ R^2 = [temperature; hour of day]
  • Output: y ∈ R = demand

Features

  • φ(x) ∈ R^3 = [temperature; hour of day; 1]
  • The matrices are defined as before:
  • Φ ∈ R^(m×k)
  • y ∈ R^m
  • Same solution as before:
  • θ ∈ R^3 = (Φ^T Φ)^(-1) Φ^T y
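
With two inputs, only the feature matrix changes; the normal equations are solved the same way. The sketch below builds Φ with rows [temperature, hour of day, 1] and solves (Φ^T Φ)θ = Φ^T y by Gaussian elimination in plain Python. The data are hypothetical and generated from known parameters so the result can be checked. A numerical library would normally replace the hand-written solver.

```python
def solve_linear_system(M, r):
    """Solve M x = r by Gaussian elimination with partial pivoting."""
    n = len(r)
    aug = [list(M[i]) + [r[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):   # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_least_squares(Phi, y):
    """Form Phi^T Phi and Phi^T y, then solve the normal equations."""
    k = len(Phi[0])
    gram = [[sum(row[a] * row[b] for row in Phi) for b in range(k)] for a in range(k)]
    rhs = [sum(row[a] * yi for row, yi in zip(Phi, y)) for a in range(k)]
    return solve_linear_system(gram, rhs)


# Hypothetical (temperature, hour of day) inputs and known parameters.
inputs = [(60.0, 3.0), (70.0, 15.0), (80.0, 18.0), (90.0, 9.0), (75.0, 12.0)]
true_theta = [0.05, 0.02, -1.5]
Phi = [[temp, hour, 1.0] for temp, hour in inputs]
y = [sum(t * f for t, f in zip(true_theta, row)) for row in Phi]

theta = fit_least_squares(Phi, y)
print(theta)
```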
