Questions and Answers
Which of the following is a key characteristic of a 'fair' die in the context of probability?
- The numbers on the die are chosen randomly without sequence.
- It has been used in several previous rolls, influencing future outcomes.
- Each of its faces has an equal probability of landing face up. (correct)
- Its sides are made of different materials, affecting its balance.
How are discrete random variables different from continuous random variables?
- Discrete variables are associated with measurable quantities, while continuous variables represent countable entities.
- Discrete variables can assume any value within a given range, while continuous variables can only take specific, predetermined values.
- Discrete random variables must be integers, while continuous random variables can be fractions.
- Discrete variables have a finite or countable set of possible values, while continuous variables can take on a continuum of values within a range. (correct)
Given a random variable X representing the sum of two dice rolls, and another random variable Y representing the maximum of the same two dice rolls, how would you describe the relationship between the event spaces of X and Y?
- The event spaces for X and Y are identical because they are derived from the same sample space.
- The event space for X is a subset of the event space for Y, because the sum will always be greater or equal to the max.
- The event space for Y is a subset of the event space for X, because the max will always be less than or equal to the sum.
- The event spaces for X and Y are different, as they represent different functions applied to the same original space of outcomes. (correct)
What is the primary goal of creating a probability distribution for a random variable?
If events A and B are labeled as 'independent', what does this imply about their probabilities?
A factory has two machines: Machine A produces 100 items with 5% defective, and Machine B produces 150 items with 8% defective. If one item is chosen at random from each machine, what additional information is needed to calculate the probability that both items are non-defective?
What is the primary purpose of a joint probability distribution involving two random variables?
How does calculating the marginal probability from a joint probability distribution simplify analysis?
Which of the following statements accurately describes the relationship between probability mass function (PMF) and cumulative distribution function (CDF)?
What characteristic differentiates the use of standard deviation from variance?
Why is it important to normalize covariance when comparing the relationship between variables in different datasets?
What does a high standard deviation signify in the context of a dataset?
Why is correlation more often used than covariance when comparing the relationships between variables across machine learning domains?
What is the significance of the expected value (mean) of a random variable?
How would you describe covariance?
Given two independent events A and B, where P(A) = 0.6 and P(B) = 0.4, what is the probability of both A and B occurring?
In the context of random variables, what does the term 'outcome' refer to?
How does knowing the probability distribution of a customer's taste for commodities benefit a shop manager?
In the context of random variables, what does 'PMF' mean?
Given that events A and B are mutually exclusive, what can be said about whether they are independent?
In a study, event A is 'a person smokes' and event B is 'a person develops lung cancer'. If these events are independent, what is true?
What is the benefit of finding the correlation between variables?
You have a six-sided die that is weighted such that the probability of rolling a 6 is twice as high as that of any other number. What is the probability of rolling a 6?
You have a jar of marbles, 3 blue, 2 green, and 5 red. What is the probability that you draw a blue marble, followed by a red marble without replacement?
Consider rolling two dice. Let X be the event of rolling doubles, Y the event that the sum is greater than 7. What statement best describes the relationship between rolling doubles and rolling a sum greater than 7?
Event A is 'it rains tomorrow', and event B is 'the local baseball team wins their game tomorrow'. Are these two events typically independent or dependent, and why?
A coffee shop owner uses past sales data and finds that the mean daily coffee sales is 200 cups with a standard deviation of 20 cups. What does the standard deviation tell you about the daily sales?
Which of the following is an example of discrete data?
Machine A's item weights are normally distributed with a mean of 50 grams, and Machine B's item weights are also normally distributed with a mean of 50 grams. Knowing only this, what additional information is needed to determine which machine is more consistent in its production weights?
How does statistical independence simplify the calculation of joint probabilities?
Which of the following is a discrete random variable?
The number of dogs in each household of your neighborhood is a discrete random variable. You perform a survey to calculate the following probabilities: P(X=0) = 0.4, P(X=1) = 0.3, P(X=2) = 0.2. What is P(X>2)?
In an experiment involving the toss of a fair coin and the roll of a fair six-sided die, what constitutes the sample space?
If the covariance between height and weight is positive, what does that generally indicate?
Based on the probability that the event will happen, what does 'weight' in weighted average refer to?
What best describes the relationship between correlation and dimension reduction?
What is the difference in information that is communicated from variance to standard deviation?
When is it useful to find the joint probability mass function (PMF)?
Which of the following is a discrete random variable?
Consider the following two datasets:
Dataset A: 1, 2, 3, 4, 5
Dataset B: 5, 4, 3, 2, 1
What is the covariance between Dataset A and Dataset B?
What is the benefit of using correlation in machine learning training when dealing with high-dimensional data?
Consider two random variables X and Y with joint PMF given by:
        Y = 0   Y = 1   Y = 2
X = 0   1/6     1/4     1/8
X = 1   1/8     1/6     1/6
What is P(X=0, Y≤1)?
a) 5/12
b) 8/12
c) 9/24
d) 3/24
According to the joint PMF table above, what is P(Y=1 | X=0)?
According to the joint PMF table above, what is the relation between X and Y?
How do we decide which features to drop based on the correlation coefficient threshold?
The duration of a heart operation is normally distributed with mean 170 minutes and standard deviation 14 minutes. What percentage of operations last between 142-198 minutes?
What happens to the variance of the dataset if each value is multiplied by 2?
Why is correlation preferred over covariance for comparisons between variables across domains?
A correlation near +1 means:
A random variable Y has the following PMF: P(Y = 1) = 0.2, P(Y = 2) = 0.5, P(Y = 3) = 0.3. What is E[Y]?
If Cov(X, Y) = 0, which statement is always true?
Dataset A: [1, 2, 3], Dataset B: [3, 2, 1]. What is their correlation coefficient?
If all values in a dataset are multiplied by 3, what happens to the variance?
A normal distribution has mean μ = 50 and standard deviation σ = 10. What is P(40 ≤ X ≤ 60)?
A coin is flipped 5 times. What is the probability of getting exactly 3 heads?
In a ML dataset, two features have a correlation coefficient of 0.9. What should you do?
Flashcards
What is a Random Variable (RV)?
Variable whose outcome is random.
What is a Discrete Random Variable?
A random variable with a finite or countable set of possible values.
What is a Continuous Random Variable?
A random variable that takes on a continuum of possible values within an interval.
What is a probability distribution?
What is PMF? (probability mass function)
What is CDF? (cumulative distribution function)
When are Random variables independent?
What is joint probability distribution?
What is the expected value (mean)?
What is Variance?
What is Standard deviation?
What is Covariance?
What is Correlation?
Study Notes
- Session 2 covers statistics, including random variables, mean, variance, standard deviation, covariance, and correlation.
- It also covers probability, including independence of events and joint vs. marginal probability.
Random Variables
- Probability is often linked to at least one event.
- A random variable assigns a numerical value to the outcome of an event.
- The outcomes of events are random.
- Rolling a die or drawing a ball are toy examples of events.
- It's often useful to know the likelihood of a random variable/event taking on a specific value.
- A "fair" die means each face has an equal chance of landing face up.
- A random variable is discrete if it takes values from a finite or countable set, e.g. cards or dice.
- A random variable is continuous if it takes values in an interval, e.g. the lifetime of a car or an amount of water.
- The focus here is on discrete random variables.
- A random variable can be any suitable function of the outcomes, including sum, difference, product, max, min, etc.
- The event space of such a random variable differs from the original outcome space.
- For two dice rolls (a, b), X(a,b) = a+b and Y(a,b) = max(a,b) are examples of random variables.
- Sx and Sy denote the sets of values that X and Y can take.
- Solving a problem requires identifying the random variable and the probability P(r.v. = value) being asked for.
Example
- The sample space S contains every possible outcome (a, b) of two dice rolls.
- There are 36 possible outcomes for two dice rolls.
- For the r.v. X(a,b) = a+b, Sx = {2, 3, ..., 12}.
- For the r.v. Y(a,b) = max(a,b), Sy = {1, 2, ..., 6}.
- P(X=2) = P({(1,1)}) = 1/36
- P(X=3) = P({(1,2),(2,1)}) = 2/36
- P(X=4) = P({(3,1),(1,3),(2,2)}) = 3/36
- P(Y=1) = P({(1,1)}) = 1/36
- P(Y=2) = P({(2,1),(1,2),(2,2)}) = 3/36
- P(Y=3) = P({(1,3),(3,1),(3,2),(2,3),(3,3)}) = 5/36
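The enumeration in this example can be reproduced with a short script. This is only a minimal sketch in Python; the helper name `distribution` is illustrative and does not come from the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (a, b) of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def distribution(rv):
    """Return {value: probability} for a random variable rv(a, b)."""
    counts = Counter(rv(a, b) for a, b in outcomes)
    return {v: Fraction(c, len(outcomes)) for v, c in sorted(counts.items())}

pmf_sum = distribution(lambda a, b: a + b)      # X(a, b) = a + b
pmf_max = distribution(lambda a, b: max(a, b))  # Y(a, b) = max(a, b)

print(pmf_sum[4])  # P(X = 4), printed in lowest terms
print(pmf_max[3])  # P(Y = 3)
```

Because `Fraction` reduces automatically, 3/36 is printed in lowest terms; the values agree with the counts listed above.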
Probability Distribution
- The likelihood of obtaining each possible value is captured by a probability distribution.
- A probability distribution pairs each possible outcome with its statistical likelihood.
- It shows how likely each value of a discrete random variable is to occur in an experiment.
- The set of probabilities over all values of a discrete random variable is its probability distribution.
- If the probability distribution is known, the probability of each outcome can be found in advance.
PMF vs CDF
- The probability mass function (PMF) gives the probability that a random variable takes a value exactly equal to x_i: F(X = x_i) = P{X = x_i}
- The cumulative distribution function (CDF) gives the probability that the random variable takes a value less than or equal to x_i: F(X ≤ x_i) = P{X ≤ x_i}
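For a discrete random variable the CDF is a running sum of the PMF. The sketch below illustrates this for Y = max of two fair dice, whose PMF is P(Y = k) = (2k − 1)/36; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import accumulate

# PMF of Y = max of two fair dice: P(Y = k) = (2k - 1)/36 for k = 1..6.
values = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(2 * k - 1, 36) for k in values]

# The CDF at x_i is the running total of the PMF up to and including x_i.
cdf = list(accumulate(pmf))

for x, p, c in zip(values, pmf, cdf):
    print(f"x={x}  P(Y=x)={p}  P(Y<=x)={c}")
```

The last CDF value is always 1, which is a quick sanity check on any PMF.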
Independent Random Variables
- Random variables X and Y are independent if, for any sets of real numbers A and B: P{X ∈ A, Y ∈ B} = P{X ∈ A}·P{Y ∈ B}; for events, P(X ∩ Y) = P(X)·P(Y)
- For mutually independent X, Y and Z: P(X ∩ Y ∩ Z) = P(X)·P(Y)·P(Z)
- Independence and mutual exclusivity are different concepts.
- Mutually exclusive events cannot happen simultaneously.
- Independent events can happen at the same time, but the occurrence of one does not change the probability of the other.
Exam Question
- If A and B are independent events, their complements are independent as well: P(A^c ∩ B^c) = P(A^c)·P(B^c)
- If A = "Ahmed passes the exam" and B = "Mahmoud passes the exam", and A and B are independent:
- P(A) = 3/4
- P(B) = 2/3
- Probability of both passing = P(A ∩ B) = P(A)·P(B) = (3/4) × (2/3) = 1/2
- Probability of at least one passing = P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 3/4 + 2/3 − 1/2 = 11/12
- Probability of neither passing = P(A^c ∩ B^c) = P(A^c)·P(B^c) = (1 − P(A))·(1 − P(B)) = (1/4) × (1/3) = 1/12
- Probability of only Ahmed passing = P(A ∩ B^c) = P(A)·P(B^c) = P(A)·(1 − P(B)) = (3/4) × (1/3) = 1/4
- Probability of only Mahmoud passing = P(A^c ∩ B) = P(A^c)·P(B) = (1 − P(A))·P(B) = (1/4) × (2/3) = 1/6
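The arithmetic in this exam question can be checked with exact fractions. The following is a small sketch, with variable names chosen for illustration.

```python
from fractions import Fraction

p_a = Fraction(3, 4)  # P(A): Ahmed passes
p_b = Fraction(2, 3)  # P(B): Mahmoud passes

both = p_a * p_b                  # independence: P(A and B) = P(A) * P(B)
at_least_one = p_a + p_b - both   # inclusion-exclusion
neither = (1 - p_a) * (1 - p_b)   # complements of independent events
only_ahmed = p_a * (1 - p_b)
only_mahmoud = (1 - p_a) * p_b

# The four disjoint cases cover every possibility, so they must sum to 1.
assert both + neither + only_ahmed + only_mahmoud == 1
print(both, at_least_one, neither, only_ahmed, only_mahmoud)
```

Note that "at least one passes" is also the complement of "neither passes", which gives a second way to reach the same value.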
Machine Example
- In a factory, Machine A produces 100 items daily, of which 9% are defective; Machine B produces 150 items, of which 18 are defective. One item is chosen at random from each machine.
- Probability of a defective item from A: P(DA) = 0.09.
- Probability of a non-defective item from A: P(NA) = 0.91.
- Probability of a defective item from B: P(DB) = 18/150 = 0.12.
- Probability of a non-defective item from B: P(NB) = 0.88.
- Probability that both items are non-defective, treating the two machines as independent: P(NA ∩ NB) = P(NA)·P(NB) = 0.91 × 0.88 = 0.8008 ≈ 0.80
- Probability that exactly one item is defective: E = (NA ∩ DB) ∪ (NB ∩ DA), so P(E) = P(NA ∩ DB) + P(NB ∩ DA) = P(NA)·P(DB) + P(NB)·P(DA) = 0.91 × 0.12 + 0.88 × 0.09 = 0.1884
- Key takeaways: know what a random variable is and how discrete and continuous r.v.s differ.
- A probability can be defined over one or more outcomes of a sample space.
- The probability of values taken jointly by two independent r.v.s is simply the product of the individual probabilities.
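The machine example can be recomputed in a few lines. This sketch assumes, as the example does, that the items drawn from the two machines are independent; the variable names are illustrative.

```python
p_da = 0.09       # P(DA): defective item from machine A
p_db = 18 / 150   # P(DB): defective item from machine B
p_na = 1 - p_da   # P(NA): non-defective item from A
p_nb = 1 - p_db   # P(NB): non-defective item from B

# Independence lets each joint probability be written as a product.
both_ok = p_na * p_nb
exactly_one_defective = p_na * p_db + p_nb * p_da

print(round(both_ok, 4), round(exactly_one_defective, 4))
```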
Joint Probability
- Joint probability gives the likelihood of two random variables taking particular values together; it is needed when the variables are not independent.
- A joint probability distribution represents a probability distribution for two or more random variables with the formula f(x,y) = P(X = x, Y = y)
- A joint probability distribution captures the relationship between two variables.
Example
- Two dice are rolled at the same time; find the probability that the number 3 occurs on both.
- Here f(x,y) = P(X = x, Y = y), where X and Y label the values shown by the first and second die.
- A joint probability table lists the probability of each pair of values occurring at the same time.
- A joint probability table can be used to find the probability of an event occurring.
- For two fair dice, P(X = 3, Y = 3) = 1/6 × 1/6 = 1/36.
- In experiments on possible causes of cancer, the number of cigarettes smoked may be recorded as one variable and the patient's age as a second.
- The joint probability mass function (PMF) gives the probability of each pair of values of X and Y: F(X = x_i, Y = y_j) = P{X = x_i, Y = y_j}
- The joint cumulative distribution function gives the probability for all values up to x_i and y_j: F(x_i, y_j) = P{X ≤ x_i, Y ≤ y_j}
- The individual PMFs of X and Y can be calculated from the joint PMF.
- Marginal probability: P{X = x_i} = Σ_j P{X = x_i, Y = y_j}
- The joint PMF determines the individual PMFs, but the reverse is not true in general.
- The marginal probabilities of one variable sum to 1.
Example
- Let X and Y be two random variables.
- X and Y are independent only if P(X = x_i, Y = y_j) = P(X = x_i)·P(Y = y_j) for every pair of values. For example, if P(X = 2, Y = 2) = 1/6 while P(X = 2)·P(Y = 2) = 3/16, the two are not equal, so X and Y are not independent.
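Marginalisation and the independence test can be sketched in code. The example below uses the joint PMF table from the quiz section (X ∈ {0, 1}, Y ∈ {0, 1, 2}); the helper `marginal` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction as F

# Joint PMF P(X = x, Y = y): rows X = 0, 1; columns Y = 0, 1, 2.
joint = {
    (0, 0): F(1, 6), (0, 1): F(1, 4), (0, 2): F(1, 8),
    (1, 0): F(1, 8), (1, 1): F(1, 6), (1, 2): F(1, 6),
}

def marginal(joint, axis):
    """Sum the joint PMF over the other variable (axis 0 -> X, axis 1 -> Y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0) + p
    return out

p_x = marginal(joint, 0)
p_y = marginal(joint, 1)

# X and Y are independent only if every cell equals the product of its marginals.
independent = all(p == p_x[x] * p_y[y] for (x, y), p in joint.items())
print(p_x, p_y, independent)
```

A single cell that differs from the product of its marginals is enough to rule out independence.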
Mean, Variance, Std
- The distribution of a single r.v. X gives the probabilities P(X = x_i).
- The distribution on its own does not give a single representative value of X, such as the value it takes on average.
- A plain (unweighted) average of the values does not take P(X = x_i) into account.
- For jointly distributed variables, P(X = x_i) is the marginal probability obtained by summing over all values of Y.
- The expected value / mean is denoted by E(X) or µ.
- E(X) is calculated as a weighted average.
- Each weight reflects how likely the corresponding value is, i.e. how often it would occur over a very large number of trials.
- The weights are given by the probability distribution of the r.v.
Expectation
- If F(X = x_i) is the PMF of the random variable X, then:
- E[X] = Σ_i x_i · P(X = x_i), where the sum runs over all possible values x_i
Tossing a Die
- A player tosses a fair die. If a prime number comes up, the player wins that number of pounds.
- If a non-prime number comes up, the player loses that number of pounds.
- Values are:
- x_i: 2, 3, 5, -1, -4, -6
- P(X = x_i): 1/6, 1/6, 1/6, 1/6, 1/6, 1/6
- E[X] = 2·(1/6) + 3·(1/6) + 5·(1/6) + (-1)·(1/6) + (-4)·(1/6) + (-6)·(1/6) = -1/6
- The game is unfavourable to the player, since E[X] is negative.
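The weighted average for the die game can be written as a one-line sum. This is a minimal sketch; `expected_value` is an illustrative helper, not something defined in the notes.

```python
from fractions import Fraction

def expected_value(pmf):
    """E[X] = sum of x * P(X = x) over a {value: probability} mapping."""
    return sum(x * p for x, p in pmf.items())

# Winnings in pounds: faces 2, 3, 5 win their value; faces 1, 4, 6 lose it.
die_game = {x: Fraction(1, 6) for x in (2, 3, 5, -1, -4, -6)}

print(expected_value(die_game))
```

The same helper works for any discrete PMF given as a mapping from values to probabilities.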
Distribution
- E(c) = c, where c is a constant
- E(X^2) = Σ_i x_i^2 · P(X = x_i) and similarly E(X^n) = Σ_i x_i^n · P(X = x_i); more generally, E(W(X)) = Σ_i W(x_i) · P(X = x_i) for a function W(X)
- For a real number α:
- E(αX) = α E(X)
- E(X ± α) = E(X) ± α
- Expectation is also linear across two random variables:
- E(X ± Y) = E(X) ± E(Y)
- E(αX + βY + γ) = α E(X) + β E(Y) + γ, where α, β, γ are constants
Variability
- Variance measures how far the values spread around the mean (the weighted average).
- For a discrete r.v., the weights are the probabilities P(X = x_i), i.e. the heights of the bars in the PMF.
- Variance: Var(X) = E[(X − μ)^2] = E[X^2] − (E[X])^2
- The second form follows from the properties of E(X).
- Standard deviation is denoted by sigma (σ).
- Standard deviation: σ = √Var(X), σ ≥ 0
- A high standard deviation means the values are spread over a wide range.
- A low standard deviation means the values are clustered close to the mean.
- Importance: variance quantifies how much individual values can be expected to differ from the mean.
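Both forms of the variance formula can be checked against each other. The sketch below uses the PMF from one of the quiz questions, P(Y=1) = 0.2, P(Y=2) = 0.5, P(Y=3) = 0.3, written as exact fractions; variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# PMF: P(Y=1) = 0.2, P(Y=2) = 0.5, P(Y=3) = 0.3.
pmf = {1: Fraction(1, 5), 2: Fraction(1, 2), 3: Fraction(3, 10)}

mean = sum(y * p for y, p in pmf.items())                           # E[Y]
var_definition = sum((y - mean) ** 2 * p for y, p in pmf.items())   # E[(Y - mu)^2]
var_shortcut = sum(y * y * p for y, p in pmf.items()) - mean ** 2   # E[Y^2] - (E[Y])^2
std = sqrt(var_shortcut)                                            # sigma

print(mean, var_definition, var_shortcut, std)
```

The shortcut form is usually more convenient by hand, since it avoids subtracting the mean from every value.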
Formulae
- For a real number α:
- Var(αX) = α^2 Var(X)
- Var(X ± α) = Var(X)
- σ_(X±α) = σ_X
- σ_(αX) = |α| σ_X
- These properties apply on top of any variance already computed, so the variance of a shifted or scaled variable does not need to be recalculated from scratch.
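The scaling and shifting rules can be confirmed numerically with Python's standard `statistics` module. The dataset and the factor `alpha` below are made up for illustration, and population variance is assumed.

```python
from statistics import pstdev, pvariance

data = [1, 2, 3, 4, 5]   # illustrative dataset
alpha = 2

scaled = [alpha * x for x in data]    # every value multiplied by alpha
shifted = [x + alpha for x in data]   # every value shifted by alpha

# Scaling multiplies the variance by alpha**2; shifting leaves it unchanged.
print(pvariance(data), pvariance(scaled), pvariance(shifted))
print(pstdev(data), pstdev(scaled), pstdev(shifted))
```

This matches the quiz questions on multiplying every value by 2 or by 3: the variance grows by a factor of 4 or 9 respectively.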
Covariance and Correlation
- The two terms are often used interchangeably, but they are distinct measures.
- Both are used to describe the dependency and linear relationship between two variables.
- Covariance: how two variables vary together.
- Correlation: how strongly a change in one variable is related to a change in another.
- Covariance determines the degree to which two variables change together.
- Covariance expresses the tendency of the variables to increase or decrease together; a negative value indicates that one tends to decrease as the other increases.
- The magnitude of covariance depends on the units of the variables, which makes it hard to interpret.
- Correlation measures the strength and direction of the relationship between the variables.
- Correlation takes values between -1 and 1.
- It is scale-free, so relationships can be compared directly.
- Covariance reveals direction: Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] or Cov(X, Y) = E[XY] − E[X]E[Y]
- Changing the units (scale) of a variable changes the covariance.
- This limits the comparison of covariances across datasets with different scales.
- Dividing by the standard deviations gives the correlation coefficient r, which measures the linear relationship between the variables.
- Correlation is the covariance divided by the product of the standard deviations: r = Cov(X, Y) / (σ_X σ_Y)
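A short sketch makes the difference between covariance and correlation concrete. It uses Dataset A and Dataset B from the quiz section and assumes the population form (dividing by n); the sample form divides by n − 1 and would give a different covariance. The function names are illustrative.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: the mean of (x - mean_x) * (y - mean_y)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

a = [1, 2, 3, 4, 5]
b = [5, 4, 3, 2, 1]
a_scaled = [10 * x for x in a]   # same data in different units

print(covariance(a, b), covariance(a_scaled, b))    # covariance changes with units
print(correlation(a, b), correlation(a_scaled, b))  # correlation does not
```

Rescaling one variable multiplies the covariance by the same factor but leaves the correlation where it was, which is why correlation is preferred for comparisons across datasets.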
Properties
- Cov(X, Y) shows how much each variable relates to the other.
- Positive: Y tends to be above its average when X is above its average.
- Zero: independence implies Cov(X, Y) = 0, but zero covariance does not imply independence, since X and Y may still have a non-linear relation.
- Cov(X, Y) = Cov(Y, X)
- Cov(X, X) = Var(X)
- Cov(aX, bY) = ab Cov(X, Y) for any constants a and b
- Cov(X1 + X2, Y) = Cov(X1, Y) + Cov(X2, Y)
- Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
- A correlation of -1 or +1 indicates a perfect linear relationship; values in between indicate how strongly one variable changes with the other.
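The identity Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y) can be checked on a small dataset. The numbers below are made up for illustration, and population statistics are assumed throughout; `pcov` is a hypothetical helper.

```python
from statistics import pvariance

def pcov(xs, ys):
    """Population covariance of two equal-length samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

x = [2, 4, 6, 8]   # illustrative data
y = [1, 3, 2, 6]

lhs = pvariance([a + b for a, b in zip(x, y)])      # Var(X + Y)
rhs = pvariance(x) + pvariance(y) + 2 * pcov(x, y)  # Var(X) + Var(Y) + 2 Cov(X, Y)
print(lhs, rhs)
```

When the covariance is zero the cross term vanishes, and the variance of the sum is just the sum of the variances.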