Dependable Computing Modelling and Simulation PDF

Summary

This document, "Appunti_Valerio.pdf", is a set of notes on dependable computing, modeling, and simulation. Detailed explanations of probability theory, simulation techniques, and output analysis are included.

Full Transcript


Dependable Computing Modelling and Simulation - DCMS
Domenico Sosta, Valerio Triolo
November 2024

Contents

1 Probability Theory
2 Random Variable
  2.1 Random experiment
    2.1.1 Sample space
    2.1.2 Events
    2.1.3 Probability
    2.1.4 Laplace definition
    2.1.5 Relative frequency
    2.1.6 Conditional probability
    2.1.7 Independent events
    2.1.8 Total probability theorem
  2.2 Random Variables
  2.3 Probability Mass Function
  2.4 Events probability
  2.5 Cumulative Distribution Function (CDF)
  2.6 Continuous Random Variables
  2.7 Probability Density Function (PDF)
  2.8 Expected value
  2.9 Joint distribution
  2.10 Maximum of two r.v.
  2.11 Minimum of two r.v.
  2.12 Poisson processes and event generation
    2.12.1 Poisson process
    2.12.2 Observer
  2.13 The exponential distribution
    2.13.1 Memoryless property
    2.13.2 Expected value of the exponential distribution
3 Simulation
  3.1 Simulator
    3.1.1 Phases
    3.1.2 Problem formulation
    3.1.3 Model design and data collection
    3.1.4 Model formulation
    3.1.5 Model translation
    3.1.6 Model verification
    3.1.7 Experiment planning
    3.1.8 Result analysis
  3.2 Definition of a simulator
  3.3 Discrete event simulation
    3.3.1 Events
    3.3.2 Notation
    3.3.3 Initialization
    3.3.4 Future event list
    3.3.5 Simulation running
  3.4 An example: a queueing system
4 Random Number Generation
  4.1 Linear congruential method
  4.2 Cumulative random variable
  4.3 Algorithm generation for continuous r.v.
  4.4 Algorithm generation for discrete r.v.
  4.5 Example
  4.6 Gaussian distribution
5 Output Analysis
  5.1 Measures of interest
  5.2 Point estimator
  5.3 Estimation of a measure
  5.4 Confidence interval
  5.5 Quantile
  5.6 Evaluating I
6 Repairable Systems and Unrepairable Systems
  6.1 Reliability
  6.2 Failure times
  6.3 Failure rate
  6.4 Quantities of the exponential distribution
  6.5 Weibull distribution
  6.6 Unrepairable systems
    6.6.1 Series
    6.6.2 Reliability improvement
    6.6.3 Redundant systems
    6.6.4 Unreliability
7 Markov Chains
  7.1 Markov Chain
    7.1.1 Equations
    7.1.2 Chapman-Kolmogorov equations
  7.2 Discrete Time Markov Chain (DTMC)
  7.3 Continuous Time Markov Chain
    7.3.1 Q matrix
  7.4 State classifications
  7.5 Markov chain characterizations
  7.6 Exercises
    7.6.1 The Land of Oz
    7.6.2 Two-processor system with shared memory
    7.6.3 Two processors and three memory modules
8 Queues
9 Case study: M/M/1 queue

Chapter 1 Probability Theory

Chapter 2 Random Variable

2.1 Random experiment

A random experiment is an experiment whose outcome is uncertain. An example is the number of requests to a Web server in a fixed time interval (0, t]; the number of arrivals is assumed to follow a specific distribution (Poisson). The outcomes are the possible results an experiment can produce. They can be: discrete, continuous, or mixed.

2.1.1 Sample space

The sample space S is the set of all possible outcomes of a random experiment; it can be either finite or infinite.

2.1.2 Events

An event is a statement of conditions that defines a subset of S. An event can be:

1. universal, if it coincides with the sample space S
2. impossible, if it is the null set
3. elementary, if it has the form {s}, s ∈ S

2.1.3 Probability

We assign probabilities to events in the sample space. Probability represents the "tendency" of an event to be the outcome. The probability of an event A ⊆ S is denoted P(A). We can define probability in two ways: the Laplace definition and the relative frequency.

Probability obeys Kolmogorov's axioms:

- For every event A ⊆ S, P(A) ≥ 0: the probability of any event we define in the sample space is non-negative.
- P(S) = 1: the probability of the entire sample space is 1.
- For all A, B ⊆ S with A ∩ B = ∅: P(A ∪ B) = P(A) + P(B). If two events are disjoint, the probability of their union is the sum of the two probabilities.

2.1.4 Laplace definition

Let S = {E1, E2, ..., En} with P(E1) = P(E2) = ... = P(En), i.e., the elementary events are equiprobable. If A is the union of N_A of these N elementary events, then

P(A) = N_A / N

The probability is evaluated as the ratio of favorable outcomes to possible outcomes.

2.1.5 Relative frequency

There is another interpretation of probability. Suppose we run N trials of the same random experiment, and N_A is the number of trials with outcome A. As the number of trials increases, the relative frequency N_A / N converges to the probability of A. Formally:

P(A) = lim_{N→∞} N_A / N

An example is rolling a die with A = {2}: as N grows, n_A / N approaches 1/6.

[Figure: plot of the relative frequency n_A/N against the number of trials N, converging to 1/6]
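This convergence can be checked empirically. Below is a minimal sketch in Python (the general-purpose language the notes mention later for model translation); the seed and trial counts are illustrative choices.

```python
import random

# Estimate P(A) for A = {the die shows 2} by relative frequency:
# as N grows, n_A / N should approach the Laplace value 1/6 ≈ 0.1667.
random.seed(42)
for N in [10, 100, 1_000, 10_000, 100_000]:
    n_A = sum(1 for _ in range(N) if random.randint(1, 6) == 2)
    print(f"N = {N:>6}: n_A/N = {n_A / N:.4f}")
```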
2.1.6 Conditional probability

Suppose we know that the outcome of the experiment lies in B ⊆ S, with P(B) ≠ 0. For a sample point (an elementary event) s, it must be that:

P(s | B) = P(s) / P(B)   if s ∈ B
P(s | B) = 0             if s ∉ B

The conditional probability P(A|B) is the probability of A given that we know B occurs:

P(A|B) = Σ_{s∈A} P(s|B) = Σ_{s∈A∩B} P(s|B) + Σ_{s∈A, s∉B} P(s|B)

Since P(s|B) = 0 for s ∉ B, the second sum vanishes, so

P(A|B) = Σ_{s∈A∩B} P(s|B)

and, using the definition above,

P(A|B) = Σ_{s∈A∩B} P(s) / P(B) = P(A ∩ B) / P(B),   P(B) ≠ 0

2.1.7 Independent events

Two events are said to be independent when P(A|B) = P(A), meaning that B does not affect A. Their joint probability is then computed as P(A, B) = P(A) · P(B).

2.1.8 Total probability theorem

The total probability theorem is one of the most important theorems in probability theory. It expresses the total probability of an outcome that can be realized via several distinct events, hence the name. Suppose we partition the entire sample space S into events B1, B2, ..., BN such that

Bi ∩ Bk = ∅ for i ≠ k,   B1 ∪ B2 ∪ ... ∪ BN = S

Then for any event A:

P(A) = Σ_i P(A|Bi) · P(Bi)

[Figure 2.1: partition of the sample space S into the events B1, ..., BN]

2.2 Random Variables

Random variables create associations between objects of the sample space S and numbers (the sample space is made of objects of any kind, not necessarily numbers, so we need a way to quantify outcomes).

A random variable (r.v.) X on a sample space S is a function that assigns a real number X(s) to each sample point s in the sample space:

X : S → ℝ   (2.1)

A r.v. can be seen as a mathematical function that partitions S into mutually exclusive sets of events. E.g., tossing a coin: S = {head, tail}, X : S → {0, 1}. In this example we associate every outcome of the sample space with a real number, for example head with 0 and tail with 1. A r.v. can be discrete or continuous.

2.3 Probability Mass Function

An event in terms of a r.v. is defined as Ax = {s ∈ S | X(s) = x}, and its probability is

P(Ax) = P(X = x) = Σ_{X(s)=x} P(s)

We define the probability mass function as

pX(x) = P(X = x) = Σ_{X(s)=x} P(s)

with the obvious properties: 0 ≤ pX(x) ≤ 1 and Σ_x pX(x) = 1.

2.4 Events probability

Talking about r.v.s and events, we are interested in computing the probabilities of the possible subsets of a generic set A. Since every outcome can be associated with a random variable, we define the event [X ∈ A] and its probability P(X ∈ A).

2.5 Cumulative Distribution Function (CDF)

The cumulative distribution function (CDF) is defined as

FX(t) = P(X ≤ t) = Σ_{x≤t} pX(x)

It can be interpreted as the "cumulative probability" that the random variable assumes a value no greater than t. Some important properties of a CDF:

- lim_{t→−∞} FX(t) = 0
- lim_{t→+∞} FX(t) = 1
- FX(t) is a monotonic non-decreasing function (it can increase or stay constant, but never decrease)
- 0 ≤ FX(t) ≤ 1
- P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a)

Important: the probability of an interval cannot be computed as an area under the curve FX(t); it must be calculated as the difference between the CDF evaluated at the extremes of the interval.

[Figure: example staircase CDF F(x), rising from 0 to 1 in steps at x = 1, ..., 5]
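As a small worked example, the sketch below builds the CDF of a fair die from its PMF and computes an interval probability as F(b) − F(a); the die model is illustrative.

```python
# Build the CDF of a fair die from its PMF and use F(b) - F(a)
# for an interval probability, as stated above.
pmf = {x: 1 / 6 for x in range(1, 7)}        # p_X(x)

def cdf(t):
    # F_X(t) = sum of p_X(x) over all x <= t
    return sum(p for x, p in pmf.items() if x <= t)

# P(2 < X <= 5) = F(5) - F(2) = 5/6 - 2/6 = 1/2
print(cdf(5) - cdf(2))                       # 0.5
```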
2.6 Continuous Random Variables

A continuous random variable is a random variable that can take on an infinite number of possible values, because the sample space may not be enumerable. A continuous r.v. is a function that assigns a real number to each sample point s in S such that, for every x ∈ ℝ, the set {s | X(s) ≤ x} characterizes an event.

The definition of the CDF FX(t) can then be extended to the entire real line:

FX(t) = P(X ≤ t)

Typically, in computer system analysis the random variable is a time, which can be treated as either continuous or discrete; that is why we usually depict the CDF from 0 to +∞, although in general it is defined over (−∞, +∞).

2.7 Probability Density Function (PDF)

From the knowledge of the CDF we can derive the probability density function, a function that describes the relative likelihood of the possible outcomes of a random experiment. The PDF is the derivative of the CDF over the whole domain:

fX(t) = dFX(t)/dt

We can go back to the CDF by integrating the density (for a non-negative r.v.):

F(t) = ∫_0^t f(x) dx

Of course, the total area under the curve must be 1, because the infinitesimal sum of all probabilities must be 1 (by the definition of probability):

∫_0^∞ f(x) dx = 1

So we can use the definition of the CDF to write:

P(X ≤ t) = ∫_0^t f(x) dx
P(a ≤ X ≤ b) = ∫_a^b f(x) dx = F(b) − F(a)

In other words, the probability of an event can be computed by integrating the PDF (i.e., as an area under the density curve); if we only have the CDF, the probability must be computed as the difference between the CDF evaluated at the two extremes.

2.8 Expected value

The expected value of a r.v. is, informally, the mean of the possible values the random variable can take, weighted by the probabilities of those outcomes. It is a sort of "center of gravity" around which all the possible values are spread. For a r.v. we can define different "moments"; the first moment is the expected value:

E[X] = ∫_0^∞ t f(t) dt

If the random variable is discrete, the expected value is:

E[X] = Σ_i x_i p_i

2.9 Joint distribution

Consider two random variables X1 and X2 defined over [0, ∞). Their joint distribution is the probability distribution over all possible pairs of values:

F_{X1,X2}(x1, x2) = P[(X1 ≤ x1) and (X2 ≤ x2)]

If the random variables are independent, the joint distribution factorizes:

F_{X1,X2}(x1, x2) = F_{X1}(x1) · F_{X2}(x2)

2.10 Maximum of two r.v.

If X1 and X2 are, for example, the failure times of two components, then max(X1, X2) is the time at which both components have failed, so we want to know its distribution Fmax. Supposing X1 and X2 are independent over [0, +∞) and letting Ymax = max(X1, X2):

Fmax(t) = P(Ymax ≤ t) = P(X1 ≤ t ∧ X2 ≤ t) = F_{X1,X2}(t, t) = F1(t) · F2(t)

For example, if two threads run concurrently and their completion times are random variables with CDFs F1(t) and F2(t), what is the completion time of the whole process (the maximum of the completion times)? Suppose

F1(t) = 1 − e^(−4t)
F2(t) = 1 − e^(−5t)
Fmax(t) = F1(t) · F2(t) = (1 − e^(−4t)) · (1 − e^(−5t)) = 1 − e^(−4t) − e^(−5t) + e^(−9t)

This proves that the maximum of two independent, exponentially distributed r.v.s is not exponentially distributed.
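This result is easy to verify by Monte Carlo. The sketch below draws pairs of independent exponential samples with the rates of the example (λ1 = 4, λ2 = 5) and compares the empirical P(max ≤ t) with F1(t)·F2(t); the sample size and the value of t are arbitrary choices.

```python
import math
import random

# Empirical check that F_max(t) = F1(t) * F2(t) for independent r.v.s,
# using the exponential rates from the example above (λ1 = 4, λ2 = 5).
random.seed(0)
n, t = 200_000, 0.3
hits = sum(1 for _ in range(n)
           if max(random.expovariate(4), random.expovariate(5)) <= t)
empirical = hits / n
analytical = (1 - math.exp(-4 * t)) * (1 - math.exp(-5 * t))
print(f"empirical  = {empirical:.4f}")
print(f"analytical = {analytical:.4f}")
```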
2.11 Minimum of two r.v.

The same discussion applies to the minimum of two r.v.s. This concept will be useful in studying stochastic processes.

Fmin(t) = P(Ymin ≤ t) = P(X1 ≤ t ∨ X2 ≤ t) = 1 − P(X1 > t ∧ X2 > t) = 1 − (1 − F1(t)) · (1 − F2(t))

E.g., with

F1(t) = 1 − e^(−4t)
F2(t) = 1 − e^(−5t)
Fmin(t) = 1 − (1 − F1(t))(1 − F2(t)) = 1 − e^(−9t)

The minimum of two independent exponentially distributed r.v.s is itself an exponentially distributed r.v.

2.12 Poisson processes and event generation

The Poisson process is a useful mathematical tool for generating random events under specific conditions:

- At most one event can be generated in an infinitesimal time interval dt.
- The probability of generating an event in an interval dt is assumed constant and equal to λ dt.
- The generated events are independent of one another.

When these conditions hold, we are characterizing a stationary Poisson process.

2.12.1 Poisson process

The probability of generating n events in an interval of length ∆t is:

P[X(∆t) = n] = pn(∆t) = ((λ∆t)^n / n!) · e^(−λ∆t)

The Poisson process is also called a counting process, because it represents the total number of occurrences or events that have happened up to and including time t. Equivalently, the time differences between events of the counting process are exponential random variables with mean 1/λ.

2.12.2 Observer

From the point of view of an observer (the one who watches the generated events), the interesting quantity is the time elapsed between two emissions (the interarrival time), which is of course a random variable.

- The probability of no generations in [0, t) is e^(−λt).
- The probability of one emission in [t, t + dt) (as defined above) is λ dt.

The probability density function of the interarrival time is therefore

λ e^(−λt)   (2.2)

2.13 The exponential distribution

The exponential distribution is the distribution of the time between two events in a Poisson process:

FX(t) = 1 − e^(−λt)

Differentiating gives the PDF:

dFX(t)/dt = λ e^(−λt)

which is the same density found in (2.2). The exponential distribution is widely used in system modeling, and it is considered the best distribution for computer system analysis because it is the only continuous distribution with the memoryless property.

2.13.1 Memoryless property

Let X be a random variable representing a time quantity (for example a component's lifetime). The random variable Y = X − t is the remaining lifetime at time t. Consider the distribution of Y:

Ft(y) = P(Y ≤ y | X > t)

Substituting Y = X − t:

Ft(y) = P(X − t ≤ y | X > t) = P(X ≤ y + t | X > t)

This is a conditional probability, so using the formula for conditional probability:

P(X ≤ y + t | X > t) = P(t < X ≤ y + t) / P(X > t)

Since these are CDFs, we can write them as integrals of the PDF λe^(−λt):

Ft(y) = ∫_t^(y+t) f(x) dx / ∫_t^(+∞) f(x) dx = e^(−λt)(1 − e^(−λy)) / e^(−λt) = 1 − e^(−λy)

This says that the remaining lifetime does not depend on the lifetime already spent.

2.13.2 Expected value of the exponential distribution

If FX(t) = 1 − e^(−λt), the expected value is:

E[X] = ∫_0^(+∞) t · fX(t) dt = λ ∫_0^(+∞) t · e^(−λt) dt

Skipping the proof (integration by parts), the result is:

E[X] = 1/λ
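Both facts, E[X] = 1/λ and memorylessness, can be checked numerically. A minimal sketch with an illustrative rate λ = 2 and arbitrary t and y:

```python
import random

# Check two facts about the exponential distribution with rate λ:
# (1) the sample mean of interarrival times approaches 1/λ;
# (2) memorylessness: P(X <= t + y | X > t) ≈ P(X <= y).
random.seed(1)
lam, n = 2.0, 500_000
samples = [random.expovariate(lam) for _ in range(n)]
print(f"sample mean = {sum(samples)/n:.4f}  (expected 1/λ = {1/lam})")

t, y = 0.5, 0.3
survivors = [x for x in samples if x > t]
cond = sum(1 for x in survivors if x <= t + y) / len(survivors)
uncond = sum(1 for x in samples if x <= y) / n
print(f"P(X <= t+y | X > t) = {cond:.4f}, P(X <= y) = {uncond:.4f}")
```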
Chapter 3 Simulation

3.1 Simulator

A simulator is a software program that emulates how a system works. The system behaviour can be represented by its state. With a simulator, many things can be done, for example:

- Study system component interactions: for example, we can see how the system reliability changes if we combine multiple components.
- Change parameters (like λ or µ) and see how the system reacts.
- Evaluate the performance when the system works in different operational modes (like operational or failed).
- Validate an analytical model (for example a Markov chain) against the simulation.

3.1.1 Phases

The typical phases for creating a discrete event simulator are the following (see the flowchart below):

1. Problem formulation
2. Model design and data collection
3. Model formulation
4. Model evaluation
5. Model translation
6. Model verification
7. Experiment planning
8. Result analysis

[Flowchart: Problem formulation → Model design and data collection → Model formulation → Model evaluation → Model translation → Model verification → Experiment planning → Analysis]

3.1.2 Problem formulation

Problem formulation is the phase in which we describe, in words, the problem and its goals. Example goals:

- Improve properties such as performance or reliability
- Identify input/output relations
- Optimize our model
- Compare two or more models

In the problem formulation we identify the measures of interest (for example throughput or reliability) and the input parameters, namely the inputs that our model will receive.

3.1.3 Model design and data collection

In this phase we identify what we want to represent, typically the states, and what the inputs are, for example the service rates or the inter-arrival times. We could use:

- Experimental data, for example data collected in the field over a long time.
- Realistic hypotheses, i.e., approximations; here we must be careful, otherwise we will produce a wrong model.

3.1.4 Model formulation

In this phase we simplify the model by extracting only the information we need. For example, we can refine the model description by abstracting its main characteristics.

3.1.5 Model translation

In this phase we translate the model into a programming language, for example:

- General purpose languages (Python, MATLAB, SIMSCRIPT, MODSCRIPT)
- Simulation-oriented languages: several languages let you accelerate simulation writing by exploiting ready-made simulation objects.

3.1.6 Model verification

We should verify the correctness of the model.

3.1.7 Experiment planning

In a simulation many "hyper-parameters" must be specified. In this phase we define, for example:

- The initialization period
- The simulation time (the simulated duration of a single run)
- The number of simulation runs

3.1.8 Result analysis

In the last phase, results are collected, but we must also verify their goodness. This can be done, for example, by running more simulations and computing summary measures such as means and confidence intervals.

3.2 Definition of a simulator

Some definitions:

- System: a set of entities interacting to achieve a goal.
- Model: the abstract representation of the system's logical connections. It describes the real system with states, activities and attributes.
- State: the set of variables that identify the system at any time instant.
- Attributes: the characteristics of an entity.
- Events: facts that are able to change the system state.
- Event calendar: also called the future event list, the list of pending events, ordered over time.
- Activity: a defined time period.
- Delay: an undefined time period (a r.v.).
- Clock: a variable accounting for the simulation time.
3.3 Discrete event simulation

We can build a simulation by identifying a sequence of system snapshots over time that bring the system to a final state. A snapshot is constituted by:

- The system state
- The active activities (a list)
- The entity states
- Cumulative counters and statistics

3.3.1 Events

Events are facts that can change the system state, either in a continuous or in a discrete way. We will focus on events that change the system state in a discrete way. Events can be of different types. Given an event, we have to understand how it influences the system state and the attributes of entities. Based on the attribute values at a given simulated time, the future event list is updated (events are added to the list). When an event happens:

1. The clock advances (in a discrete way).
2. The future event list is modified: at least one event is removed and, depending on the new system state, some pending events may become impossible.
3. The system state is updated.

3.3.2 Notation

- τ is the simulation time
- xi is the system state at the i-th step
- e is a generic event
- δ(xi, e) is the state transition function: given a state and an event, it returns the next system state. Typically it is a bunch of if-else statements.

3.3.3 Initialization

First of all, the future event calendar is populated with the events pending at time 0. The implementation could be a linked list ordered according to the scheduling time (the time at which each event will occur).

3.3.4 Future event list

The future event list, or calendar, is an important data structure because it stores the pending events at any given step. It must be very efficient, since read and write operations are frequent. It can be implemented as a priority queue where access to the head of the list is fast.

3.3.5 Simulation running

Suppose the system is in state xk and the next event ek is scheduled at time tk:

1. Extract the pair (ek, tk) from the future event list; this is the earliest scheduled event.
2. Advance the clock to the event's scheduled time (time advances in a discrete way).
3. xk+1 = δ(xk, ek): based on the current state and event we obtain the next state of the system. Events that are impossible in xk+1 are removed from the calendar, and newly possible events are added to the list.
4. Keep the list sorted by scheduled time.

3.4 An example: a queueing system

A minimal sketch of such a queueing simulation follows.
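The sketch below is a single-server queue driven by the event loop of section 3.3: the future event list is a heap of (time, event) pairs, the clock jumps to each event's scheduled time, and δ(x, e) is the if/else block. The arrival and service rates are illustrative assumptions, not values from the notes.

```python
import heapq
import random

# A minimal single-server queue driven by the discrete-event loop.
random.seed(7)
ARRIVAL_RATE, SERVICE_RATE, HORIZON = 1.0, 1.5, 100.0

fel = [(random.expovariate(ARRIVAL_RATE), "arrival")]   # initialization
clock, in_system = 0.0, 0

while fel:
    clock, event = heapq.heappop(fel)          # earliest scheduled event
    if clock > HORIZON:
        break
    if event == "arrival":                     # x_{k+1} = δ(x_k, arrival)
        in_system += 1
        heapq.heappush(fel, (clock + random.expovariate(ARRIVAL_RATE), "arrival"))
        if in_system == 1:                     # server was idle: start a service
            heapq.heappush(fel, (clock + random.expovariate(SERVICE_RATE), "departure"))
    else:                                      # x_{k+1} = δ(x_k, departure)
        in_system -= 1
        if in_system > 0:                      # next customer enters service
            heapq.heappush(fel, (clock + random.expovariate(SERVICE_RATE), "departure"))

print(f"final clock τ = {clock:.2f}, customers in system: {in_system}")
```

The heap keeps the calendar sorted automatically, which satisfies the efficiency requirement of section 3.3.4 without an explicit sort step.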
Chapter 4 Random Number Generation

In our simulations we want to schedule events in a random way, and programming languages should provide tools for generating "random" numbers. It is impossible for a machine to generate truly random numbers, but algorithms exist that generate pseudo-random numbers uniformly distributed in [0, 1]. Generation rests on two principles:

- Uniformity over [0, 1]: if we split the interval into N sub-intervals and generate x numbers, we expect about x/N of them in each sub-interval.
- Each generation must be independent of the previous one.

Long sequences of pseudo-random numbers can approximate true randomness, but these characteristics must be respected:

- The computational time must be reasonable, so the algorithm should be fast.
- The algorithm should have long cycles: at some point the sequence of generated numbers repeats, and we want these cycles to be as long as possible.
- The algorithm must be reproducible (which is a characteristic of every deterministic algorithm).
- The statistical properties should be as close as possible to those of ideal random sequences.

4.1 Linear congruential method

This is an algorithm that generates a sequence of numbers X1, X2, ... in [0, m − 1]:

Xi+1 = (a·Xi + c) mod m,   i ∈ ℕ   (4.1)

where X0 is the seed, a is the multiplier, c is the increment and m is the modulus.

E.g., with X0 = 27, a = 17, c = 43, m = 100, applying formula (4.1):

X1 = (17 · 27 + 43) mod 100 = 2
X2 = (17 · 2 + 43) mod 100 = 77
X3 = (17 · 77 + 43) mod 100 = 52

The numbers are generated uniformly in [0, 99] because m = 100. If we want numbers in [0, 1]:

Ri = Xi / m

Usually m = 2^32 − 1 or m = 2^48 is used; in the C standard library, for example, m = 2^15.

4.2 Cumulative random variable

There is a theorem saying that if X is a r.v. with continuous density function f(x), then the cumulative distribution

C(X) = ∫_{−∞}^{X} f(u) du

is uniformly distributed in [0, 1]. When the integral is known in closed form, we can write C = F(X). If F is invertible, X = F^(−1)(C) has f as its PDF. Samples generated by F^(−1)(C), where C is a uniformly distributed r.v., have the same distribution as X.

4.3 Algorithm generation for continuous r.v.

1. Given a specific CDF F(X), set F(X) = R.
2. Since R is known, solve for X by inverting F: X = F^(−1)(R).
3. Generate uniformly distributed numbers R1, R2, R3, ... in [0, 1].
4. Compute Xi = F^(−1)(Ri).

4.4 Algorithm generation for discrete r.v.

Assume C is a uniform r.v. over [0, 1]:

P(c1 ≤ C ≤ c2) = (c2 − c1) / (1 − 0)

Then values of X can be generated according to the rule

X = xk  ⟺  F(x_{k−1}) < C ≤ F(x_k),   k = 1, 2, ...,  with F(x_0) = 0

4.5 Example

Suppose we want to generate numbers according to a negative exponential distribution:

F(x) = 1 − e^(−λx)

Setting F(X) = R:

R = 1 − e^(−λx)
e^(−λx) = 1 − R
−λx = ln(1 − R)
x = −(1/λ) ln(1 − R)

so

xi = −(1/λ) ln(1 − Ri)

Usually xi = −(1/λ) ln(Ri) is used instead, since Ri ∈ [0, 1] implies that (1 − Ri) ∈ [0, 1] with the same uniform distribution.

4.6 Gaussian distribution

Sometimes it is not possible to invert F(X), as in the case of the Gaussian distribution. The central limit theorem states that the sum of a large number of independent, identically distributed r.v.s tends to a Gaussian distribution:

Y = x1 + x2 + x3 + ... + xn

Usually the sum of n ≥ 12 uniformly distributed r.v.s (though in fact they can have any distribution) is a good approximation of a Gaussian distribution.
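The three generators of this chapter can be sketched together in a few lines. The LCG uses the example parameters above, so its outputs match X1, X2, X3; the exponential sampler applies the inverse transform of section 4.5 (the rate λ = 2 is illustrative); the Gaussian approximation sums 12 uniforms as in section 4.6. A real generator would use a much larger modulus m than the toy value 100.

```python
import math

# (1) Linear congruential generator with the example parameters.
def lcg(seed, a=17, c=43, m=100):
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m                      # R_i uniform in [0, 1)

gen = lcg(27)
uniforms = [next(gen) for _ in range(3)]
print(uniforms)                          # 0.02, 0.77, 0.52 as in the example

# (2) Inverse-transform sampling of an exponential r.v.
def exponential(r, lam):
    # x = -(1/λ) ln(1 - R); since 1 - R is also uniform, ln(R) works too
    return -math.log(1 - r) / lam

print([round(exponential(r, 2.0), 3) for r in uniforms])

# (3) Approximate Gaussian sample as a sum of 12 uniforms:
# the sum has mean 6 and variance 1, so subtracting 6 gives roughly N(0, 1).
def approx_gauss(gen, n=12):
    return sum(next(gen) for _ in range(n)) - 6.0

print(round(approx_gauss(gen), 3))
```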
Chapter 5 Output Analysis

We can discuss the output of two different kinds of simulation:

- Terminating simulation: a simulation that runs for a duration TE, where E is a specific event that ends it. It is also called "transient simulation".
- Steady-state simulation: the system runs for a long period (theoretically infinite) and the state is no longer influenced by the initial conditions.

5.1 Measures of interest

Often the simulation output is a sequence of discrete time data {Y1, Y2, ..., Yn} or continuous time data of the form {Y(t), 0 ≤ t ≤ TE}. They are collected to estimate a certain measure θ.

5.2 Point estimator

In the case of discrete time data, the point estimator for θ is

θ̂ = (1/n) Σ_{i=1}^{n} Yi

The estimator is error-free (unbiased) if E[θ̂] = θ, i.e., if the expected value of the point estimator equals the real value; usually this is not exactly true. In the case of continuous time data, the point estimator is the time average

ϕ̂ = (1/TE) ∫_0^{TE} Y(t) dt

Also in this case the estimator is error-free if E[ϕ̂] = ϕ.

5.3 Estimation of a measure

When the estimator is unbiased, we consider the following random variable:

(θ̂ − θ) / σ̂(θ̂)   (5.1)

It is still a random variable, and it is distributed according to a Student-t distribution with n − 1 degrees of freedom, where n is the number of collected data. In place of the unknown standard deviation we use the sample standard deviation:

σ̂ = s / √n,   s² = (1/(n − 1)) Σ_{i=1}^{n} (Yi − θ̂)²

The Student-t distribution is a continuous probability distribution that generalizes the standard normal distribution; it tends to a Gaussian distribution as n increases.

[Figure: Student-t density for ν = 1, 3, 5 degrees of freedom, approaching the standard normal as ν grows]

5.4 Confidence interval

To estimate the goodness of the measure we have to find the interval I such that

P[θ ∈ I] = 1 − α

where 1 − α is the confidence level and I is the confidence interval.

5.5 Quantile

The quantile, or α-quantile, is the value qα that divides the population into two parts, one of probability α and the other of probability 1 − α.

5.6 Evaluating I

Let t̄ = t_{α/2, n} be the quantile of the Student-t distribution such that

P[−t̄ ≤ T ≤ t̄] = 1 − α

Substituting T = (θ̂ − θ)/σ̂, we obtain

P[θ̂ − t̄ σ̂ ≤ θ ≤ θ̂ + t̄ σ̂] = 1 − α

so the confidence interval is

I = {x ∈ ℝ | θ̂ − t̄ σ̂ ≤ x ≤ θ̂ + t̄ σ̂}

The confidence interval with coverage probability 100(1 − α)% is

θ̂ − t_{α/2, n} σ̂ ≤ θ ≤ θ̂ + t_{α/2, n} σ̂

where t_{α/2, n} is the 1 − α/2 percentile of the Student-t distribution with n degrees of freedom. Since the Student-t CDF has no closed form, a tabular approach is normally used.

When R independent replications are available, the estimate is computed as their mean:

Ȳ = (1/R) Σ_{i=1}^{R} Yi

Ȳ has an error with respect to the real value Y, but if the Yi are independent and identically distributed the estimator is unbiased, meaning E[Ȳ] = Y. The measure of the error is the confidence interval. Since the estimator is unbiased, we can use the sample variance over the R measurements:

S² = (1/(R − 1)) Σ_{i=1}^{R} (Yi − Ȳ)²

The interval is Ȳ ± t_{α/2, R−1} σ̂ with σ̂ = S/√R, so, substituting S²:

Ȳ ± t_{α/2, R−1} √( (1/(R(R − 1))) Σ_{i=1}^{R} (Yi − Ȳ)² )

Here t_{α/2, R−1} is the quantile of the Student-t distribution with R − 1 degrees of freedom such that the area cut off by each tail equals α/2. The error is bounded by the confidence interval with probability 100(1 − α)%. Note that for any finite R the error is never exactly 0, no matter how large R is.
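A minimal sketch of the computation, assuming SciPy is available for the t quantile (a printed table works equally well); the replication data are invented for illustration:

```python
import math
from scipy import stats   # assumed available; provides the Student-t quantile

# Confidence interval from R independent replications Y_1..Y_R:
# Ȳ ± t_{α/2, R-1} * S / sqrt(R), at confidence level 100(1-α)%.
Y = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]   # illustrative data
R = len(Y)
alpha = 0.05

y_bar = sum(Y) / R
S2 = sum((y - y_bar) ** 2 for y in Y) / (R - 1)      # sample variance
half_width = stats.t.ppf(1 - alpha / 2, R - 1) * math.sqrt(S2 / R)
print(f"{y_bar:.3f} ± {half_width:.3f}  (95% CI)")
```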
Chapter 6 Repairable Systems and Unrepairable Systems

6.1 Reliability

The reliability of a system is defined as the probability that the system works continuously over a specific time interval. Since we work with random systems, the lifetime is typically a random variable, so we use a probabilistic approach to estimate the system reliability. More generally:

Reliability: the reliability of a component is its ability to perform the work it has been designed for, over the time interval (0, t), given the operating conditions and the stress it is subject to; t is called the mission time.

Now we define reliability as a mathematical function. Assume X is the random variable representing the system lifetime (the time to failure). The CDF of X is

FX(t) = P(X ≤ t)

i.e., the probability that the system has failed by time t. The reliability (the probability that the system has not failed by time t) is therefore:

R(t) = 1 − FX(t)

Some properties:

- R(t) is not a CDF: it is a non-increasing function (it can decrease or stay constant, but never increase).
- lim_{t→∞} R(t) = 0
- 0 ≤ R(t) ≤ 1 on [0, +∞)

[Figure: the CDF F(x) rising from 0 to 1 and the reliability R(x) = 1 − F(x) decreasing from 1 to 0]

Let fX(t) be the PDF of X. The reliability function can be derived from the PDF:

R(t) = ∫_t^{+∞} fX(τ) dτ

Why? Because R(t) = 1 − F(t) and, since F(t) = ∫_0^t fX(τ) dτ, the remaining part of the unit area lies between t and +∞. Differentiating gives

fX(t) = −dR(t)/dt

[Figure: the PDF f(x) with the area to the right of t shaded, representing R(t)]

6.2 Failure times

The MTTF (Mean Time To Failure) is:

MTTF = E[X] = ∫_0^∞ t f(t) dt

and integrating by parts (the boundary term vanishes) we find:

MTTF = ∫_0^∞ R(t) dt

6.3 Failure rate

The unconditional probability that a fault occurs in [t, t + ∆t] is f(t)∆t. But what happens if we want the failure probability given that the system has operated correctly up to time t? In that case we have a conditional probability: the probability that a fault occurs in [t, t + ∆t] given that the system has worked correctly until time t:

P(t < X ≤ t + ∆t | X > t) = P(t < X ≤ t + ∆t) / P(X > t) = (F(t + ∆t) − F(t)) / R(t)

Dividing by ∆t we recognize a difference quotient. The instantaneous failure rate (hazard rate) is:

h(t) = lim_{∆t→0} (F(t + ∆t) − F(t))/∆t · 1/R(t) = f(t) / R(t)

and since fX(t) = −dR(t)/dt:

h(t) = −(1/R(t)) · dR(t)/dt

This is a first-order linear differential equation, whose solution is:

R(t) = e^(−∫_0^t h(x) dx)

Intuitively, the higher the hazard rate, the lower the reliability function.

6.4 Quantities of the exponential distribution

F(t) = 1 − e^(−λt)
f(t) = λ e^(−λt)
R(t) = e^(−λt)
h(t) = λ

The MTTF for an exponential r.v. is:

MTTF = ∫_0^{+∞} R(t) dt = ∫_0^{+∞} e^(−λt) dt

This is a simple integral, and the solution is:

MTTF = 1/λ

λ is dimensionally t^(−1) (it is a "frequency"), so λ·t is a pure number: the exponent of R(t) has no unit of measure.

6.5 Weibull distribution

Another important and frequently used distribution is the Weibull distribution, which is a generalization of the exponential distribution. The CDF of a Weibull distribution is

F(t) = 1 − e^(−(t/α)^β)

where α is the scale factor and β is the shape factor.

[Figure: Weibull CDFs F(t) for β = 0.3, 0.3 < β < 1, and β = 1]

The PDF of the Weibull distribution is:

f(t) = (β/α^β) t^(β−1) e^(−(t/α)^β)

The hazard rate is:

h(t) = (β/α^β) t^(β−1)

so:

- for β < 1, h(t) decreases over time (the failure rate drops, as in infant mortality);
- for β > 1, h(t) increases over time (wear-out), so R(t) decays faster and faster.

When β = 1 the Weibull distribution is an exponential distribution; substituting:

F(t) = 1 − e^(−(t/α)) = 1 − e^(−(1/α)t)

which is an exponential distribution with parameter 1/α. So, as already shown for the negative exponential, with β = 1 the hazard h(t) is constant and equal to 1/α.
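A small numerical sketch of these quantities; the rates and Weibull parameters are illustrative. The MTTF integral ∫R(t)dt is approximated with a rectangle rule and compared against the exact value 1/λ.

```python
import math

# Reliability and hazard rate for the two lifetime models of this chapter.
lam = 0.5                                   # illustrative failure rate

def R_exp(t):
    return math.exp(-lam * t)

def R_weibull(t, alpha=2.0, beta=1.5):      # illustrative α, β
    return math.exp(-((t / alpha) ** beta))

def h_weibull(t, alpha=2.0, beta=1.5):
    # h(t) = (β / α^β) t^(β-1); increasing over time since β > 1 here
    return (beta / alpha ** beta) * t ** (beta - 1)

# Crude MTTF = ∫ R(t) dt via the rectangle rule; should agree with 1/λ.
dt = 0.001
mttf = sum(R_exp(k * dt) for k in range(100_000)) * dt
print(f"numeric MTTF = {mttf:.3f}, 1/λ = {1 / lam}")
print(f"Weibull R(1.0) = {R_weibull(1.0):.3f}, h(1.0) = {h_weibull(1.0):.3f}")
```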
6.6 Unrepairable systems

Sometimes we have a complex system in which many components interact to achieve a goal. In this case we can evaluate the system reliability from the reliability of the sub-systems, so the definition of reliability is extended. We denote by:

- Xs the system failure time
- Xi the failure time of the i-th block
- Rs(t) the reliability of the whole system

The components can be connected in several kinds of structures.

6.6.1 Series

The components are connected in such a way that all of them are needed to have a working system.

Input → Block A → Block B → Block C → Output

For example, if Block A or Block B or Block C has failed, the overall system has failed. Let's evaluate the reliability:

Rs(t) = P(Xs > t) = P(X1 > t, X2 > t, ..., Xn > t)

and if the blocks are statistically independent (like, for example, a working CPU and a working memory connected in series):

Rs(t) = P(X1 > t) · P(X2 > t) · ... · P(Xn > t) = Π_{i=1}^{n} Ri(t)

The system reliability is lower than the lowest sub-system reliability, because a product of factors in [0, 1] is never larger than its smallest factor, e.g. 0.2 · 0.3 · 0.4 ≤ 0.2. This means that the most "fragile" subsystem, i.e., the least reliable component, heavily weighs on the system reliability.

We can also derive the MTTF of the series system. Suppose Ri(t) = e^(−λi t). Then

Rs(t) = e^(−λ1 t) · e^(−λ2 t) · ... · e^(−λn t) = e^(−(λ1 + λ2 + ... + λn) t)

So the failure time of the series system is exponentially distributed with rate λs = Σ_i λi, and thus:

MTTFs = 1 / λs

6.6.2 Reliability improvement

Suppose we have the reliabilities of n components: R1(t), R2(t), ..., Rn(t), and we want to increase the reliability of the i-th component by a fixed quantity ∆Ri:

Rs(t) + ∆Rs = R1(t) · R2(t) · ... · (Ri(t) + ∆Ri) · ... · Rn(t)

Solving:

∆Rs = (Rs(t) / Ri(t)) · ∆Ri

∆Rs / Rs(t) = ∆Ri / Ri(t)

The relative improvement of the system is larger the lower the reliability of the improved component. What does this mean? It is better to focus on low-reliability components, which lead to greater overall system improvements, rather than on highly reliable ones.

6.6.3 Redundant systems

A system is said to be redundant when the failure of a single component does not imply the failure of the entire system.

6.6.4 Unreliability

Consider components whose unreliabilities are F1(t), F2(t), ..., Fn(t). For a redundant (parallel) system of n components, the system fails only when all components have failed, so the system unreliability is

Fs(t) = 1 − Rs(t) = Π_{i=1}^{n} Fi(t) = Π_{i=1}^{n} (1 − Ri(t))
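A sketch of the series-system formulas, with illustrative block rates; the final loop illustrates section 6.6.2 by granting the same ΔR to each block in turn, showing that improving the weakest block yields the largest system gain.

```python
import math

# Series system of n independent exponential components:
# R_s(t) = Π R_i(t) = e^{-(λ1+...+λn) t},  MTTF_s = 1 / Σ λi.
lams = [0.2, 0.5, 1.0]                    # illustrative block rates

def R_series(t):
    return math.prod(math.exp(-l * t) for l in lams)

t = 1.0
print(f"R_s({t}) = {R_series(t):.4f}")
print(f"MTTF_s  = {1 / sum(lams):.4f}")

# Improving the weakest block pays off most: raise each R_i by ΔR = 0.01.
base = [math.exp(-l * t) for l in lams]
for i, Ri in enumerate(base):
    improved = list(base)
    improved[i] = Ri + 0.01
    gain = math.prod(improved) - math.prod(base)
    print(f"improve block {i} (R_i = {Ri:.3f}): ΔR_s = {gain:.5f}")
```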
Thus, throughout the evolution of the process through time, the transitions among states follow the same probability rules. This gives us another important consequence: the sojourn time is exponentially distributed because of the markov property. If we choose tn in such a way that the process has already been in the state j for some time, the probability of finding the system in state j at time t > tn depends on state j but not on how long the system has been in this state. 7.1.1 Equations The purpose of a “solving” a Markov chain is is to obtain the state probabilities both with finite values of t and t → ∞ (In particular, a Transient Analysis and a Steady-State 34 analysis) pi (t) = P [X(t) = i] For example, in a system where we have 2 states only (UP (1)and DOWN (2)) p1 (t) is the probability to be in the state ‘ÙP” at time t We define the transition state probability as: Pij (t − u) = P [X(t) = j|X(u) = i] it’s the probability we transit from state i to state j given that at time instant u we where in the state i, but most often we will have that u = 0 The matrix P (t) is the square matrix of the transition probabilities Pij (t). 7.1.2 Chapman Kolmogorov equations For the system to be in state j at time t, it must have been in some state i at time u and it must have made zero or more state transitions to reach j. This can be formalized as: p(t) = p(u) · P (t − u) (7.1) This is called Chapman Kolmogorov equations 7.2 Dicrete Time Markov Chain (DTMC) In a DTMC the time is treated as discrete. The state changes in discrete time instants {0, 1, 2,..} Let’s consider P = P (1) and let Pij denote the (i, j)-th element of P ; pij is called the one-step transition probability.. If the chain is homogenous, we can write Champan-Kolmogorov equations as: p(n + 1) = p(n) · P => p(n) = p(0) · P n If we apply the limit on both sides to the first formula and limn→∞ p(n) exists: lim p(n + 1) = lim p(n) = n→∞ n→∞ p=p·P The following system of equations is used for computing the limiting probability vector (state probabilities at steady-state). Unluckily, in this sysem the number of independent equations is one less than the number of equations (we have ∞1 solutions). So we need another equation, that is called the normalization equation: p·e=1 We impose that the sum of all probabilities must be equal to 1, where e is a column vector, e = (1, 1,....1)T where T denotes the transpose. p · e = 1 =>   1 [p0 , p1 ,...pn ] ·.. = 1  1 35 7.3 Continuos Time Markov Chain Now we turn to the case of continuous time Markov chains (time is a continuos variable). If we let u = t − ∆t and subtract p(t − ∆t) from both sides of 7.1, we get p(t) − p(t − ∆t) = p(t − ∆t) · [P (∆t) − I] dividing both sides by ∆t p(t) − p(t − ∆t) [P (∆t) − I] = p(t − ∆t) · ∆t ∆t if we apply the limit of both sides p(t) − p(t − ∆t) [P (∆t) − I] lim = lim p(t − ∆t) · ∆t→∞ ∆t ∆t→∞ ∆t The quantity [P (∆t) − I] lim (7.2) ∆t→∞ ∆t is called Q matrix Since the first part of the equation is a known quantity, it’s the derivative of p(t), we obtain the following differential equation: dp(t) = p(t) · Q (7.3) dt Like before, we can solve the markov chain at steady-state. When the limt→∞ p(t) exists, we can apply the limit in the equation 7.3 and the derivative becomes zero (because p(t) is costant) and we get the following system of linear equations for the limiting probabilities: 0=p·Q But like before, this has ∞1 solutions, so we need the normalization equation in order to have a solution. 
7.3 Continuous Time Markov Chain

Now we turn to continuous time Markov chains (time is a continuous variable). If we set u = t − ∆t and subtract p(t − ∆t) from both sides of (7.1), we get

p(t) − p(t − ∆t) = p(t − ∆t) · [P(∆t) − I]

Dividing both sides by ∆t:

(p(t) − p(t − ∆t)) / ∆t = p(t − ∆t) · [P(∆t) − I] / ∆t

and taking the limit as ∆t → 0 of both sides, the quantity

Q = lim_{∆t→0} [P(∆t) − I] / ∆t   (7.2)

is called the Q matrix. Since the left-hand side is a known quantity, the derivative of p(t), we obtain the following differential equation:

dp(t)/dt = p(t) · Q   (7.3)

As before, we can solve the Markov chain at steady state. When lim_{t→∞} p(t) exists, we can apply the limit to equation (7.3); the derivative becomes zero (because p(t) becomes constant) and we get the following system of linear equations for the limiting probabilities:

0 = p · Q

But, as before, this has ∞^1 solutions, so we need the normalization equation to obtain a unique solution:

0 = p · Q
p · e = 1

If we want to perform the transient analysis instead, we need to solve the differential equation (7.3) to obtain all the state probabilities at each instant t.

7.3.1 Q matrix

The Q matrix is defined element-wise by

qij = lim_{∆t→0} (Pij(∆t) − δij) / ∆t

where δij = 1 if i = j and δij = 0 otherwise. The definition of Q implies that, for a small interval ∆t:

qij ∆t ≈ Pij(∆t) − δij

So:

- If i = j (δii = 1): qii ∆t ≈ Pii(∆t) − 1, i.e., −qii ∆t ≈ 1 − Pii(∆t). The quantity −qii ∆t is a probability, so −qii is a rate: the exit rate from state i. Since probabilities are non-negative, qii must be non-positive (negative or zero).
- If i ≠ j: qij ∆t ≈ Pij(∆t), so qij is the transition rate from state i to state j.

Q is called the transition rate matrix or infinitesimal generator matrix. An important property is that

Σ_{j≠i} Pij(∆t) + Pii(∆t) = 1

which implies that the sum of the rates in each row of Q must be zero:

Σ_j qij = 0

The off-diagonal elements can be positive or 0; the diagonal elements are non-positive (0 or negative).

7.4 State classifications

- Transient state: a state visited a finite number of times over an infinite time.
- Recurrent state: a state that is not transient (the process returns to it with probability 1).
- Absorbing state: a state that, once entered, cannot be left.

7.5 Markov chain characterizations

A Markov chain is said to be irreducible if each state is reachable from every other state; it then has all recurrent states.

[Figure: three states 0 → 1 → 2 connected in a cycle]

A Markov chain is said to be acyclic if there exists a state that can never be visited again once it has been left (no loops). A CTMC is ergodic if it is irreducible and recurrent non-null (positive recurrent).
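The same trick solves the CTMC steady state, 0 = p·Q with p·e = 1. The two-state failure/repair model below is an illustrative assumption (failure rate λ, repair rate μ), and the known availability μ/(λ+μ) serves as a check.

```python
import numpy as np   # assumed available

# Steady state of a CTMC: solve 0 = p·Q with p·e = 1, overwriting
# one redundant equation with the normalization equation.
def ctmc_steady_state(Q):
    n = Q.shape[0]
    A = Q.T.copy()
    A[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# Illustrative two-state availability model: UP fails at rate λ,
# DOWN is repaired at rate μ; rows of Q sum to zero.
lam, mu = 0.1, 1.0
Q = np.array([[-lam,  lam],
              [  mu,  -mu]])
p = ctmc_steady_state(Q)
print(p)   # [μ/(λ+μ), λ/(λ+μ)] ≈ [0.909, 0.091]
```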
7.6 Exercises

7.6.1 The Land of Oz

Consider a town (the Land of Oz) with only three possible weather conditions: Sunny (S), Cloudy (C), Raining (R). Assume the weather of the next day depends only on the current weather. Based on historical data, we know that:

- If today is sunny, tomorrow: Sunny (70%), Cloudy (20%), Raining (10%)
- If today is cloudy, tomorrow: Sunny (30%), Cloudy (50%), Raining (20%)
- If today is rainy, tomorrow: Sunny (20%), Cloudy (60%), Raining (20%)

We want to know:

- The weather conditions the day after tomorrow
- Which is the most probable weather condition?
- How many sunny days per year are there in the Land of Oz?

We can use a DTMC (Discrete Time Markov Chain) because we assume that the weather of the next day depends only on the current weather and not on earlier days, and we can model time as discrete since we can enumerate the days (Day 1, Day 2, ...).

[Figure: three-state DTMC; S stays S with 0.7, S → C with 0.2, S → R with 0.1, and so on per the table above]

Now we can solve the chain using the equations. We enumerate the states with indexes S = 0, C = 1, R = 2. The probability matrix is

P = [0.7 0.2 0.1; 0.3 0.5 0.2; 0.2 0.6 0.2]

Each row gives the probabilities that tomorrow the system transitions into one of the three states. But what about the day after tomorrow? If n counts the days (n = 0 means today, n = 1 tomorrow, and so on), the two-step transition probabilities are given by the matrix product

P · P = P² = [0.57 0.30 0.13; 0.40 0.43 0.17; 0.36 0.46 0.18]

whose rows give the state probabilities at discrete time n = 2 (the day after tomorrow). To find the most probable weather condition we have to solve the chain, because the steady-state probability may be interpreted as the fraction of time spent in each state: for n → ∞ we get a good approximation of the long-run weather conditions in the Land of Oz. We need to solve the system

π = π · P
π · e = 1

This is the basic procedure, and the typical solution adopted in MATLAB too:

π · P − π = 0
π · (P − I) = 0

and, by the matrix transpose property,

(P − I)^T · π^T = 0

Setting A = (P − I)^T and X = π^T, we solve A·X = 0 together with π · e = 1.

To answer the second question we compute the state probabilities and extract the element p0 from the state probability vector. To find the number of sunny days in the Land of Oz we multiply the number of days in a year, 365, by the probability p0, using the interpretation of the steady-state probability as the fraction of time spent in that weather condition:

Ns = p0 · 365

Solving, we obtain p0 ≈ 0.475, so Ns = 0.475 · 365 ≈ 173 days (almost six months).
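The same computation in NumPy (assumed available), reusing the steady-state approach of section 7.2:

```python
import numpy as np   # assumed available

# The Land of Oz chain solved numerically (states: 0=Sunny, 1=Cloudy, 2=Rainy).
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.6, 0.2]])

print(np.linalg.matrix_power(P, 2))    # day-after-tomorrow probabilities

A = (P - np.eye(3)).T                  # π·(P - I) = 0, transposed
A[-1, :] = 1.0                         # normalization: Σ π_i = 1
pi = np.linalg.solve(A, np.array([0.0, 0.0, 1.0]))
print(pi)                              # p0 ≈ 0.475
print(round(pi[0] * 365))              # ≈ 173 sunny days per year
```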
7.6.2 Two-processor system with shared memory

We have:

- Two processors
- One memory module configured into two separate partitions (shared between the processors)
- A constant memory access time (so we do not consider the sojourn time)
- The possibility to access different partitions at the same time
- Two processors behaving synchronously

We also have an important constraint: a new request is issued immediately when the previous one completes. Let q1 be the probability of a request for partition 1 and q2 the probability of a request for partition 2. What is the probability that a processor waits for the memory in stable conditions?

We can use a DTMC model whose state (i, j) gives the number of requests i for partition 1 and the number of requests j for partition 2.

[Figure: DTMC with states (1,1), (0,2), (2,0); from (1,1): self-loop with probability 2·q1·q2, to (0,2) with q2², to (2,0) with q1²; from (0,2): to (1,1) with q1, self-loop with q2; from (2,0): to (1,1) with q2, self-loop with q1]

Explanation. When there is 1 request for the first partition and 1 request for the second partition, there is no conflict. Now suppose both processors issue a request for partition 2, so the chain moves to state (0, 2): we have two processors requesting the same partition, and the probability of this is the probability of one request for partition 2 and a second request for partition 2. The "and" translates into a multiplication of probabilities, so the probability of moving from state (1, 1) to state (0, 2) is q2 · q2 = q2². The same argument holds for state (2, 0), reached with probability q1².

Now consider the case of 2 requests for the same partition, with one processor then requesting the other partition. Here there is no conflict, because the processor requests a partition that is not being requested, so the transition probabilities from states (0, 2) and (2, 0) back to (1, 1) are simply q1 and q2 respectively.

Finally, consider state (1, 1) with each processor again requesting the partition it already requested, i.e., a transition from state (1, 1) to itself. We must consider the probability that one processor requests the first partition and the other the second partition, or vice versa. The first case has probability q1 · q2 and the vice versa case has probability q2 · q1; since either can happen (a logical "or"), the final probability is q1 · q2 + q2 · q1 = 2·q1·q2.

There are no arcs from (2, 0) to (0, 2) or from (0, 2) to (2, 0), because we assume a request is issued immediately when the previous one completes, so we cannot have two new requests at the same time.

7.6.3 Two processors and three memory modules

We have:

- Two processors
- Three shared memory modules
- A dedicated repair facility for each module

Chapter 8 Queues

Chapter 9 Case study: M/M/1 queue
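As a closing sketch for the M/M/1 case study: the queue is a birth-death CTMC, so truncating it at K customers gives a finite Q matrix solvable with the machinery of chapter 7. The rates and truncation level below are illustrative assumptions; for ρ = λ/μ < 1 the exact steady-state distribution is geometric, p_n = (1 − ρ)·ρ^n, which the numeric solution should reproduce.

```python
import numpy as np   # assumed available

# M/M/1 as a birth-death CTMC: arrivals at rate λ, services at rate μ.
# Truncating the state space at K customers gives a finite Q matrix;
# for ρ = λ/μ < 1 and large K the result approaches p_n = (1 - ρ) ρ^n.
lam, mu, K = 1.0, 1.5, 60

Q = np.zeros((K + 1, K + 1))
for n in range(K):
    Q[n, n + 1] = lam              # birth: arrival in state n
    Q[n + 1, n] = mu               # death: departure in state n+1
np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of Q must sum to zero

A = Q.T.copy()
A[-1, :] = 1.0                     # normalization equation
b = np.zeros(K + 1)
b[-1] = 1.0
p = np.linalg.solve(A, b)

rho = lam / mu
print(f"p_0 : numeric = {p[0]:.4f}, analytical = {1 - rho:.4f}")
print(f"E[N]: numeric = {p @ np.arange(K + 1):.4f}, analytical = {rho / (1 - rho):.4f}")
```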
