Optimal Designs ST410 2023 PDF

Summary

This document covers Chapter A of a course on optimal experimental design. It discusses design regions and coded variables, the aims of a good design, the general linear model in the design context, standardised variance, confidence regions for parameter estimates, optimality criteria, and the General Equivalence Theorem.

Full Transcript


ST410 Designed Experiments 2023
ST410 Chapter A: Optimal Experimental Design

Contents

A Introduction
  A.1 Design Regions
    A.1.1 Design Coding
  A.2 Aims of a Good Design
  A.3 The General Linear Model (Regression) in the design context
    A.3.1 Regression formulation of an Experimental Design
  A.4 Standardised Variance
  A.5 Confidence Region of Parameter Estimates
  A.6 Optimal Design Criteria
    A.6.1 Criteria that aim to reduce the standard error of parameters
    A.6.2 Measure Theory Approach
    A.6.3 Criteria that aim to reduce the standard error of the predictions
    A.6.4 Which Optimality criteria?
  A.7 The General Equivalence Theorem
    A.7.1 Application of the General Equivalence Theorem
  A.8 Algorithms for Designs

A Introduction

Often we want to pick a design so that it is in some way efficient and optimal. However, what do we mean by an optimal design? What do we mean by an efficient design? The aim of this section is to develop some of these ideas.

A.1 Design Regions

Suppose we wish to design an experiment to estimate the parameters of a multiple linear regression model:

$y_i = \beta_0 + \beta_1 u_{1,i} + \cdots + \beta_m u_{m,i} + \varepsilon_i \quad (i = 1, \ldots, n)$ (1)

It is usual in a design for the values of $u_j$ $(j = 1, \ldots, m)$ to have some boundary limits. For example, in an industrial process there will be safety limits on the maximum pressure or temperature that can be applied, and similarly for the minimum pressure and temperature. In general $u_j$ will be bounded so that $u_{j,\min} \le u_j \le u_{j,\max}$ $(j = 1, \ldots, m)$. We may write this as $u_j \in [u_{j,\min}, u_{j,\max}]$ $(j = 1, \ldots, m)$.

As we wish to develop design ideas irrespective of the scale and the particular limits of a variable, it is usual to rescale the continuous variables. (Although for some optimality criteria the design can be scale dependent.) Usually we scale (or standardise) the continuous variables so that they lie between -1 and 1:

$x_j = \dfrac{u_j - u_{j,\text{mid}}}{SR_j} \quad (j = 1, \ldots, m),$ (2)

where $u_{j,\text{mid}}$ is the mid-point of $u_{j,\min}$ and $u_{j,\max}$, and $SR_j$ is the semi-range of $u_j$, i.e.

$u_{j,\text{mid}} = \dfrac{u_{j,\min} + u_{j,\max}}{2}; \qquad SR_j = \dfrac{u_{j,\max} - u_{j,\min}}{2} \quad (j = 1, \ldots, m).$ (3)

We will from now on discuss the coded variables $x_j \in [-1, 1]$ $(j = 1, \ldots, m)$. Once we have results and wish to interpret a model, the inverse transformation is just:

$u_j = u_{j,\text{mid}} + x_j \, SR_j \quad (j = 1, \ldots, m).$ (4)

This is of course what we saw throughout the module, and in particular in week 10 when we looked at response surface designs.
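The coding of equations (2)-(4) is easy to apply in R. The following is a minimal sketch; the function names and the temperature limits are illustrative inventions, not from the module code:

```r
# Code a variable u with limits [u_min, u_max] onto [-1, 1], and back.
code_var <- function(u, u_min, u_max) {
  u_mid <- (u_min + u_max) / 2   # mid-point, equation (3)
  sr    <- (u_max - u_min) / 2   # semi-range, equation (3)
  (u - u_mid) / sr               # coded value, equation (2)
}

uncode_var <- function(x, u_min, u_max) {
  u_mid <- (u_min + u_max) / 2
  sr    <- (u_max - u_min) / 2
  u_mid + x * sr                 # inverse transformation, equation (4)
}

# e.g. hypothetical temperature limits of 150 to 200 degrees:
code_var(c(150, 175, 200), 150, 200)   # returns -1, 0, 1
uncode_var(c(-1, 0, 1), 150, 200)      # returns 150, 175, 200
```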
Once we know the limits of the variables and have coded them so that the limits are now -1 and 1 on each of the coded $x_j$, we have to note if there are any further restrictions, as sometimes the limits do not apply independently. For example, we may have the restriction:

$\sum_{j=1}^{m} x_j^2 \le r^2,$ (5)

where $r$ is the radius of a sphere. In this sort of experiment the design region indicates an equal interest in departures in any direction from the centre of the sphere. Some experiments might be considering the components in a mixture, e.g. the % of sand, clay and silt in soil. This would lead to the constraints:

$\sum_{j=1}^{m} x_j = 1 \quad \text{and} \quad x_j \ge 0 \quad (j = 1, \ldots, m),$ (6)

which represents $x \in S^{m-1}$, the $(m-1)$-dimensional simplex. We denote these design regions as $\mathcal{X}$. For the unrestricted design, the region is simply:

$x_j \in [-1, 1] \quad \forall \, j = 1, \ldots, m.$ (7)

We may draw each of these regions $\mathcal{X}$ for $m = 2$ ($m = 3$ for the simplex) as shown below.

Figure 1: Examples of two-dimensional design regions $\mathcal{X}$: (a) unrestricted, $x_j \in [-1, 1]$; (b) equal departure from centre, $\sum_{j=1}^{2} x_j^2 \le r^2$; (c) simplex, $\sum_{j=1}^{3} x_j = 1$ and $x_j \ge 0$.

A.1.1 Design Coding

We have already seen earlier in the module how coding is applied to designs, for example factorial designs. When we have continuous variables that are bounded and then rescaled as in equation (2), so that they satisfy equation (7), specifying design points such as {-1, 0, 1} means that a continuous variable will take three values: -1, its minimum; 1, its maximum; and 0, the point mid-way between the maximum and minimum. Of course these may not be the actual minimum or maximum settings available in, e.g., a process, but the maximum and minimum we are considering in a particular design. It depends on the context. In response surface optimisation we run several experiments and then move the settings to find the optimum; in such a setting the maximum and minimum refer to the range under consideration for that particular run. In other contexts, such as when we know the process limits, they are the actual maximum and minimum available.

A.2 Aims of a Good Design

Box and Draper (1975) and (Box and Draper, 1987, Chapter 14) list 14 aims in the choice of an experimental design. These are summarised in (Atkinson and Donev, 1992, pages 47-48) as:

"1. Generate a satisfactory distribution of information throughout the region of interest, which may not coincide with the design region $\mathcal{X}$.
2. Ensure that the fitted value, $\hat{y}(x)$ at $x$, be as close as possible to the true value of the expected response $E[y(x)] = \eta(x)$ at $x$.
3. Make it possible to detect a lack of fit.
4. Allow estimation of transformations of both the response and the quantitative experimental factors.
5. Allow experiments to be performed in blocks.
6. Allow designs of increasing order to be built up sequentially."

For example, to have a screening design and then follow-up experiments that consider a different design region previously not thought important; or to start with a first order design (e.g. a $2^m$ factorial experiment, where only linear terms are estimated) and then add design points to estimate a second order model (i.e. with $x_j^2$ $(j = 1, \ldots, m)$ terms in the model).

"7. Be insensitive to wild observations and to violation of the usual normal theory assumptions.
8. Require a minimum number of experimental runs.
9. Provide simple data patterns that allow ready visual appreciation.
10. Ensure simplicity of calculation.
11. Behave well when errors occur in the settings of the experimental variables.
12. Not require an impractically large number of levels of the experimental variables.
13. Provide a check on the 'constancy of variance' assumptions."

And for experiments with quantitative factors, in addition:

"14. Orthogonality: the designs have a diagonal information matrix leading to uncorrelated estimates of the parameters.
15. Rotatability: the variance of $\hat{y}(x)$ depends only on the distance from the centre of the experimental region."

Many of these concepts were addressed in the main part of the module. What we consider next is understanding the relationship between the design and the variance of the parameters and the predicted value.
For the parameters we really mean the covariance or dispersion matrix and not just their individual variances: in many designs the parameter estimates will be correlated. Once we understand this relationship we may then develop some ideas of optimal designs. Some of the optimality criteria address some of these issues; for example, once we understand the relationship between the design and the covariances we can create a design to address item 8 above. If we require blocks, as in item 5, we can use this understanding to help us create suitable designs. Some of these concepts will be more important than others. For example, the last two items (14 and 15) might be of interest if there are quantitative factors, whilst with modern computing power item 10 is often of less importance.

A.3 The General Linear Model (Regression) in the design context

You should be familiar (e.g. from ST221, ST952, ST903 and slides 6.1 on this module) with the fact that the matrix regression equation:

$y = X\beta + \varepsilon \quad \text{with} \quad \varepsilon \sim N(0, \sigma^2 I),$ (8)

has least squares estimates:

$\hat{\beta} = (X'X)^{-1} X'y,$ (9)

where:

$\hat{\beta} \sim N\left[\beta, \, \sigma^2 (X'X)^{-1}\right].$ (10)

As a result, $\hat{\beta}_j / \text{s.e.}(\hat{\beta}_j) \sim t_{n-p-1}$, where $\text{s.e.}(\hat{\beta}_j) = \hat{\sigma}\sqrt{c_{jj}}$, $c_{jj}$ is the $j$th diagonal element of $(X'X)^{-1}$, and $\hat{\sigma}$ is estimated as the square root of the residual mean square.

In the experimental design context the design matrix $X$, for data coded as in A.1.1, consists of a column of 1's and then columns representing the different design points in $\mathcal{X}$ (as defined in A.1). However, this matrix may also need some additional columns to represent interactions and polynomial terms (usually squared terms). For this reason the regression equations above are reformulated in many design textbooks to accommodate this.

A.3.1 Regression formulation of an Experimental Design

Suppose, as above, we have a single response variable, which we will denote by $y$, and that we have $m$ explanatory variables $x_1, \ldots, x_m$. We can model the dependence by the following response function

$y = f'(x)\beta$ (11)

where $f'(x)$ is the polynomial expansion of the explanatory variables $x = (x_1, \ldots, x_m)$ and is of dimension $1 \times p$, and $\beta$ is the $p \times 1$ vector containing the parameters of the explanatory variables.

Example A.1. Suppose we have two factors $x_1$ and $x_2$.

1. If we wish to fit the first order model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, then $f'(x) = (1, x_1, x_2)$, $m = 2$ and $p = 3$.

2. If we wish to fit the model with interaction $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon$, then $f'(x) = (1, x_1, x_2, x_1 x_2)$, $m = 2$ and $p = 4$.

3. If we wish to fit the full second order model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \varepsilon$, then $f'(x) = (1, x_1, x_2, x_1 x_2, x_1^2, x_2^2)$, $m = 2$ and $p = 6$.

In the corresponding design matrix:
- $1$ is the $n \times 1$ vector of 1's,
- $x_1$ is the $n \times 1$ vector of observations for $x_1$, i.e. $(x_{11}, x_{12}, \ldots, x_{1n})'$,
- $x_2$ is the $n \times 1$ vector of observations for $x_2$, i.e. $(x_{21}, x_{22}, \ldots, x_{2n})'$,
- $x_1 x_2$ is the $n \times 1$ vector of cross-products, i.e. $(x_{11} x_{21}, x_{12} x_{22}, \ldots, x_{1n} x_{2n})'$,
- similarly, $x_1^2$ is the $n \times 1$ vector of squared observations for $x_1$, i.e. $(x_{11}^2, x_{12}^2, \ldots, x_{1n}^2)'$, and
- $x_2^2$ is the $n \times 1$ vector of squared observations for $x_2$, i.e. $(x_{21}^2, x_{22}^2, \ldots, x_{2n}^2)'$.

Comparing (11) with (8), the rows of the full design matrix $X$ used in equations (9) and (10) are the response function evaluated at the design points:

$X = \big(f(x_1), f(x_2), \ldots, f(x_n)\big)',$ (12)

i.e. the $i$th observation corresponds to the $i$th row of $X$, which is $f'(x_i)$.
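In R the expansion $f'(x)$ for each model in Example A.1 is exactly what model.matrix() produces from a model formula. A minimal sketch, where the design points are an illustrative $2^2$ arrangement rather than anything from the notes:

```r
# Two factors at an illustrative 2^2 set of design points
d <- data.frame(x1 = c(-1, 1, -1, 1),
                x2 = c(-1, -1, 1, 1))

# 1. First order model: f'(x) = (1, x1, x2), p = 3
model.matrix(~ x1 + x2, data = d)

# 2. With interaction: f'(x) = (1, x1, x2, x1*x2), p = 4
model.matrix(~ x1 + x2 + x1:x2, data = d)

# 3. Full second order model: f'(x) = (1, x1, x2, x1*x2, x1^2, x2^2), p = 6
model.matrix(~ x1 + x2 + x1:x2 + I(x1^2) + I(x2^2), data = d)
```

Each row of the returned matrix is $f'(x_i)$ evaluated at one design point, exactly as in equation (12).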
Observations are subject to a random error, which we denote by $\varepsilon$, and we add this to $y$ in the statistical model. So the $i$th experimental observation can be written as

$y_i = f'(x_i)\beta + \varepsilon_i, \quad i = 1, \ldots, n.$

Here $x_i$ is referred to as the design point, and represents the settings of the explanatory variables in the $i$th run of the experiment. We can use the full matrix notation of equation (8) to express the complete set of $n$ (say) observations. The assumption of independence of errors is appropriate in experiments where the levels of the factors are re-set independently after each run, and the run order has been randomised. This is the case in many designs, such as the completely randomised design, which we have already met. The further assumption, that the errors are normally distributed as shown above in equation (8), is necessary for tests of significance to be valid.

The least squares estimates given in equation (9) are unbiased estimates of the unknown parameters $\beta$ and are equivalent to the maximum likelihood estimator when the error structure is normal. The variance-covariance matrix is, from (10):

$\text{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1},$ (13)

and consequently the information matrix for the unknown fixed parameters $\beta$ is $M = \sigma^{-2}(X'X)$.

Once we have estimated the parameters of the model, it may be important to predict the response for different combinations of $x$. The predicted response is given by

$\hat{y}(x) = f'(x)\hat{\beta}$ (14)

with variance

$\text{Var}(\hat{y}(x)) = \sigma^2 f'(x)(X'X)^{-1} f(x).$ (15)

The variance component $\sigma^2$ has to be estimated; an unbiased estimator is the residual (also known as the error) mean square, $\hat{\sigma}^2 = r'r/(n - p)$, with $r$ the $n$-dimensional vector of residuals $r_i = y_i - f'(x_i)\hat{\beta}$. This is the residual mean square produced in standard ANOVA tables for the analysis of such designs.

Note that the properties of the parameter estimates are affected by the settings of the explanatory variables. One of the purposes of statistical design is to determine those settings that produce the best possible estimates. We have seen some aspects of selecting design points throughout the module, for example using combinations of points at -1 and 1 for different factors to estimate different interactions. However, optimal design theory also considers how to find the best design points to make the estimation of parameters and/or predictions optimal in some other fashion. We will cement these ideas with some further examples.

Example A.2. Suppose we wish to fit a simple linear regression model $y = \beta_0 + \beta_1 x$ and we can run only $n = 3$ runs. We are assuming that $\mathcal{X} = [-1, 1]$. Possible designs $\delta$ include:

1. $\delta_1: x = (-1, 0, 1)'$
2. $\delta_2: x = (-1, 1, 1)'$

For these designs we can write down the full design matrices:

1. $X = \begin{pmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix}$
2. $X = \begin{pmatrix} 1 & -1 \\ 1 & 1 \\ 1 & 1 \end{pmatrix}$

The information matrices will be:

1. $\frac{1}{\sigma^2} X'X = \frac{1}{\sigma^2}\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}$
2. $\frac{1}{\sigma^2} X'X = \frac{1}{\sigma^2}\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$

The variance-covariance matrices for $\hat{\beta}$, $\text{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$, will be:

1. $\sigma^2 (X'X)^{-1} = \sigma^2 \begin{pmatrix} 1/3 & 0 \\ 0 & 1/2 \end{pmatrix}$
2. $\sigma^2 (X'X)^{-1} = \sigma^2 \begin{pmatrix} 3/8 & -1/8 \\ -1/8 & 3/8 \end{pmatrix}$

The variance for a predicted value at $x$ will have $f'(x) = (1 \;\; x)$ with $\text{Var}(\hat{y}(x)) = \sigma^2 f'(x)(X'X)^{-1} f(x)$:

1. $\text{Var}(\hat{y}(x)) = \sigma^2 \begin{pmatrix} 1 & x \end{pmatrix}\begin{pmatrix} 1/3 & 0 \\ 0 & 1/2 \end{pmatrix}\begin{pmatrix} 1 \\ x \end{pmatrix} = \sigma^2\left(\tfrac{1}{3} + \tfrac{x^2}{2}\right)$
2. $\text{Var}(\hat{y}(x)) = \sigma^2 \begin{pmatrix} 1 & x \end{pmatrix}\begin{pmatrix} 3/8 & -1/8 \\ -1/8 & 3/8 \end{pmatrix}\begin{pmatrix} 1 \\ x \end{pmatrix} = \tfrac{\sigma^2}{8}\left(3 - 2x + 3x^2\right)$

In Example A.2 we have two different designs with different information matrices, and hence different variance-covariance matrices for $\hat{\beta}$ and different functions for $\text{Var}(\hat{y}(x))$.
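These small matrix calculations are easy to reproduce numerically. A minimal sketch in base R (the function name is an illustrative invention):

```r
# Variance function of a design for simple linear regression y = b0 + b1*x:
# returns f'(x) (X'X)^{-1} f(x), i.e. Var(yhat(x)) / sigma^2, at each x.
pred_var_factor <- function(design_x, x) {
  X <- cbind(1, design_x)          # design matrix: column of 1's, column of x
  XtXinv <- solve(t(X) %*% X)      # (X'X)^{-1}
  sapply(x, function(xx) {
    f <- c(1, xx)
    drop(t(f) %*% XtXinv %*% f)
  })
}

x <- seq(-1, 1, by = 0.5)
pred_var_factor(c(-1, 0, 1), x)   # matches 1/3 + x^2/2
pred_var_factor(c(-1, 1, 1), x)   # matches (3 - 2x + 3x^2)/8
```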
It is by comparing such results that we can discern which designs might be optimal in some sense.

A.4 Standardised Variance

In equation (15), and as seen in Example A.2 above, the variance of a prediction involves $\sigma^2$, which when comparing possible designs for the same situation should be the same. Also, we may wish to compare designs with differing numbers of design points, $n$. It is therefore convenient to standardise the variance function for a prediction.

Definition A.1 (Standardised Variance). We define the standardised variance of a design $\delta$, with design points $\{x\} \in \mathcal{X}$, as

$n \, \dfrac{\text{Var}(\hat{y}(x))}{\sigma^2}.$ (16)

Have a go at the following exercise.

Exercise A.1. Write down the information matrix for simple linear regression with $n$ design points, and find the algebraic form of the variance-covariance matrix. Consider the following designs:

$\delta_1 = \{-1, 0, 1\}$,
$\delta_2 = \{-1, -\tfrac{1}{2}, 0, \tfrac{1}{2}, 1\}$,
$\delta_3 = \{-1, 1\}$,
$\delta_4 = \{-1, -\tfrac{1}{3}, \tfrac{1}{3}, 1\}$,
$\delta_5 = \{-1, -1, 1\}$.

Calculate the standardised variance for all five designs, and sketch the responses over the design region. Comment on your results. The solution follows, but you are strongly encouraged to first have a go at these examples yourself before looking at the solution.

Solution A.1. For simple linear regression, the design matrix $X$ will consist of a column of 1's followed by the column of $x$ values. So

$X'X = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix},$ (17)

and the information matrix is $M = \frac{1}{\sigma^2} X'X$. $|X'X| = n \sum_i x_i^2 - \left(\sum_i x_i\right)^2 = n \sum_i (x_i - \bar{x})^2$, so that the variance-covariance matrix of $\hat{\beta}_0, \hat{\beta}_1$ is

$\sigma^2 (X'X)^{-1} = \dfrac{\sigma^2}{|X'X|} \begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix}.$ (18)

The predicted response is $\hat{y}(x) = f'(x)\hat{\beta}$, with variance $\text{Var}(\hat{y}(x)) = \sigma^2 f'(x)(X'X)^{-1} f(x)$, and for simple linear regression $f'(x) = (1 \;\; x)$. Now we find the standardised variance for the designs listed. (We can either use the formulae given by (17) and (18) or work each out from first principles from the specific design matrix, as in Example A.2.)

1. $\delta_1 = \{-1, 0, 1\}$: $X'X = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}$, $|X'X| = 6$, $(X'X)^{-1} = \begin{pmatrix} 1/3 & 0 \\ 0 & 1/2 \end{pmatrix}$. $\text{Var}(\hat{y}(x)) = \sigma^2\left(\tfrac13 + \tfrac{x^2}{2}\right)$, so since $n = 3$ the standardised variance is $1 + 3x^2/2$.

2. $\delta_2 = \{-1, -\tfrac12, 0, \tfrac12, 1\}$: $X'X = \begin{pmatrix} 5 & 0 \\ 0 & 2.5 \end{pmatrix}$, $|X'X| = 12.5$, $(X'X)^{-1} = \begin{pmatrix} 1/5 & 0 \\ 0 & 2/5 \end{pmatrix}$. $\text{Var}(\hat{y}(x)) = \sigma^2\left(\tfrac15 + \tfrac{2x^2}{5}\right)$, so since $n = 5$ the standardised variance is $1 + 2x^2$.

3. $\delta_3 = \{-1, 1\}$: $X'X = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$, $|X'X| = 4$, $(X'X)^{-1} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}$. $\text{Var}(\hat{y}(x)) = \sigma^2\left(\tfrac12 + \tfrac{x^2}{2}\right)$, so the standardised variance ($n = 2$) is $1 + x^2$.

4. $\delta_4 = \{-1, -\tfrac13, \tfrac13, 1\}$: $X'X = \begin{pmatrix} 4 & 0 \\ 0 & 20/9 \end{pmatrix}$, $|X'X| = 80/9$, $(X'X)^{-1} = \begin{pmatrix} 1/4 & 0 \\ 0 & 9/20 \end{pmatrix}$. $\text{Var}(\hat{y}(x)) = \sigma^2\left(\tfrac14 + \tfrac{9x^2}{20}\right)$, so the standardised variance ($n = 4$) is $1 + \tfrac{9}{5}x^2$.

5. $\delta_5 = \{-1, -1, 1\}$: $X'X = \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix}$, $|X'X| = 8$, $(X'X)^{-1} = \begin{pmatrix} 3/8 & 1/8 \\ 1/8 & 3/8 \end{pmatrix}$.
$\text{Var}(\hat{y}(x)) = \sigma^2 \begin{pmatrix} 1 & x \end{pmatrix}\begin{pmatrix} 3/8 & 1/8 \\ 1/8 & 3/8 \end{pmatrix}\begin{pmatrix} 1 \\ x \end{pmatrix} = \sigma^2\left(\tfrac38 + \tfrac{x}{4} + \tfrac{3x^2}{8}\right)$, so the standardised variance ($n = 3$) is $\tfrac38(3 + 2x + 3x^2)$.

The standardised variances for the five designs are plotted in Figure 2.

Figure 2: Comparison of standardised variances for the five different designs.

All except $\delta_5$ are symmetric designs, and all have minima at $x = 0$. $\delta_3$ is uniformly the best. $\delta_5$ is asymmetric and so has better variance at its more heavily weighted end. Note that spreading out the points, as in $\delta_1$, $\delta_2$ and $\delta_4$, gives poorer precision; i.e. for a straight line, put the design points at the limits of the design space. The code to produce the plot can be found on Moodle, called StanVareg.R.

The above example demonstrates how we can compare different potential designs across the design space. This considered the standardised variance, and from it we can see that some designs will result in smaller variance of the predictions. Clearly this is desirable. We may ask whether it is possible to obtain an even smaller standardised variance than the smallest one here. Also, what do we mean by the smallest? Clearly for the symmetric designs, as mentioned above, $\delta_3$ is optimal out of this set, whilst $\delta_5$ may have some advantages in some situations. This will lead us to define some optimality criteria. Before we do so, we note that the above was based on the variance of predictions, but we can also look at the variance of the parameter estimates. We will do this before defining some optimality criteria and finding out a little about how we decide if a design is optimal.

A.5 Confidence Region of Parameter Estimates

Recall:

$\text{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}.$ (13)

From this we can define a $(100 - \alpha)\%$ confidence region for the parameters:

$(\beta - \hat{\beta})'(X'X)(\beta - \hat{\beta}) < p s^2 F_{p,\nu,\alpha}$ (19)

where $s^2$ is the estimate of $\sigma^2$ based on $\nu$ degrees of freedom, and $F_{p,\nu,\alpha}$ is the $\alpha$ percentage point of an $F$ distribution with $p$ and $\nu$ degrees of freedom.

Now we can compare the sizes of confidence regions using equation (19); for different designs for the same process (i.e. the same situation) and for fitting the same model, the term on the right-hand side is a constant. Let us therefore, w.l.o.g. (without loss of generality), consider some regions where:

$(\beta - \hat{\beta})'(X'X)(\beta - \hat{\beta}) < 1.$ (20)

Example A.3. Consider the following designs for the simple regression model $y = 3 + 5x$:

1. $\delta_1 = \{-1, 1\}$,
2. $\delta_2 = \{-1, -1, 1, 1\}$, 2 replicates of $\delta_1$,
3. $\delta_3 = \{-1, -1, -1, 1, 1, 1\}$, 3 replicates of $\delta_1$,
4. $\delta_4 = \{-1, -1, -1, -1, 1, 1, 1, 1\}$, 4 replicates of $\delta_1$.

Plotting these design parameter regions as given in equation (20) yields Figure 3.

Figure 3: Plot of the contour at $(\beta - \hat{\beta})'(X'X)(\beta - \hat{\beta}) < 1$ for different symmetrical designs.

It can be seen that the contour plot is a perfect circle because of the symmetry in the design points (equal numbers of -1's and 1's). Further, as we increase the number of replicates the radius of the contour decreases. This is not surprising, because as we estimate the parameters with more replicates their standard errors, and consequently this confidence region, will decrease. The standard error of both parameters will be the same, and the parameter estimates are uncorrelated. These designs are also orthogonal.
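The contours in Figure 3 (and in Figures 4 and 5 below) are straightforward to draw, since equation (20) defines an ellipse in the $(\beta_0, \beta_1)$ plane. A minimal sketch of one way to do it; the function name is an illustrative invention and the Moodle plotting code may well differ:

```r
# Points on the contour (b - bhat)'(X'X)(b - bhat) = 1 for a simple
# regression design, centred at bhat = (3, 5) to match Example A.3.
ellipse_points <- function(design_x, centre = c(3, 5), npts = 200) {
  X <- cbind(1, design_x)
  XtX <- t(X) %*% X
  # Map the unit circle onto the ellipse via the inverse square root
  # of X'X: b = centre + (X'X)^{-1/2} z with |z| = 1.
  e <- eigen(XtX, symmetric = TRUE)
  XtX_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
  theta <- seq(0, 2 * pi, length.out = npts)
  t(centre + XtX_inv_sqrt %*% rbind(cos(theta), sin(theta)))
}

plot(ellipse_points(c(-1, 1)), type = "l", xlab = "beta0", ylab = "beta1")
lines(ellipse_points(c(-1, -1, 1, 1)))    # 2 replicates: smaller circle
lines(ellipse_points(rep(c(-1, 1), 3)))   # 3 replicates: smaller still
```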
Next we can consider some further designs, first where the standard errors are not the same, and see what happens to the contour plots. We will consider designs of size $n = 3q$, $q = 1, 2, 3, 4$, with a centre design point included.

Example A.4. Consider the following designs, again for the simple regression model $y = 3 + 5x$:

1. $\delta_1 = \{-1, 0, 1\}$,
2. $\delta_2 = \{-1, -1, 0, 0, 1, 1\}$, 2 replicates of $\delta_1$,
3. $\delta_3 = \{-1, -1, -1, 0, 0, 0, 1, 1, 1\}$, 3 replicates of $\delta_1$,
4. $\delta_4 = \{-1, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 1\}$, 4 replicates of $\delta_1$.

Again, plotting these design parameter regions as given in equation (20) yields Figure 4.

Figure 4: Plot of the contour at $(\beta - \hat{\beta})'(X'X)(\beta - \hat{\beta}) < 1$ for different symmetrical designs with centre points.

It can be seen that the contour plot is no longer a perfect circle, but it is symmetrical about the horizontal and vertical axes. Again, as we increase the number of replicates the radius of the contour decreases. Now the parameter estimates have different standard errors, the standard error for $\hat{\beta}_1$ being larger than for $\hat{\beta}_0$. The designs are still orthogonal and hence $\hat{\beta}_0$ and $\hat{\beta}_1$ are uncorrelated.

All the designs in Examples A.3 and A.4 are symmetrical designs. What happens when the designs are not symmetrical? We already understand what happens if the sample size changes but the design structure is essentially the same. Now we will consider designs of size $n = 9$ but with different asymmetrical design structures. We retain one symmetrical design for comparison.

Example A.5. Consider the following designs, again for the simple regression model $y = 3 + 5x$:

1. $\delta_1 = \{-1, -1, -1, 0, 0, 0, 1, 1, 1\}$, 3 replicates of $\{-1, 0, 1\}$,
2. $\delta_2 = \{-1, -1, -1, -1, -1, -1, -1, 1, 1\}$,
3. $\delta_3 = \{-1, -1, 1, 1, 1, 1, 1, 1, 1\}$,
4. $\delta_4 = \{-1, -1, -0.9, -0.85, -0.8, -0.75, -0.7, -0.65, 1\}$,
5. $\delta_5 = \{-1, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 1, 1\}$,
6. $\delta_6 = \{-1, -0.9, -0.8, -0.7, -0.55, -0.4, -0.25, -0.1, 1\}$,
7. $\delta_7 = \{1, 0.9, 0.8, 0.7, 0.55, 0.4, 0.25, 0.1, -1\}$.

As before, we plot the design parameter region as given in equation (20); this yields Figure 5.

Figure 5: Plot of the contour at $(\beta - \hat{\beta})'(X'X)(\beta - \hat{\beta}) < 1$ for different asymmetrical designs.

It can be seen that the contour plots for $\delta_2$ through to $\delta_7$ are no longer perfect circles, nor are they symmetrical about the axes. The standard errors for designs $\delta_1$, $\delta_2$ and $\delta_3$ appear identical for both parameters, but in $\delta_2$ and $\delta_3$ the parameter estimates are now correlated, as shown by the sloping nature of the confidence region. This is more pronounced in the remaining designs, which all have larger standard errors for the parameters. All designs apart from $\delta_1$ are no longer orthogonal.
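The correlation visible in Figure 5 can be confirmed directly from $\sigma^2(X'X)^{-1}$. A minimal sketch for $\delta_1$ and $\delta_2$ of Example A.5 (the function name is an illustrative invention):

```r
# Correlation of (bhat0, bhat1) implied by a design; the sigma^2 factor
# cancels when converting a covariance matrix to a correlation matrix.
param_corr <- function(design_x) {
  X <- cbind(1, design_x)
  V <- solve(t(X) %*% X)   # proportional to Var(bhat)
  cov2cor(V)
}

param_corr(rep(c(-1, 0, 1), each = 3))   # delta1: off-diagonal 0 (orthogonal)
param_corr(c(rep(-1, 7), 1, 1))          # delta2: correlation 5/9, the slope in Figure 5
```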
A.6 Optimal Design Criteria

Above we have seen that changing design points can lead to:

1. changing the standardised variance function (Definition A.1 and equation (16)) of the predicted values;
2. changing the size and shape of the confidence region for the parameters.

In our examples and the exercises above we wish to pick a design that in some sense has the smallest standardised variance (Definition A.1 and equation (16)), i.e., as discussed above, the design which in some way has the lowest curve in diagrams such as Figure 2. Similarly, we might instead (or ideally as well) want to pick a design that reduces the standard errors of the parameters. We can think of picking designs where the confidence region, as seen in Examples A.3, A.4 and A.5, is as small as possible. This idea leads to some optimal design criteria.

A.6.1 Criteria that aim to reduce the standard error of parameters

Recall from equation (13):

$\text{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1},$ (13)

which gives the standard errors of the parameter estimates. These depend on the $(X'X)$ matrix and $\sigma^2$. The size of $\sigma^2$ depends on the size of the natural variation in the system being modelled, something that we cannot directly control. (Although of course we can create a design that follows good design principles, such as using blocks to make runs as homogeneous as possible, etc.) But we can choose the design points, which define the design matrix and hence the information matrix and variance-covariance matrix. Designs which are based on reducing the parameter estimate standard errors are therefore based on picking design points that optimise some function of the matrix $(X'X)$.

Similar to the standardised variance function of Definition A.1, we may also define a generalised variance for the parameters that is independent of $\sigma^2$:

Definition A.2 (Generalised Variance). The generalised variance of the parameter estimates in a design is given by $\dfrac{1}{|X'X|}$.

Let $\lambda_1, \lambda_2, \ldots, \lambda_p$ be the eigenvalues of $(X'X)$.

Definition A.3 (D-Optimal Design). A design is D-optimal (D for determinant) if the determinant of the matrix $(X'X)$ is maximised, i.e. if $|X'X|$ is maximised over the design space $\mathcal{X}$.

Hence a design is D-optimal if the generalised variance, $\frac{1}{|X'X|}$, of the parameter estimates is minimised. The D-optimality criterion is equivalent to:

$\min \prod_{i=1}^{p} \dfrac{1}{\lambda_i}$ (21)

It should be easy to see the rationale for this criterion. From equation (19), the size of the confidence region is inversely related to the determinant of $(X'X)$ (its volume is proportional to $|X'X|^{-1/2}$), so maximising the determinant will minimise this region. Because the determinant of a matrix is the product of its eigenvalues, we can define D-optimal designs in terms of the eigenvalues, which yields equation (21). D-optimality is one of the most widely used criteria. Related to it are two further criteria:

Definition A.4 (A-Optimal Design). A design is A-optimal if we minimise the sum (or average) of the variances of the parameter estimates. This can be seen to be equivalent to

$\min \, \text{tr}\left((X'X)^{-1}\right) \equiv \min \sum_{i=1}^{p} \dfrac{1}{\lambda_i}$ (22)

In terms of the examples above, a D-optimal design would be one that has the smallest area depicted by the regions plotted in Figures 3, 4 and 5, whilst an A-optimal design would seek to reduce the width of the confidence region in the directions of the parameters, i.e. in the horizontal and vertical directions of these same figures.

The final criterion for consideration aims to improve the poorest estimate.

Definition A.5 (E-Optimal Design). An E-optimal design is one where we minimise the variance of the least well-estimated contrast $\alpha'\beta$, where $\alpha'\alpha = 1$. That is, the E-optimality criterion is:

$\min \max_i \left(\dfrac{1}{\lambda_i}\right)$ (23)

If the eigenvalues are of similar size we will have confidence regions nearer to the circles (or spheroids in more dimensions) of Example A.3 and Figure 3, whilst more unequal eigenvalues give designs with more elongated ellipsoids, such as in Example A.5 and Figure 5. With the elongated ellipsoids, some linear combinations of parameters are estimated with smaller variance than others; the E-optimality criterion is a way to prevent the large elongated ellipsoids. An ideal design (assuming all parameters are of equal interest) will therefore not only have large eigenvalues but have them as near equal as possible. All three of these criteria capture some aspect of this requirement.
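Since all three criteria are simple functions of the eigenvalues of $X'X$, comparing candidate designs numerically is a one-liner each. A minimal sketch (the function name is an illustrative invention):

```r
# D-, A- and E-criteria for a simple linear regression design, from the
# eigenvalues of X'X. Smaller is better, matching equations (21)-(23).
design_criteria <- function(design_x) {
  X <- cbind(1, design_x)
  lambda <- eigen(t(X) %*% X, symmetric = TRUE)$values
  c(D = prod(1 / lambda),   # 1/|X'X|: the generalised variance
    A = sum(1 / lambda),    # tr((X'X)^{-1})
    E = max(1 / lambda))    # variance of the worst-estimated contrast
}

design_criteria(c(-1, 0, 1))   # delta1 of Example A.2
design_criteria(c(-1, 1, 1))   # delta2 of Example A.2
```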
A.6.2 Measure Theory Approach

It can be seen that all three criteria are special cases of the general criterion of choosing a design to minimise:

$\Psi_k(\xi) = \left( \dfrac{1}{p} \sum_{i=1}^{p} \lambda_i^{-k} \right)^{1/k}$ (24)

where $\xi$ is a measure over $\mathcal{X}$, and $k = 1, 0, \infty$ for A-, D- and E-optimality respectively (with $k = 0$ understood as the limit $k \to 0$). If you are not familiar with measure theory, do not be put off by the notation here. If a design has $m$ distinct points in $\mathcal{X}$ then:

$\xi = \begin{pmatrix} x_1 & x_2 & \ldots & x_m \\ w_1 & w_2 & \ldots & w_m \end{pmatrix}$ (25)

where the first line, $x_1, x_2, \ldots, x_m$, gives the values of the factors at the design points and the corresponding $w_i$ give the associated design weights.

Example A.6. A design for the first-order model such as seen in Example A.3 would be:

$\xi = \begin{pmatrix} -1 & 1 \\ 1/2 & 1/2 \end{pmatrix}$

which means that half the design points are at -1 and half are at 1. If $n$ is even then we would put half the design points at -1 and the other half at 1. All the designs $\delta_1$, $\delta_2$, $\delta_3$ and $\delta_4$ in Example A.3 are examples of this design. If $n$ is odd we could not use this design exactly.

Exercise A.2. Write down the form of $\xi$ for the designs in Example A.4.

Solution A.2. All the designs in Example A.4 are either $\delta_1 = \{-1, 0, 1\}$ or replicates of it, so there are three design points, -1, 0 and 1, that each appear an equal number of times. Hence:

$\xi = \begin{pmatrix} -1 & 0 & 1 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$

The detailed theory of optimal designs uses measure theory, and using this idea it is possible to find some easy methods to develop an optimum design. Due to time constraints we will not go into detail, but will pick out some useful results. If we use a measure for a design we can find an optimal design provided we depart from requiring the numbers of design points to be integers. Usually, once we have found the optimal design, we would then need to search the possible integer solutions near the non-integer ones. There are many algorithms available in R that permit this. The idea of breaking the requirement that the numbers of runs must be integers leads to the following definitions.

Definition A.6 (Continuous or Approximate Design). A design in which the distribution of trials over $\mathcal{X}$ is specified by a measure $\xi$, regardless of $n$, is called a continuous or approximate design. (Atkinson and Donev, 1992)

Definition A.7 (Exact Design). A design for a specified number of trials, $n$, is called an exact design. For an exact design we may write the measure $\xi$ that represents it as:

$\xi = \begin{pmatrix} x_1 & x_2 & \ldots & x_m \\ \frac{s_1}{n} & \frac{s_2}{n} & \ldots & \frac{s_m}{n} \end{pmatrix}$

where $s_i$ is the integer number of trials at $x_i$ and $\sum_{i=1}^{m} s_i = n$.

The names might be slightly confusing, as the term "exact" suggests we need the exact (possibly non-integer) number of runs; but remember it is called exact because it describes what we can exactly carry out. An approximate design is one that, whilst having "exact" weights, cannot necessarily be conducted in practice, and so is an approximation to the true design to be carried out. Of course an exact design may coincide with the approximate design, as we saw above in Example A.6 and in Exercise A.2. We clarify this for Example A.6.

Example A.7. Consider Example A.6:

$\xi = \begin{pmatrix} -1 & 1 \\ 1/2 & 1/2 \end{pmatrix}$ (26)

is an approximate or continuous design, whilst the designs in Example A.3,

1. $\delta_1 = \{-1, 1\}$,
2. $\delta_2 = \{-1, -1, 1, 1\}$, 2 replicates of $\delta_1$,
3. $\delta_3 = \{-1, -1, -1, 1, 1, 1\}$, 3 replicates of $\delta_1$,
4. $\delta_4 = \{-1, -1, -1, -1, 1, 1, 1, 1\}$, 4 replicates of $\delta_1$,

are examples of the exact design that matches this approximate design. For $n \neq 2q$, $q = 1, 2, \ldots$, i.e. $n$ not a multiple of 2, no exact design exists that matches this approximate design (equation (26)).
We could try designs that were close to this, e.g.:

1. $n = 3$: $\delta_a = \{-1, -1, 1\}$ or $\delta_a = \{-1, 1, 1\}$;
2. $n = 5$: $\delta_b = \{-1, -1, -1, 1, 1\}$ or $\delta_b = \{-1, -1, 1, 1, 1\}$;
3. $n = 7$: $\delta_c = \{-1, -1, -1, -1, 1, 1, 1\}$ or $\delta_c = \{-1, -1, -1, 1, 1, 1, 1\}$;
4. etc.

Showing these are optimal requires an algorithm to search near the corresponding continuous design. In fact it can be shown, for this simple example, that matching the approximate design as closely as possible achieves the result that is D-optimal with respect to exact designs. For more complicated designs we need a suitable algorithm to conduct this search. For designs that are exact versions of an approximate (or continuous) design we can show more easily whether they are optimal.

So far we have only considered optimal designs based on exact designs, i.e. ones which have integer numbers of runs at the design points. We will see below a method for identifying optimal designs when we instead consider continuous designs. However, from the discussion above, when the exact design matches the approximate (continuous) design the same methods can be applied. Before considering this we first look at some other optimality criteria.

A.6.3 Criteria that aim to reduce the standard error of the predictions

The previous section, A.6.1, illustrated some criteria based on the parameter estimates. For criteria that consider the variance of predicted values, we use the standardised variance given in Definition A.1 and equation (16). Denote the standardised variance by $d(x, \xi)$, so that:

$d(x, \xi) = n \, \dfrac{\text{Var}(\hat{y}(x))}{\sigma^2} = n f'(x)(X'X)^{-1} f(x)$ (16a)

Using the measure theory approach, i.e. for continuous designs, this becomes:

$d(x, \xi) = f'(x) M^{-1}(\xi) f(x)$ (16b)

(where $M(\xi)$ is the per-observation information matrix defined in section A.7 below, so that $nM(\xi) = X'X$ for an exact design).

Definition A.8 (G-Optimal Design). A design is G-optimal if the maximum of $d(x, \xi)$ over $\mathcal{X}$ is minimised.

Let $\bar{d}(\xi) = \max_{x \in \mathcal{X}} d(x, \xi)$; then another way of stating this is that the design that minimises $\bar{d}(\xi)$ is G-optimal. Denote this design by $\xi^*_G$.

1. For continuous designs the optimum design measure for G-optimality, $\xi^*_G$, will also be D-optimal.
2. For continuous designs, $\bar{d}(\xi^*_G) = p$.
3. For exact designs, $\bar{d}(\xi^*_G) \ge p$.

Exercise A.3. Look back over Exercise A.1. Treating these as exact designs that are coincident with continuous designs, are any of these designs G-optimal?

$\delta_1 = \{-1, 0, 1\} \to \xi = \begin{pmatrix} -1 & 0 & 1 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$,
$\delta_2 = \{-1, -\tfrac12, 0, \tfrac12, 1\} \to \xi = \begin{pmatrix} -1 & -1/2 & 0 & 1/2 & 1 \\ 1/5 & 1/5 & 1/5 & 1/5 & 1/5 \end{pmatrix}$,
$\delta_3 = \{-1, 1\} \to \xi = \begin{pmatrix} -1 & 1 \\ 1/2 & 1/2 \end{pmatrix}$,
$\delta_4 = \{-1, -\tfrac13, \tfrac13, 1\} \to \xi = \begin{pmatrix} -1 & -1/3 & 1/3 & 1 \\ 1/4 & 1/4 & 1/4 & 1/4 \end{pmatrix}$,
$\delta_5 = \{-1, -1, 1\} \to \xi = \begin{pmatrix} -1 & -1 & 1 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$.

Solution A.3. We noted before that $\delta_3$ is uniformly the best design of the list in Exercise A.1. From the plot in Figure 2, we note that its $d(x, \xi)$ is maximised at -1 and 1, the design points. This is the lowest maximum and is equal to $2 = p$, so this design is G-optimal. (Design $\delta_5$ has a lower value at -1, but its maximum is at 1 and equals 3, larger than the maximum for $\delta_3$.)

Note that this illustrates which exact designs are G-optimal, i.e. ones that match the criteria above and are the same as their counterpart continuous designs. Recall that such designs can be written as:

$\xi = \begin{pmatrix} x_1 & x_2 & \ldots & x_m \\ \frac{s_1}{n} & \frac{s_2}{n} & \ldots & \frac{s_m}{n} \end{pmatrix}$

where $s_i$ is the integer number of trials and $\sum_{i=1}^{m} s_i = n$. For such designs the criteria we develop can be applied to the exact designs.

Definition A.9 (V-Optimal/I-Optimal Design). A design is V-optimal or I-optimal if the average of $d(x, \xi)$ over $\mathcal{X}$ is minimised.
Note that some authors refer to V-optimality as I-optimality: Atkinson and Donev (1992) call this V-optimality, whilst Montgomery (2021) refers to it as I-optimality.

Example A.8. Consider the design for two factors $x_1$ and $x_2$ with $\delta_3 = \{(1, 1), (-1, 1), (1, -1), (-1, -1)\}$, i.e. a $2^2$ design. Assume the model to be fitted is the main effects model plus the two-way interaction (i.e. of the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$). The design matrix will consist of columns not only for $x_1$ and $x_2$ but also for the interaction:

$X = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$

Next we have:

$X'X = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}, \quad |X'X| = 256, \quad (X'X)^{-1} = \begin{pmatrix} 1/4 & 0 & 0 & 0 \\ 0 & 1/4 & 0 & 0 \\ 0 & 0 & 1/4 & 0 \\ 0 & 0 & 0 & 1/4 \end{pmatrix}.$

Consequently:

$\text{Var}(\hat{y}(x)) = \sigma^2 \begin{pmatrix} 1 & x_1 & x_2 & x_1 x_2 \end{pmatrix} (X'X)^{-1} \begin{pmatrix} 1 \\ x_1 \\ x_2 \\ x_1 x_2 \end{pmatrix} = \dfrac{\sigma^2}{4}\left(1 + x_1^2 + x_2^2 + x_1^2 x_2^2\right)$

In this case $n = 4$, so the standardised variance is $\text{Var}(\hat{y}(x)) \times n/\sigma^2 = 1 + x_1^2 + x_2^2 + x_1^2 x_2^2$. To find if it is V/I-optimal we need to integrate this over the design region and divide by the area of the design region. Clearly the area of the design region is 4 (a $2 \times 2$ square), so we have:

$I = \dfrac{1}{A}\int_{-1}^{1}\int_{-1}^{1} d(x, \xi)\, dx_1\, dx_2 = \dfrac{1}{4}\int_{-1}^{1}\int_{-1}^{1} \left(1 + x_1^2 + x_2^2 + x_1^2 x_2^2\right) dx_1\, dx_2 = \dfrac{16}{9}.$

It turns out that this is the smallest possible value we can obtain for this model with four runs. So the design is I-optimal.
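The average in Example A.8 is easy to verify numerically, e.g. by averaging the standardised variance over a fine grid. A minimal sketch:

```r
# Average standardised variance of the 2^2 design with interaction
# over the square design region [-1, 1]^2, approximated on a grid.
g  <- seq(-1, 1, length.out = 201)
xy <- expand.grid(x1 = g, x2 = g)
d  <- with(xy, 1 + x1^2 + x2^2 + x1^2 * x2^2)   # standardised variance
mean(d)    # approximately 16/9 = 1.778 (exact up to the grid approximation)
16 / 9
```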
A.6.4 Which Optimality criteria?

Which design criterion to use will depend on the aim of the experiment. Clearly if we wish to obtain the most well-defined parameter estimates then criteria such as D-, A- and E-optimality are more important. If it is prediction we are interested in, then for example G-optimal designs are more important. An obvious case where I/V-optimality is preferred would be a response surface design, where we are trying to find the settings that give a maximum (or minimum) response; in the search for this we need all predictions to have the same level of variability.

As we have seen above, and will see why below, D-optimal designs have an equivalence to G-optimal designs. It is perhaps this that makes D-optimal designs the most widely used. There are some natural extensions to these, where the optimality focuses on just certain factors. An obvious example is blocking: we can develop the part of the model that removes blocks, just like we did in section 6 of the main notes (the matrix theory where we "removed" the blocks using partitioned matrices), and then apply the optimality criterion to just the part of the design matrix for the factors that represent treatments. Similarly, we may be interested in certain linear combinations of the parameters (Atkinson and Donev (1992) call this $D_A$-optimality) or a subset of the factors ($D_s$-optimality, Atkinson and Donev (1992)). There are other extensions (Atkinson and Donev (1992) mention c-optimality, and C- and L-optimality as well). However, because of the widespread use of D-optimality, and its frequent equivalence to G-optimality, we will mainly focus on these two criteria. It should be noted that if we optimise a design using one criterion we are often very close to (or even exactly at) an optimum design under another criterion.

We will now look at an important theorem that allows us to determine the optimality (or not) of some designs.

A.7 The General Equivalence Theorem

We have hinted at the idea of a measure and also how this can be used to find optimum designs. An extended version of the general equivalence theorem, which holds for further optimality criteria, can be found in Atkinson and Donev (1992); we will examine it below. First, however, we focus on a version due to St. John and Draper (1975), which is specifically in terms of D-optimal and G-optimal designs and also follows the original authors of the theorem, Kiefer and Wolfowitz (1960).

Let $\xi$ be a probability measure such that:

$\xi(x) \ge 0, \quad x \in \mathcal{X}$ (27)

$\int_{\mathcal{X}} \xi(dx) = 1$ (28)

If you are not familiar with measures, all this states is that when we integrate over the design space we obtain 1, which is equivalent to stating that in Example A.6, Exercise A.2 (Solution A.2) or equation (25) the sum of the weights is 1, i.e. it is reaffirming that $\xi$ is a probability measure.

We can define a matrix analogous to the $X'X$ matrix for a design $\xi$ as:

$m_{ij}(\xi) = \int_{\mathcal{X}} f_i(x) f_j(x)\, \xi(dx), \quad i, j = 1, 2, \ldots, p$ (29)

and

$M(\xi) = [m_{ij}(\xi)].$ (30)

For an exact design, $nM(\xi) = X'X$, so that $M$ provides a continuous design generalisation of the $X'X$ matrix of an exact design. Similarly, recall how we generalised the standardised variance (16) with equation (16b):

$d(x, \xi) = f'(x) M^{-1}(\xi) f(x)$ (16b)

Using these we may redefine D-optimal and G-optimal designs:

Definition A.10 (D-optimal: continuous/approximate version of Definition A.3). $\xi^*$ is D-optimal if and only if $M(\xi^*)$ is nonsingular and:

$\max_{\xi} \det(M(\xi)) = \det(M(\xi^*))$ (32)

Definition A.11 (G-optimal: continuous/approximate version of Definition A.8). $\xi^*$ is G-optimal if and only if:

$\min_{\xi} \max_{x \in \mathcal{X}} d(x, \xi) = \max_{x \in \mathcal{X}} d(x, \xi^*)$ (33)

A sufficient condition for $\xi^*$ to satisfy equation (33) is

$\max_{x \in \mathcal{X}} d(x, \xi^*) = p.$ (34)

Theorem A.1 (General Equivalence Theorem: for D-optimal designs). The three conditions above, (32), (33) and (34), are equivalent. The set $B$ of all $\xi$ satisfying these conditions is convex, and $M(\xi)$ is the same for all $\xi \in B$.

As mentioned previously, extensions of the original theorem by Kiefer and Wolfowitz (1960) have been developed that apply to other optimality criteria, and Kiefer (1974) includes a summary of these and further extensions. The course textbook for the advanced topics, Atkinson and Donev (1992), also has the extensions to this theorem. In particular it has as the first criterion a general optimality criterion $\Psi(M(\xi))$, of which the determinant of equation (32) is a special case, and a directional derivative function $\phi(x, \xi)$ of $\Psi$ which must be non-negative if the criterion $\Psi$ is optimised. It turns out that for the determinant criterion (i.e. for D-optimality) this derivative function is $p - d(x, \xi)$, leading to the combination of equations (33) and (34). It is also why there is an equivalence between D-optimal and G-optimal designs, as the resulting derivative condition is the criterion for G-optimal designs seen above in equations (32), (33) and (34). The third equivalence is then that the derivative achieves its minimum at the points of the design. The theorem in this form is given below.

Theorem A.2 (General Equivalence Theorem: for criterion $\Psi$). The following conditions are equivalent:

1. the design $\xi^*$ minimises $\Psi(M(\xi))$; (35)
2. the minimum of $\phi(x, \xi^*)$ over $x \in \mathcal{X}$ is $\ge 0$; (36)
3. the derivative $\phi(x, \xi^*)$ achieves its minimum at the points of the design. (37)
Here, $\Psi$ is a function of $M(\xi)$ corresponding to an optimality criterion and, if $\bar{\xi}$ is the measure that puts unit mass at the point $x$, then

$\phi(x, \xi) = \lim_{\alpha \to 0} \dfrac{1}{\alpha}\left[ \Psi\{(1 - \alpha)M(\xi) + \alpha M(\bar{\xi})\} - \Psi\{M(\xi)\} \right]$ (38)

is the derivative of $\Psi$ in the direction of $\bar{\xi}$. (Atkinson and Donev, 1992, page 108) and (Atkinson et al., 2007, page 137) give tables of $\Psi$, $\phi$ and associated functions for several of the main optimality criteria.

A.7.1 Application of the General Equivalence Theorem

The General Equivalence Theorem is all well and good, but why is it important? Well, it turns out that:

1. We can use it to determine whether designs are optimal for several criteria (e.g. D-optimal, A-optimal, E-optimal).
2. It tells us that approximate (continuous) G-optimal designs are also D-optimal (because in that case the directional derivative function gives the same condition as the G-optimality criterion).
3. For exact designs it can provide an approximate (continuous) optimal design that can be used to start the search for an exact optimal design.
4. For approximate (continuous) designs, to determine if a design is D-optimal we need to ask:
(a) is the maximum of $d(x, \xi^*)$ equal to $p$ (the number of parameters in the model)?
(b) does this maximum occur at the design points?
(c) If the solution holds for $\xi$ with weights that are multiples of $\frac{1}{n}$, it holds for the exact design.

Example A.9. Consider again the designs in Exercise A.1. To find if they are D-optimal we only need to find the form of the standardised variance and then ask if the maximum is equal to $p$ (here we are fitting a simple linear regression model so there are two parameters, the intercept and the coefficient of $x$, i.e. $p = 2$). We also need to ask if this maximum is at the design points. Although these are exact designs, if we regard them as examples of a continuous design where the measure is, with obvious notation, $\xi(x_i) = \frac{1}{n}$, then we can use this to check if the designs are D-optimal (as we saw in Exercises A.2 and A.3 and also Example A.6). We already found the standardised variances of these designs:

$\delta_1 = \{-1, 0, 1\}$: standardised variance $= 1 + 3x^2/2$.
$\delta_2 = \{-1, -\tfrac12, 0, \tfrac12, 1\}$: standardised variance $= 1 + 2x^2$.
$\delta_3 = \{-1, 1\}$: standardised variance $= 1 + x^2$.
$\delta_4 = \{-1, -\tfrac13, \tfrac13, 1\}$: standardised variance $= 1 + \tfrac95 x^2$.
$\delta_5 = \{-1, -1, 1\}$: standardised variance $= \tfrac38(3 + 2x + 3x^2)$.

We can use the graph of these (Figure 2) to note that only $\delta_3$ has a maximum of 2 at the design points $-1$ and $1$. Without plotting, we note that in all cases $\mathcal{X} = [-1, 1]$ and:

1. $\delta_1 = \{-1, 0, 1\}$: the standardised variance $1 + 3x^2/2$ has its maximum in $[-1, 1]$ at $-1$ and $1$, where its value is $1 + 3/2 = 2.5$. Since $2.5 > 2$ it is not D-optimal.
2. $\delta_2 = \{-1, -\tfrac12, 0, \tfrac12, 1\}$: the standardised variance $1 + 2x^2$ has its maximum at $-1$ and $1$, where its value is $3 > 2$, so it is also not D-optimal.
3. $\delta_3 = \{-1, 1\}$: the standardised variance $1 + x^2$ is again maximised in $[-1, 1]$ at $-1$ and $1$, where its value is $2 = p$. Because the maximum is $p$ and occurs at the design points, it is D-optimal.
4. $\delta_4 = \{-1, -\tfrac13, \tfrac13, 1\}$: as before, within $\mathcal{X}$ the standardised variance $1 + \tfrac95 x^2$ is maximised at $-1$ and $1$, and the maximum is $1 + \tfrac95 = 2.8 > 2$, so again not D-optimal.
5. $\delta_5 = \{-1, -1, 1\}$: the standardised variance $\tfrac38(3 + 2x + 3x^2)$ is maximised at $x = 1$, where it reaches $\tfrac38(3 + 2 + 3) = 3 > 2$, so it is not D-optimal.

Note that we can apply the equivalence theorem here because we are using the designs as presented, i.e. they are both exact designs and continuous designs. If we wanted to run the design $\delta_3$ over, say, three runs, i.e. $n = 3$, then this design remains D-optimal if half the runs are at -1 and half are at 1. Of course we cannot actually have 1.5 runs at each design point! We need instead to use $\delta_5$ (or equivalently $\delta_5^* = \{-1, 1, 1\}$), which is no longer D-optimal amongst the continuous designs, but may be the closest to it amongst the exact designs, and so in that sense is D-optimal for exact designs. It turns out that this is the case, but we now also cannot assume that amongst exact designs it is G-optimal. We can work out the ratio of the determinants of $X'X$ for this exact design and for the continuous design, which gives a measure of how close a design is to continuous D-optimality (this is usually scaled by the dimension of the design). This ratio, and similar ones for other optimality criteria, are often produced by software.
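As a sketch of that efficiency ratio, one common convention (which software may scale differently) puts both designs on a per-run basis, $M = X'X/n$, and reports $\big(\det(M)/\det(M^*)\big)^{1/p}$. The function name below is an illustrative invention:

```r
# D-efficiency of an exact design relative to a continuous optimum M_opt,
# using per-run information matrices M = X'X / n and
# eff_D = (det(M) / det(M_opt))^(1/p).
d_efficiency <- function(design_x, M_opt) {
  X <- cbind(1, design_x)
  M <- t(X) %*% X / length(design_x)
  p <- ncol(X)
  (det(M) / det(M_opt))^(1 / p)
}

# Continuous optimum for simple linear regression on [-1, 1]:
# half weight at -1, half at 1, giving M_opt = identity.
M_opt <- diag(2)
d_efficiency(c(-1, -1, 1), M_opt)   # delta5: (8/9)^(1/2), about 0.943
```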
Exercise A.4. Investigate whether the following designs are D-optimal. In all cases assume that $\mathcal{X}: x_i \in [-1, 1]$, $i = 1, \ldots, m$.

1. For simple linear regression ($y = \beta_0 + \beta_1 x$) with designs:
(a) $\delta_1 = \{-1, 1, -1, 1\}$;
(b) $\delta_2 = \{-1, 1, 0, -1, 1\}$.

2. For two factors $x_1$ and $x_2$ and $\delta_3 = \{(1, 1), (-1, 1), (1, -1), (-1, -1)\}$:
(a) where the model to be fitted is just the main effects model (i.e. of the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$);
(b) where the model to be fitted is the main effects model plus the two-way interaction (i.e. of the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$).

Solution A.4.

1. For simple linear regression ($y = \beta_0 + \beta_1 x$) we have $p = 2$ and $f'(x) = (1 \;\; x)$.

(a) $\delta_1 = \{-1, 1, -1, 1\}$:

$X = \begin{pmatrix} 1 & -1 \\ 1 & 1 \\ 1 & -1 \\ 1 & 1 \end{pmatrix}, \quad X'X = \begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix}, \quad |X'X| = 16, \quad (X'X)^{-1} = \begin{pmatrix} 1/4 & 0 \\ 0 & 1/4 \end{pmatrix}.$

$\text{Var}(\hat{y}(x)) = \sigma^2\left(\tfrac14 + \tfrac{x^2}{4}\right)$. In this case $n = 4$, so the standardised variance is $\text{Var}(\hat{y}(x)) \times n/\sigma^2 = 1 + x^2$. This is maximised at $x = -1$ and $x = 1$, where $1 + x^2 = 2 = p$. Hence the maximum of the standardised variance is 2 and it occurs at the design points -1 and 1; from the General Equivalence Theorem the design is D-optimal.

(b) $\delta_2 = \{-1, 1, 0, -1, 1\}$:

$X = \begin{pmatrix} 1 & -1 \\ 1 & 1 \\ 1 & 0 \\ 1 & -1 \\ 1 & 1 \end{pmatrix}, \quad X'X = \begin{pmatrix} 5 & 0 \\ 0 & 4 \end{pmatrix}, \quad |X'X| = 20, \quad (X'X)^{-1} = \begin{pmatrix} 1/5 & 0 \\ 0 & 1/4 \end{pmatrix}.$

$\text{Var}(\hat{y}(x)) = \sigma^2\left(\tfrac15 + \tfrac{x^2}{4}\right)$. In this case $n = 5$, so the standardised variance is $1 + \tfrac54 x^2$. This is maximised at $x = -1$ and $x = 1$, where $1 + \tfrac54 x^2 = 2\tfrac14 > p$. Hence the maximum of the standardised variance is greater than 2; the design is not D-optimal.

2. For two factors $x_1$ and $x_2$:

(a) Where the model to be fitted is just the main effects model ($y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$), with

$\delta_3 = \{(1, 1), (-1, 1), (1, -1), (-1, -1)\} \to X = \begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & -1 \end{pmatrix},$

we have:

$X'X = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{pmatrix}, \quad |X'X| = 64, \quad (X'X)^{-1} = \begin{pmatrix} 1/4 & 0 & 0 \\ 0 & 1/4 & 0 \\ 0 & 0 & 1/4 \end{pmatrix}.$

The main effects only model gives $f'(x) = (1 \; x_1 \; x_2)$ and $p = 3$.

$\text{Var}(\hat{y}(x)) = \tfrac{\sigma^2}{4}\left(1 + x_1^2 + x_2^2\right)$. In this case $n = 4$, so the standardised variance is $1 + x_1^2 + x_2^2$. This is maximised when $x_1 = \pm 1$ and $x_2 = \pm 1$, where $1 + x_1^2 + x_2^2 = 3 = p$.
Hence the maximum of the standardised variance is equal to 3 ($= p$) and this occurs at the design points. The design is D-optimal.

(b) Now the model to be fitted is the main effects model plus the two-way interaction ($y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$). This follows Example A.8, but we repeat it here. Although we have the same design points as in part 2a above, we are including the interaction, so the design matrix $X$ has an extra column, which changes the $X'X$ matrix. We still have $n = 4$, but now $f'(x) = (1 \; x_1 \; x_2 \; x_1 x_2)$ and $p = 4$. The design matrix is formed by multiplying the last two columns of the design matrix in the previous part (2a) to obtain the column for the interaction:

$X = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$

Next we have $X'X = 4I_4$, $|X'X| = 256$ and $(X'X)^{-1} = \tfrac14 I_4$. Consequently:

$\text{Var}(\hat{y}(x)) = \tfrac{\sigma^2}{4}\left(1 + x_1^2 + x_2^2 + x_1^2 x_2^2\right).$

In this case $n = 4$, so the standardised variance is $1 + x_1^2 + x_2^2 + x_1^2 x_2^2$. This is maximised when $x_1 = \pm 1$ and $x_2 = \pm 1$, where $1 + x_1^2 + x_2^2 + x_1^2 x_2^2 = 4 = p$. Hence the maximum of the standardised variance is equal to 4 ($= p$) and this occurs at the design points. The design is D-optimal.

Notice that in order to decide whether a design is optimal we need to know not only the design points but also the form of the model being fitted. You may have noticed that the D-optimal designs for simple regression, and for a model with two factors with main effects only or with main effects and the interaction term, are optimal if they have the design points equally shared between -1 and 1. This applies when we have more than two factors, i.e. the $2^k$ designs we have encountered earlier. In fact it can be shown that all $2^k$ designs are D-optimal for models with main effects and two-way interactions. Once we include some squared terms this is no longer the case: obviously we need a central point to be able to fit a quadratic! (If we do, usually including some central points will make the design G- and D-optimal.) This means that to create an optimal design we need to know something about the model. This can lead to using near-optimal designs at earlier stages, i.e. when the form of the model is unknown.

We can also think about the context. If we consider response surface designs, for example, we can use an optimal design for each stage of the process. When we are in the first stage and will move the design, we would use something like a $2^k$ design because it is D- and G-optimal. Once we are near the optimum we might use a design with central points as well, as that might be I/V-optimal! This is something we already did, but this theory might tell us that the number of central points should be the same as for the extreme points.

A.8 Algorithms for Designs

In many situations where we have limited runs, we use exact design theory to find the optimum measure, and then algorithms to search for integer solutions. There are various packages in R that achieve this. Algorithms include:

- optFederov from the AlgDesign package: Wheeler (2022a), Wheeler (2022b), Wheeler (2009);
- gen_design from the skpr package: Morgan-Wall and Khoury (2024a), Morgan-Wall and Khoury (2024b).
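As a flavour of how these are used, the following sketch asks optFederov for a D-optimal 4-run design for a main effects model in two factors, starting from a grid of candidate points. The argument names follow the AlgDesign documentation, but treat this as an outline and check the package help for details:

```r
library(AlgDesign)

# Candidate set: a grid of possible design points in [-1, 1]^2
candidates <- expand.grid(x1 = c(-1, 0, 1), x2 = c(-1, 0, 1))

# Search for a D-optimal exact design with n = 4 runs
# for the main effects model y ~ x1 + x2.
out <- optFederov(~ x1 + x2, data = candidates,
                  nTrials = 4, criterion = "D")
out$design   # the selected design points

# The skpr equivalent would be along the lines of
# gen_design(candidateset = candidates, model = ~ x1 + x2, trials = 4),
# with get_optimality() then reporting efficiency measures.
```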
The skpr package has a very usable GUI version, which you launch with skprGUI(). A screenshot is given below in Figure 6. get_optimality will display the optimality efficiency criteria (out of 100, compared to an optimal design), again from the skpr package. Lawson (2015) has several examples of using the optFederov function, including augmenting original fractional factorials. But try the GUI version of skpr, as it is reasonably intuitive and also has a tutorial.

Figure 6: Screenshot of the GUI interface to generate optimum designs using the skpr package.

References

A. C. Atkinson and A. N. Donev. Optimum Experimental Designs. Number 8 in Oxford Statistical Science Series. Clarendon Press; Oxford University Press, Oxford; New York, 1992. ISBN 9780198522546.

A. C. Atkinson, A. N. Donev, and R. D. Tobias. Optimum Experimental Design with SAS. Number 34 in Oxford Statistical Science Series. Oxford University Press, Oxford, 2007. ISBN 978-0-19-929659-0.

George E. P. Box and Norman R. Draper. Robust designs. Biometrika, 62(2):347-352, 1975. doi: 10.1093/biomet/62.2.347.

George E. P. Box and Norman R. Draper. Empirical Model-Building and Response Surfaces. Wiley, New York; Chichester, 1987. ISBN 0471810339.

J. Kiefer. General equivalence theory for optimum designs (approximate theory). The Annals of Statistics, 2(5):849-879, 1974. URL http://www.jstor.org/stable/2958055.

J. Kiefer and J. Wolfowitz. The equivalence of two extremum problems. Canadian Journal of Mathematics, 12:363-366, 1960. doi: 10.4153/CJM-1960-030-4.

John Lawson. Design and Analysis of Experiments with R. CRC Press, Taylor & Francis Group, Boca Raton, 2015. ISBN 9781498728485. doi: 10.1201/b17883.

Douglas C. Montgomery. Design and Analysis of Experiments. Wiley, Hoboken, NJ, tenth edition, 2021. ISBN 978-1-119-72210-6.

Tyler Morgan-Wall and George Khoury. Package 'skpr', 2024a. URL https://cran.r-project.org/web/packages/skpr/skpr.pdf.

Tyler Morgan-Wall and George Khoury. skpr, 2024b. URL https://tylermorganwall.github.io/skpr/.

R. C. St. John and N. R. Draper. D-optimality for regression designs: A review. Technometrics, 17(1):15-23, 1975.

Bob Wheeler. Using AlgDesign, 2009. URL https://cran.r-project.org/web/packages/AlgDesign/vignettes/AlgDesign.pdf.

Bob Wheeler. AlgDesign: Algorithmic Experimental Design, 2022a. URL https://CRAN.R-project.org/package=AlgDesign.

Bob Wheeler. AlgDesign.pdf, 2022b. URL https://cran.r-project.org/web/packages/AlgDesign/AlgDesign.pdf.
