This document discusses the statistical analysis of twin data, focusing on quantitative genetics and the use of variance decomposition to disentangle genetic and environmental influences on phenotypic traits. It covers methods for estimating heritability and shared and non-shared environmental influences.
Statistical Analysis of Twin Data
16 October 2023 15:03

Quantitative Genetics
Aims to disentangle the relative importance of genetic and environmental influences on complex phenotypic traits (e.g. height, personality traits, depression).
Method: uses differences in genetic relatedness between relatives and relates them to the similarity in their phenotypic traits.
Genetic relatedness in families: [figure not included in the transcript]

Decomposition of variance
The total phenotypic variance: VT = VA + VC + VE
  ○ h2 = a2 = VA / VT
  ○ c2 = VC / VT
  ○ e2 = VE / VT
An alternative method to Falconer's rules
  ○ Assumes that the extent of shared environment is the same for each twin

Polygenic Traits
In quantitative genetics, most traits of interest are continuous (e.g. height, personality traits).
But genes are categorical:
  ○ At one locus, we have 3 possible combinations of alleles, e.g. AA, AT, TT

Path Diagram Conventions
Scheme to represent variables and the relationships between them.

Predicted Var-Cov Matrices
Univariate twin model
Variance: measure of individual differences/dispersion around the mean.
Covariance: the extent to which deviations from the mean by variable X are similar to (or predict) those of Y.
Correlation: measure of similarity that is independent of the scale of the measure.

Path Tracing Rules
Trace backward, then forward, or simply forward from one variable to another. Never forward then backward.
Loops are not allowed, i.e. we cannot trace twice through the same variable.
There is a maximum of one curved arrow per path.
  ○ So, the double-headed arrow from the independent variable to itself is included, unless the chain includes another double-headed arrow

Observed Correlation Matrices
Twin correlations
Contain the foundations to estimate heritability.
Components of similarity
  ○ Additive genetic influences = A
  ○ Shared (common) environment (e.g. SES, parenting, diet) = C
Components of dissimilarity
  ○ Non-shared environment (e.g. accidents, differential parental treatment, and measurement error) = E

Falconer's Rules
Heritability (h2 or a2) = 2 * (rMZ - rDZ) = 2 * (.80 - .50) = 2 * .30 = .60
Shared environment (c2) = rMZ - a2 = .80 - .60 = .20
Non-shared environment (e2) = 1 - rMZ = 1 - .80 = .20

PSYC0036 Genes and Behaviour Page 1

Estimation with Structural Equation Modelling (SEM)
Principles
1. Put together a theoretical model of how the variables relate (with path tracing)
2. Derive predictions from this model (correlation or variance-covariance matrices)
3. Compare the predictions to the observed relationships between variables with fit functions
4. If the model fits well, then we conclude that this model is a plausible description of the data

Model fitting and Assessing model fit
We need to know whether our theoretical model is a good description of the data, using an index of fit.
Chi-square statistic derived from the maximum likelihood estimation:
  ○ If the Chi-square is non-significant, it means that the model fits well.
Significance depends on the number of degrees of freedom:
  ○ DF = number of observed statistics minus the number of parameters

Significance of model parameters
We can test the significance of a parameter by fixing it to zero.
An example of a parameter is path c, which represents the shared environment.
Dropping a parameter from a model will make the model fit worse.
  ○ But we need to know whether the fit gets significantly worse
Example:
  ○ The critical Chi-square value for 1 DF is 3.84. If the drop in Chi-square is less than 3.84, the parameter is said to be non-significant, i.e. it does not add much to the model.

Example of reading model fitting results
The ACE model fits well (non-significant p-value), i.e. predictions and observations match well.
The best-fitting model is the AE sub-model. This is because C can be dropped (fixed to zero) without a significant decline in fit.
The A parameter is significant and could not be dropped (Chi-square, 1 DF = 5.18, is above the critical value of 3.84).
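The parameter test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, the value 5.18 for dropping A comes from the notes, and the value used for dropping C is hypothetical because the notes do not report it. For 1 DF the p-value can be computed with the standard library alone, since a Chi-square variable with 1 DF is the square of a standard normal variable.

```python
import math

# Critical Chi-square value for 1 DF at the 5% level, as quoted in the notes.
CRITICAL_CHI2_1DF = 3.84


def chi2_p_value_1df(chi2):
    """Upper-tail p-value of a Chi-square statistic with 1 degree of freedom.

    A Chi-square variable with 1 DF is a squared standard normal, so
    P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


def can_drop_parameter(delta_chi2, critical=CRITICAL_CHI2_1DF):
    """True if fixing one parameter to zero does not significantly worsen fit."""
    return delta_chi2 < critical


# Change in Chi-square when A is fixed to zero (value from the notes).
delta_drop_a = 5.18
# Change in Chi-square when C is fixed to zero (HYPOTHETICAL value,
# chosen only to illustrate a non-significant change).
delta_drop_c = 0.50

for label, delta in [("A", delta_drop_a), ("C", delta_drop_c)]:
    verdict = "can be dropped" if can_drop_parameter(delta) else "must be kept"
    print(f"{label}: delta chi2 = {delta:.2f}, "
          f"p = {chi2_p_value_1df(delta):.3f}, {verdict}")
```

With these inputs, A is kept and C is dropped, which matches the reading of the results in the notes: AE is the best-fitting sub-model.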
Power: detecting effect of C
In general, the statistical power to detect effects of C is very low.
For example, the required sample size to detect, with 80% power, a heritability of 60% (with c2 = 10%) is just 62 MZ and DZ pairs.
To detect a c2 of 10% with an h2 of 60% requires 2200 MZ and DZ pairs.
This poses a problem for small twin studies.
If there is a certain proportion of C in the data, but the sample is underpowered to detect it, you will end up with AE as the best model.
However, the C effects will then be absorbed into the A estimate, and will inflate the heritability estimate.

Example of power in parameter estimates
Sample: 300 MZ and 300 DZ pairs. The statistical power to detect C is around 35% instead of the preferred 80% level.
To detect a C effect of .15 with 80% power, we would need around 810 MZ and 810 DZ pairs.
  ○ If we drop C, all familial effects will go into A, so we will see a very high heritability of >70%. This is NOT a fair way of presenting the data, so the full ACE model + 95% CI needs to be reported
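The inflation of heritability when C is dropped can be illustrated with a small Python sketch. The assumptions are: the true values a2 = .60 and c2 = .15 are chosen for illustration only, the function names are made up for this example, and the "AE" estimate is a deliberate simplification that attributes all MZ resemblance to A. A real AE model is fitted by maximum likelihood to both the MZ and DZ groups, so its estimate would differ somewhat.

```python
def predicted_twin_correlations(a2, c2):
    """Expected MZ and DZ correlations under a standardised ACE model."""
    r_mz = a2 + c2        # MZ pairs share all of A and all of C
    r_dz = 0.5 * a2 + c2  # DZ pairs share on average half of A and all of C
    return r_mz, r_dz


def falconer(r_mz, r_dz):
    """Falconer's rules: return (a2, c2, e2) from the twin correlations."""
    a2 = 2.0 * (r_mz - r_dz)
    c2 = r_mz - a2
    e2 = 1.0 - r_mz
    return a2, c2, e2


# ASSUMED true values, for illustration only.
true_a2, true_c2 = 0.60, 0.15

r_mz, r_dz = predicted_twin_correlations(true_a2, true_c2)
a2, c2, e2 = falconer(r_mz, r_dz)

# Simplified AE reading: with C fixed to zero, all MZ resemblance goes to A.
a2_without_c = r_mz

print("rMZ, rDZ:", round(r_mz, 2), round(r_dz, 2))
print("ACE (Falconer): a2, c2, e2 =", round(a2, 2), round(c2, 2), round(e2, 2))
print("a2 with C dropped:", round(a2_without_c, 2))
```

Under these assumptions the full decomposition recovers the values that were put in, whereas dropping C pushes the heritability estimate from .60 up to the full MZ correlation. This is the pattern the notes warn about.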