Population Genetics: A Concise Guide, Second Edition

"John Gillespie has done the near-impossible, condensing the essence of population genetics into a very short book. The result is a little gem. The derivations are simple and clear, and often strikingly original. The minor gaps in the first edition are filled by this equally concise second edition. Population genetics is a complicated subject; only a person of Gillespie's depth of knowledge and insight could simplify without distorting."
-James F. Crow, author of Genetics Notes

Population Genetics
A Concise Guide
Second Edition

John H. Gillespie

THE JOHNS HOPKINS UNIVERSITY PRESS
Baltimore and London

© 1998, 2004 The Johns Hopkins University Press
All rights reserved. Published 2004
Printed in the United States of America on acid-free paper
9 8 7 6 5 4 3 2 1

The Johns Hopkins University Press
2715 North Charles Street
Baltimore, Maryland 21218-4363
www.press.jhu.edu

Library of Congress Cataloging-in-Publication Data
Gillespie, John H.
Population genetics : a concise guide / John H. Gillespie.- 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-8018-8008-4 (alk. paper) - ISBN 0-8018-8009-2 (pbk. alk. paper)
1. Population genetics. I. Title.
QH455.G565 2004
576.5'8-dc22 2004043815

Cover illustration: © Mark A. Klingler
Cover design: Wilma Moritz Rosenberger
Title page illustration: Drosophilidae © 2004 Mark A. Klingler

To Robin Gordon

Contents

List of Figures
Preface
1 Genetic Variation
  1.1 DNA variation in Drosophila
  1.2 Loci and alleles
  1.3 Genotype and allele frequencies
  1.4 The Hardy-Weinberg law
  1.5 Answers to problems
2 Genetic Drift
  2.1 A first look at genetic drift
  2.2 The decay of heterozygosity
  2.3 Mutation and drift
  2.4 Molecular evolution
  2.5 The neutral theory
  2.6 The coalescent
  2.7 The effective size of a population
  2.8 Another model of genetic drift
  2.9 The stationary distribution
  2.10 Is genetic drift important in evolution?
  2.11 Answers to problems
3 Natural Selection
  3.1 The fundamental model
  3.2 Relative fitness
  3.3 Three kinds of selection
  3.4 Mutation-selection balance
  3.5 Genetic load
  3.6 The heterozygous effects of alleles
  3.7 Changing environments
  3.8 The stationary distribution
  3.9 Selection and drift
  3.10 Molecular evolution
  3.11 Answers to problems
4 Two-Locus Dynamics
  4.1 Linkage disequilibrium
  4.2 Two-locus selection
  4.3 Genetic draft
  4.4 Answers to problems
5 Nonrandom Mating
  5.1 Generalized Hardy-Weinberg
  5.2 Identity by descent
  5.3 Inbreeding
  5.4 The evolution of selfing
  5.5 Subdivision
  5.6 Answers to problems
6 Quantitative Genetics
  6.1 Correlation between relatives
  6.2 Response to selection
  6.3 Evolutionary quantitative genetics
  6.4 Dominance
  6.5 The intensity of selection
  6.6 Answers to problems
7 The Evolutionary Advantage of Sex
  7.1 Genetic segregation
  7.2 Crossing-over
  7.3 Muller's ratchet
  7.4 Kondrashov's hatchet
  7.5 Answers to problems
Appendix A Mathematical Necessities
Appendix B Probability
Bibliography
Index

Figures

1.1 The ADH coding sequence
1.2 Two ADH sequences
1.3 Differences between alleles
1.4 Protein heterozygosities
2.1 Simulation of genetic drift
2.2 Drift with N = 1
2.3 The derivation of $\mathcal{G}'$
2.4 Substitutions on a lineage
2.5 Substitution processes
2.6 Hemoglobin evolution
2.7 A coalescent
2.8 Two improbable coalescents
2.9 Beta density
3.1 The medionigra allele in Panaxia
3.2 A simple life cycle
3.3 Directional selection
3.4 Balancing selection
3.5 Fitness and epistasis
3.6 Hidden variation crosses
3.7 Drosophila viability
3.8 A typical Greenberg and Crow locus
3.9 A model of dominance
3.10 Fixation probabilities
3.11 Rates of substitution
4.1 Two loci
4.2 Linkage disequilibrium in Drosophila
4.3 Heterozygosity and recombination
4.4 Heterozygosity with hitchhiking
4.5 Hitchhiking alleles
4.6 The relationship between the population size and the effective population size under genetic draft
5.1 Coefficient of kinship
5.2 Shared alleles
5.3 Effects of inbreeding
5.4 Evolution of selfing
5.5 Inbreeding trajectory
5.6 The island model
6.1 The height of evolution students
6.2 Quantitative genetics model
6.3 Regression of Y on X
6.4 A selective breeding experiment
6.5 The response to selection
6.6 The selection intensity
6.7 Selection of different intensities
6.8 Additive and dominance effects
7.1 Sex versus parthenogenesis
7.2 Evolution in parthenogens
7.3 Asexual directional selection
7.4 Muller's ratchet
7.5 Simulation of Muller's ratchet
7.6 Recombination
7.7 Synergistic epistasis
7.8 Asexual mutation distribution

Preface

At various times I have taught population genetics in two- to five-week chunks. This is precious little time in which to teach a subject, like population genetics, that stands quite apart from the rest of biology in the way that it makes scientific progress. As there are no textbooks short enough for these chunks, I wrote a Minimalist's Guide to Population Genetics. In this 21-page guide I attempted to distill population genetics down to its essence. This guide was, for me, a central canon of the theoretical side of the field. The minimalist approach of the guide has been retained in this, its expanded incarnation.
My goal has been to focus on that part of population genetics that is central and incontrovertible. I feel strongly that a student who understands well the core of population genetics is much better equipped to understand evolution than is one who understands less well each of a greater number of topics. If this book is mastered, then the rest of population genetics should be approachable.

Population genetics is concerned with the genetic basis of evolution. It differs from much of biology in that its important insights are theoretical rather than observational or experimental. It could hardly be otherwise. The objects of study are primarily the frequencies and fitnesses of genotypes in natural populations. Evolution is the change in the frequencies of genotypes through time, perhaps due to their differences in fitness. While genotype frequencies are easily measured, their change is not. The time scale of change of most naturally occurring genetic variants is very long, probably on the order of tens of thousands to millions of years. Changes this slow are impossible to observe directly. Fitness differences between genotypes, which may be responsible for some of the frequency changes, are so extraordinarily small, probably less than 0.01 percent, that they too are impossible to measure directly. Although we can observe the state of a population, there really is no way to explore directly the evolution of a population. Rather, progress is made in population genetics by constructing mathematical models of evolution, studying their behavior, and then checking whether the states of populations are compatible with this behavior.

Early in the history of population genetics, certain models exhibited dynamics that were of such obvious universal importance that the fact that they could not be directly verified in a natural setting seemed unimportant.
There is no better example than genetic drift, the small random changes in genotype frequencies caused by variation in offspring number between individuals and, in diploids, genetic segregation. Genetic drift is known to operate on a time scale that is proportional to the size of the population. In a species with a million individuals, it takes roughly a million generations for genetic drift to change allele frequencies appreciably. There is no conceivable way of verifying that genetic drift changes allele frequencies in most natural populations. Our understanding that it does is entirely theoretical. Most population geneticists not only are comfortable with this state of affairs but also revel in the fact that they can demonstrate on the back of an envelope, rather than in the laboratory, how a significant evolutionary force operates.

As most of the important insights of population genetics came initially from theory, so too is this text driven by theory. Although many of the chapters begin with an observation that sets the biological context for what follows, the significant concepts first appear as ideas about how evolution ought to proceed when certain assumptions are met. Only after the theoretical ideas are in hand does the text focus on the application of the theory to an issue raised by experiments or observations.

The discussions of many of these issues are based on particular papers from the literature. I chose to use papers rather than my own summary of several papers to involve the reader as quickly as possible with the original literature. When I teach this material, I require that both graduate and undergraduate students actually read the papers. Although this book describes many of the papers in detail, a deep understanding can only come from a direct reading. Below is a list of the papers in the order that they appear in the text. I encourage instructors to make the papers available to their students.

1. CLAYTON, G. A., MORRIS, J. A., AND ROBERTSON, A. 1957. An experimental check on quantitative genetical theory. II. Short-term responses to selection. J. Genetics 55:131-151.
2. CLAYTON, G. A., AND ROBERTSON, A. 1955. Mutation and quantitative variation. Amer. Natur. 89:151-158.
3. GREENBERG, R., AND CROW, J. F. 1960. A comparison of the effect of lethal and detrimental chromosomes from Drosophila populations. Genetics 45:1153-1168.
4. HARRIS, H. 1966. Enzyme polymorphisms in man. Proc. Roy. Soc. Ser. B 164:298-310.
5. KIMURA, M., AND OHTA, T. 1971. Protein polymorphism as a phase of molecular evolution. Nature 229:467-469.
6. KIRKPATRICK, M., AND JENKINS, C. D. 1989. Genetic segregation and the maintenance of sexual reproduction. Nature 339:300-301.
7. KONDRASHOV, A. 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336:435-440.
8. KREITMAN, M. 1983. Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature 304:412-417.
9. MORTON, N. E., CROW, J. F., AND MULLER, H. J. 1956. An estimate of the mutational damage in man from data on consanguineous marriages. Proc. Natl. Acad. Sci. USA 42:855-863.

Each chapter contains a short overview of what is to follow, but these overviews are sometimes incomprehensible until the chapter has been read and understood. The reader should return to the overview after mastering the chapter and enjoy the experience of understanding what was previously mysterious.

Each chapter of the text builds on the previous ones. A few sections contain more advanced material, which is not used in the rest of the book and could be skipped on a first reading; these are sections 2.6, 2.8, 2.9, 3.8, 6.4, and 6.5.

Certain formulae are placed in boxes. These are those special formulae that play such a central role in population genetics that they almost define the way most of us think about evolution. Everyone reading this book should make the boxed equations part of their being.
Problems have been placed within the text at appropriate spots. Some are meant to illuminate or reinforce what came before. Others let the reader explore some new ideas. Answers to all but the most straightforward problems are given at the end of each chapter.

The prerequisites for this text include Mendelian genetics, a smattering of molecular genetics, a facility with simple algebra, and a firm grasp of elementary probability theory. The appendices contain most of what is needed in the way of mathematics, but there is no introduction to genetics. With so many good genetics texts available at all levels, it seemed silly to provide a cursory overview.

Many people have made significant contributions to this book. Among the students who suffered through earlier drafts, I would like to single out Suzanne Pass, who gave me pages of very detailed comments that helped me find clearer ways of presenting some of the material and gave me some understanding of how the book seems to a bright undergraduate. Dave Cutler was my graduate teaching assistant for a 10-week undergraduate course based on an early draft. In addition to many invaluable comments, Dave also wrote superb answers to many of the problems. Other students who provided helpful comments included Joel Kniskern, Troy Thorup, Jessica Logan, Lynn Adler, Erik Nelson, and Caroline Christian. I regret that the names of a few others may have disappeared in the clutter on my desk. You have my thanks anyway.

Chuck Langley taught a five-week graduate course out of the penultimate draft. He not only found many errors and ambiguities but also made the genetics much more precise. Mel Green helped in the same way after a thorough reading from cover to cover (not bad for a man who looks on most of population genetics with skepticism!). Michael Turelli answered innumerable questions about quantitative genetics, including the one whose answer I hated: Is this how you would teach quantitative genetics?
Monty Slatkin made many helpful suggestions based on a very early version. David Foote provided the data for Figure 6.1.

Finally, my greatest debt is to my wife, Robin Gordon, who not only encouraged me during the writing of this book but also edited the entire manuscript. More important, she has always been my model of what a teacher should be. Whatever success I may have had in teaching population genetics has been inspired in no small part by her. In keeping with the tradition established in my previous book of dedications to great teachers, I dedicate this one to her.

The second edition

The new edition of the Guide retains the minimalist spirit of the previous edition even though it has grown by about 20 percent. The most important additions are the introduction of more material on stochastic processes in evolution, a new section on genetic load theory, and a new chapter on two-locus theory. The sections on effective population size and selection in a changing environment have been completely rewritten. The new material is more challenging than the old, which is fitting as population genetics is being called upon to interpret the data pouring out of genome-sequencing centers.

The new edition has benefited from suggestions by many people throughout the world. I would like to single out and thank Dave Cutler, Bill Gilliland, Dan Gusfield, Ralph Haygood, Susan Hodge, Masaru Iizuka, Andy Kern, Chuck Langley, and the students in the Population Biology Graduate Group at UC Davis. Many others have contributed as well, and I thank you all. Robin Gordon has, once again, improved the final product with her constant support and unmatched editing skills.

Chapter 1

Genetic Variation

Population geneticists spend most of their time doing one of two things: describing the genetic structure of populations or theorizing on the evolutionary forces acting on populations. On a good day, these two activities mesh and true insights emerge.
In this chapter, we will do all of the above. The first part of the chapter documents the nature of genetic variation at the molecular level, stressing the important point that the variation between individuals within a species is similar to that found between species. After a short terminologic digression, we begin the theory with the traditional starting point of population genetics, the Hardy-Weinberg law, which describes the consequences of random mating on allele and genotype frequencies. Finally, we see that the genotypes at a particular locus do fit the Hardy-Weinberg expectations and conclude that the population mates randomly.

No one knows the genetic structure of any species. Such knowledge would require a complete description of the genome and spatial location of every individual at one instant in time. In the next instant, the description would change as new individuals are born, others die, and most move, while their transmitted genes mutate and recombine. How, then, are we to proceed with a scientific investigation of evolutionary genetics when we cannot describe that which interests us the most? Population geneticists have achieved remarkable success by choosing to ignore the complexities of real populations and focusing on the evolution of one or a few loci at a time in a population that is assumed to mate at random or, if subdivided, to have a simple migration pattern. The success of this approach, which is seen in both theoretical and experimental investigations, has been impressive, as I hope the reader will agree by the end of this book.

The approach is not without its detractors. Years ago, Ernst Mayr mocked this approach as "bean bag genetics." In so doing, he echoed a view held by many of the pioneers of our field that natural selection acts on highly interactive coadapted genomes whose evolution cannot be understood by considering the evolution of a few loci in isolation from all others.
Although genomes are certainly coadapted, there is precious little evidence that there are strong interactions between most polymorphic alleles in natural populations. The modern view, spurred on by the rush of DNA sequence data, is that we can profitably study loci in isolation.

This chapter begins with a description of genetic variation at the alcohol dehydrogenase locus, ADH, in Drosophila. ADH is but one locus in one species. Yet, its genetic variation is typical in most regards. Other loci in Drosophila and in other species may differ quantitatively but not in their gross features.

1.1 DNA variation in Drosophila

Although population genetics is concerned mainly with genetic variation within species, until recently only genetic variation with major morphological manifestations, such as visible, lethal, or chromosomal mutations, could be analyzed genetically. The bulk of genetically based variation was refractory to the most sensitive of experimental protocols. Variation was known to exist because of the uniformly high heritabilities of quantitative traits; there was simply no way to dissect it.

Today all this has changed. With readily available polymerase chain reaction (PCR) kits, the appropriate primers, and a sequencing machine, even the uninitiated can soon obtain DNA sequences from [...]

[...] 1, the allele frequency piles up near its deterministic equilibrium value of one-half. In the former case genetic drift dominates mutation; in the latter case mutation dominates drift. When $4N_e u = 1$, we have the peculiar case where neither force has the upper hand.

Problem 2.17 What are the mean and variance of the beta density derived in the previous paragraph when $u = v$?

The answer to Problem 2.17 may be used to find the expected sum of site heterozygosities under the infinite-sites, no-recombination model of the gene.
The expected heterozygosity for a single site with symmetric reversible mutation is

$$\frac{\theta_s}{1 + 2\theta_s},$$

where $\theta_s = 4Nu_s$ and $u_s$ is the mutation rate at one nucleotide site. To convert this to a site in an infinite-sites model, we must fix the locus mutation rate at $\theta$ and assume at first that there are $m$ rather than an infinity of sites. As the locus mutation rate is the sum of the site mutation rates, $\theta_s = \theta/m$. The expected sum of site heterozygosities is

$$m\,\frac{\theta_s}{1 + 2\theta_s} = \frac{\theta}{1 + 2\theta/m}.$$

The limit as the number of sites goes off to infinity ($m \to \infty$) is

$$\theta. \qquad (2.25)$$

This fact was used in the discussion of Tajima's D on page 45.

Figure 2.9: The beta density for three values of $4N_e u$. (Horizontal axis: $p$.)

2.10 Is genetic drift important in evolution?

Genetic drift is an evolutionary force that changes both allele and genotype frequencies. No population can escape its influence. Yet it is a very weak evolutionary force in large populations, prompting a great deal of debate over its relative importance in evolution. Historically, this debate has covered a great deal of territory. Here, we can take a very narrow view by asking the question: Which evolutionary force has the largest impact on the variance in the change of an allele's frequency? We have seen that drift makes $\mathrm{Var}\{p'\}$ inversely proportional to the population size, so that the variance can be very small in large populations. Other stochastic forces may well have a much larger effect on the variance of allele frequencies. In subsequent chapters we will find $\mathrm{Var}\{p'\}$ for two other stochastic forces: genetic draft and selection in a random environment. Both forces have the potential to be more important than drift for the dynamics of common alleles. Our judgment must wait until we examine these forces.

Rare alleles, such as new mutations, are greatly affected by demographic stochasticity. We saw, for example, that the probability that a new mutation is lost in a single generation is about one-third.
Very rare alleles, those with ten or so copies, bounce around on a time scale of a few generations. These fluctuations are due to the same root cause as those of genetic drift, demographic stochasticity, but it is not clear that we should call these stochastic dynamics genetic drift as it obscures the profound difference in time scales. To help make this point, consider that the variance in $p'$ for a new mutation is inversely proportional to $N^2$, yet the numbers of this mutation jump around significantly every generation. Genetic drift is best thought of as a stochastic process describing the dynamics of alleles with frequencies much greater than $1/N$.

We have no satisfactory name for the stochastic process representing the dynamics of very rare alleles. Here we simply call it the boundary process, a neutral term that can encompass not only fluctuations caused by demographic stochasticity, but by various other stochastic and deterministic forces as well. The boundary process is arguably the most important stochastic process in evolution. It will be explored in more detail in the next chapter, where natural selection makes a crisper distinction between the boundary process and other processes than we have when demographic stochasticity is the only evolutionary force at work.

2.11 Answers to problems

2.1 The probability that a particular allele is not chosen on a single draw is $1 - 1/(2N)$. As each draw is with replacement, the probability that the allele is not drawn at all is $[1 - 1/(2N)]^{2N}$. For large populations, this probability approaches $e^{-1} \approx 0.37$, using the hint in the statement of the problem.

2.2 Here is a simulation written in the Python programming language. You can download Python and find many tutorials at www.python.org.
    import random

    twoN, p = 40, 0.2      # number of alleles (2N) and initial frequency of A1
    print(0, p)
    for generation in range(1, 100):
        numberOfA1 = 0
        # Draw 2N alleles with replacement from the current generation.
        for trial in range(twoN):
            if random.random() < p:
                numberOfA1 += 1
        p = numberOfA1 / twoN   # allele frequency in the next generation
        print(generation, p)

2.3 One approach is to simulate the process by flipping a coin and designating heads as the event that the individual in the next generation is a heterozygote and tails as the event that it is a homozygote. The average number of flips until a tail appears is the same as the average number of generations until the population becomes homozygous. Alternatively, you can notice that the time to homozygosity is a geometric random variable and use the properties of geometric random variables as given in Appendix B to obtain the answer.

2.6 If sampling is with replacement, then the probability of choosing allele $A_i$ on two successive draws from a given population is $p_i^2$. Thus, the probability of choosing two alleles that are identical by state is $\sum_i p_i^2$, which is the homozygosity of the population, $G$, as defined in Equation 1.3. $G$ can be described in an entirely different way. Two alleles sampled with replacement will be identical by state if they are identical by origin, which occurs with probability $1/(2N)$. The probability that two alleles that are different by origin are identical by state is $\mathcal{G}$. Thus,

$$G = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right)\mathcal{G}.$$

2.7 The number of mutations that are one step away is obtained by noting that there are 3000 sites where a mutation can occur and three nucleotides that can replace the original, giving 3000 × 3 = 9000 different mutations. For the number that are two steps away, note that there are 3000 sites for the first mutation and 2999 for the second mutation, so 3000 × 2999 × 3 × 3 = 80,973,000 mutations are two steps away.

2.13 The allele frequencies at the 11 segregating sites in Figure 1.1 are 4/11, 2/11, 2/11, etc., which leads to $\hat{\pi} = 6.03636$.

2.14 Using $\hat{\pi} = 6.03636$ from the previous problem, $\hat{\theta} = 4.78$ from page 43, and a tortuous calculation of $C = 1.08461$, we get $D_T = (\hat{\pi} - \hat{\theta})/C = 1.1585$. This does not allow a rejection of the null hypothesis because $D_T < 2$.

2.16 Because E{Jx } = E{ilx+y }, we have [...]

2.17 Using the formulae for the moments on page 199, the mean is 1/2 and the variance is

$$\frac{1}{4(2\theta + 1)},$$

where $\theta = 4Nu$.

Chapter 3

Natural Selection

The results of natural selection, the evolutionary force most responsible for adaptation to the environment, are evident everywhere, yet it is remarkably difficult to observe the time course of changes brought about by selection. The reason, of course, is that most evolutionary change is extraordinarily slow. Significant changes in the frequencies of genotypes take longer than the lifetime of a human observer. This temporal imbalance is the greatest obstacle to the study of evolution and is the main reason why much of our understanding of evolutionary processes comes from theoretical and mathematical arguments rather than direct observation, as is typical in other areas of biology.

Occasionally, we are able to observe natural selection in action either because the strength of selection is so great that change occurs very quickly or because the organism, perhaps a bacterium or virus, has a very short generation time. The European scarlet tiger moth, Panaxia dominula, provides one well-studied example. In a population just outside of Oxford, England, an allele that reduces the spotting on the forewing, the medionigra allele, is found in fairly high frequency. As this allele is found nowhere else, it has attracted attention from butterfly enthusiasts. The frequency of the medionigra allele declined fairly steadily from 1939 until 1955, after which it began hopping around erratically, as illustrated in Figure 3.1.
Although the complete record is difficult to interpret, the period of steady decline appears to be a case of natural selection preferring the common allele over the medionigra allele. If so, how strong is the selection? Other questions come to mind as well. Why is the medionigra allele less fit? If it is less fit, how did it get to a frequency of 10 percent before beginning its decline? While we will not be able to provide complete answers to any of these questions, we will be able to discuss them much more intelligently after a theoretical investigation of the nature and consequences of natural selection.

In this chapter we will discover how natural selection changes allele frequencies by examining some one-locus models of selection. Natural selection works when genotypes have different fitnesses. To a geneticist, fitness is just another trait with a genetic component. To an evolutionist, it is the ultimate trait because it is the one upon which natural selection acts. Fitness is a complicated trait, even in the context of a simple one-locus, two-allele model. There is individual fitness, genotype fitness, relative fitness, and absolute fitness. We will spend some time making these different aspects of fitness clear before tackling the problem of the dynamics of natural selection.

Figure 3.1: The observed frequency of the medionigra allele in the scarlet tiger moth population compared to the expected frequency assuming a 10 percent disadvantage. (Horizontal axis: year, 1940 to 1970; vertical axis: frequency, 0.00 to 0.12.)

An examination of the dynamics of natural selection quickly leads to the conclusion that the dominance relationships between alleles affecting fitness have a profound effect on the outcome of selection. Fortunately, the dominance of fitness alleles can be investigated experimentally; in Section 3.6 a study of viability in Drosophila melanogaster populations is described.
A major conclusion of this study is that there is an inverse homozygous-heterozygous effect for deleterious alleles: alleles that have large deleterious effects when homozygous tend to be nearly recessive, whereas alleles with small homozygous effects tend to be nearly additive. A casualty of the study is overdominance, the form of dominance that is often invoked to explain selected polymorphisms. However, the subsequent section shows that selection in a variable environment can promote polymorphism even when heterozygotes are intermediate in fitness.

Section 3.9 examines the interaction of genetic drift and selection. Genetic drift has a major influence on the fate of rare alleles even in very large populations. In fact, the sad fate of most advantageous mutations is extinction, which leads to the view that evolution is fundamentally a random process that is not repeatable or reversible. The chapter ends by revisiting molecular evolution, which is examined in the much richer context of selection, drift, and mutation rather than just the latter two forces.

This is an ambitious chapter, with many more new topics than were in the preceding chapters. It is also more difficult than the previous material because the mathematics of selection lacks the elegance and simplicity of the mathematics of drift and mutation.

Figure 3.2: The simple life cycle used in the fundamental model of selection. (Diagram labels: Newborns, Selection, Adults; allele frequencies $p$ and $p'$; one generation.)

3.1 The fundamental model

Natural selection is most easily studied in the context of an autosomal locus in a hermaphroditic species whose life cycle moves through a synchronous cycle of random mating, selection, random mating, selection, and so forth. Our entrance into the cycle is with newborns produced just after a round of random mating by their parents.
Figure 3.2 shows that the frequency of the $A_1$ allele among the newborns is called $p$, which is the same as the allele frequency in their parents. As their parents mated at random, the genotype frequencies of the newborns will conform to Hardy-Weinberg expectations.

The newborns must survive to adulthood in order to reproduce. The probability of survival of an individual will, in general, depend on the genotype of the individual. Let the probabilities of survival or, as they are more usually called, the viabilities of $A_1A_1$, $A_1A_2$, and $A_2A_2$ individuals be $w_{11}$, $w_{12}$, and $w_{22}$, respectively. Viabilities may be thought of as either probabilities of survival of individuals or the fraction of individuals that survive. The latter allows us to see immediately the consequences of selection because the frequency of a genotype after selection is proportional to its frequency before selection times its viability, or

frequency after selection $\propto$ newborn frequency $\times$ viability.

For example, the frequency of $A_1A_1$ in the adults is proportional to $p^2 w_{11}$. To obtain the relative frequencies of the three genotypes in the adults, we must find a constant of proportionality such that the sum of the three genotype frequencies in the adults is one. The following worksheet shows how this is done.

Genotype:                    $A_1A_1$              $A_1A_2$               $A_2A_2$
Frequency in newborns:       $p^2$                 $2pq$                  $q^2$
Viability:                   $w_{11}$              $w_{12}$               $w_{22}$
Frequency after selection:   $p^2 w_{11}/\bar{w}$  $2pq w_{12}/\bar{w}$   $q^2 w_{22}/\bar{w}$

The constant of proportionality, $\bar{w} = p^2 w_{11} + 2pq w_{12} + q^2 w_{22}$, is chosen such that

$$\frac{p^2 w_{11}}{\bar{w}} + \frac{2pq w_{12}}{\bar{w}} + \frac{q^2 w_{22}}{\bar{w}} = 1,$$

as required. The quantity $\bar{w}$ has special meaning in population genetics. It is called the mean fitness of the population. (If the concept of a mean is unfamiliar, read from the beginning of Appendix B through page 191.)

After selection, the frequency of the $A_1$ allele may have changed. The new allele frequency, $p'$, is

$$p' = \frac{p^2 w_{11} + pq w_{12}}{\bar{w}}.$$

(Don't forget that each heterozygote has only one $A_1$ allele.)
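The worksheet translates directly into a few lines of Python. What follows is a minimal sketch rather than anything from the original text: the function name `select_one_generation` and the numerical viabilities are hypothetical, chosen only for illustration, and the code assumes newborns in Hardy-Weinberg proportions with $p'$ taken as the frequency of $A_1$ among the surviving adults.

```python
def select_one_generation(p, w11, w12, w22):
    """Apply one round of viability selection to newborns in
    Hardy-Weinberg proportions; return (mean fitness, new frequency of A1)."""
    q = 1.0 - p
    # Newborn genotype frequencies times viabilities (not yet normalized).
    a11 = p * p * w11
    a12 = 2.0 * p * q * w12
    a22 = q * q * w22
    w_bar = a11 + a12 + a22  # mean fitness, the constant of proportionality
    # Each A1A1 adult carries two A1 alleles, each heterozygote only one.
    p_next = (a11 + 0.5 * a12) / w_bar
    return w_bar, p_next


# Hypothetical viabilities, chosen only for illustration.
w_bar, p_next = select_one_generation(0.5, 1.0, 0.9, 0.8)
print("mean fitness:", w_bar)
print("new frequency of A1:", p_next)
```

Calling the function repeatedly, feeding each `p_next` back in as the next `p`, traces the allele frequency through successive generations of the life cycle.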
The change in the frequency of the $A_1$ allele in a single generation, $\Delta_s p = p' - p$, follows from

$$p' - p = \frac{p^2 w_{11} + pq w_{12} - p\bar{w}}{\bar{w}} = \frac{p\,[pq w_{11} + q(1 - 2p) w_{12} - q^2 w_{22}]}{\bar{w}},$$

which simplifies to

$$\Delta_s p = \frac{pq\,[p(w_{11} - w_{12}) + q(w_{12} - w_{22})]}{p^2 w_{11} + 2pq w_{12} + q^2 w_{22}}. \tag{3.1}$$

This is probably the single most important equation in all of population genetics and evolution! Admittedly, it isn't pretty, being a ratio of two polynomials with three parameters each. Yet, with a little poking around, this equation easily reveals a great deal of the dynamics of natural selection.

Problem 3.1 In 1940, the frequency of the medionigra allele in the Oxford population was about p = 0.1. If the viabilities of the three genotypes were $w_{11} = 0.9$, $w_{12} = 0.95$, and $w_{22} = 1$, what would be the frequency of medionigra in the newborns of 1941?

3.2 Relative fitness

Notice that the terms in the numerator and denominator of Equation 3.1 all have a viability as a factor. Thus, if we were to divide the numerator and denominator by a viability, say by $w_{11}$, every viability in Equation 3.1 would become a ratio of that viability and $w_{11}$, yet the numerical value of $\Delta_s p$ would not change at all. In other words, we could use as our definition of viability either the original definition based on absolute viabilities or a new one based on the relative viabilities of genotypes when compared to one particular genotype.

Genotype:             $A_1A_1$    $A_1A_2$           $A_2A_2$
Viability:            $w_{11}$    $w_{12}$           $w_{22}$
Relative viability:   1           $w_{12}/w_{11}$    $w_{22}/w_{11}$

In either case, the dynamics of selection as captured in Equation 3.1 are the same. An important insight in its own right, this is also of great utility as it allows a much more informative choice of parameters than the $w_{ij}$'s used thus far.

Up to this point $w_{ij}$ has been called the viability of genotype $A_iA_j$. More often, $w_{ij}$ is called the fitness, or sometimes the absolute or Darwinian fitness, of genotype $A_iA_j$. In nature, the fitness of a genotype has many components including viability, fertility, developmental time, mating success, and so forth. Most of these components, other than viability, cannot be included in a simple model like that defined by Equation 3.1. Yet, if the differences in fitnesses between genotypes are small, Equation 3.1 is a good approximation to the actual dynamics as long as the values of the $w_{ij}$ are chosen appropriately. Here we will not investigate these more complicated models but will from this point on refer to $w_{ij}$ as a fitness and allow it to take on any values greater than or equal to zero. As the dynamics of selection depend on relative fitnesses, nothing really changes by allowing this broadened scope for $w_{ij}$.

A common notational convention for relative fitnesses is

Genotype:           $A_1A_1$    $A_1A_2$    $A_2A_2$
Relative fitness:   1           $1 - hs$    $1 - s$

where $1 - hs = w_{12}/w_{11}$ and $1 - s = w_{22}/w_{11}$. The parameter s is called the selection [...]

[...] 1), disappear from the population ($p \to 0$), approach some intermediate value ($p \to \hat{p}$), or not change at all? As we shall see, all four outcomes are possible. Which one prevails depends on the dominance relationships between alleles and on the initial frequency of the allele.

[...] A graph of $\Delta_s p$ versus p in this case shows that p will decrease when rare and increase when near one. (You will be asked to draw this graph in Problem 3.7.) That's strange: The outcome of selection depends on the initial frequency of the allele! In fact, the allele frequency will approach zero if the initial value of p is less than $\hat{p}$, where $\hat{p}$ is given by Equation 3.4. The allele frequency will approach one if the initial value of p is greater than $\hat{p}$. If, by some bizarre chance, $p = \hat{p}$, the allele frequency will not change at all. $\hat{p}$ is an unstable equilibrium because the smallest change in p will cause the allele frequency to move away from $\hat{p}$. A small change in p might well be caused by genetic drift.

Problem 3.7 Graph $\Delta_s p$ versus p for an underdominant locus.
Use the figure to convince yourself that the description of disruptive selection given in the preceding paragraph is correct.

There are very few, if any, examples of underdominant alleles in high frequency in natural populations. However, the fact that closely related species sometimes have chromosomes that differ by inversions or translocations suggests that underdominant chromosomal mutations do occasionally cross the unstable equilibrium. The evolutionary forces that push the frequencies over the unstable point are not known, although both genetic drift and meiotic drive are likely candidates.

There is something unsatisfying about the description of the three forms of natural selection. They come off as a series of disconnected cases. One might have hoped for some unifying principle that would make all three cases appear as instances of some more general dynamic. In fact, Sewall Wright found unity when he wrote Formula 3.2 in the more provocative form

$$\Delta_s p = \frac{pq}{2\bar{w}} \frac{d\bar{w}}{dp}. \tag{3.5}$$

(The symbol $d\bar{w}/dp$ is the derivative or slope of the mean fitness viewed as a function of the allele frequency p.) Equation 3.5 shows that $\Delta_s p$ is proportional to the slope of the mean fitness function. If the slope is positive, then so is $\Delta_s p$. As a result, selection will increase p and, because $d\bar{w}/dp > 0$, will increase the mean fitness of the population. If the slope is negative, p will decrease and, because $d\bar{w}/dp < 0$, the mean fitness will increase once again. In other words, the allele frequency always changes in such a way that the mean fitness of the population increases. Moreover, the rate of change in p is proportional to the genetic variation in the population as measured by pq. Although we will not show it, the rate of change of the mean fitness, $\bar{w}$, is proportional to pq as well. Thus, selection always increases the mean fitness of the population and does so at a rate that is proportional to the genetic variation.

Problem 3.8 Show that Equation 3.5 is correct.

R. A. Fisher made a similar observation at about the same time as did Wright and called it the Fundamental Theorem of Natural Selection (Fisher 1958). Fisher showed that the change in the mean fitness is proportional to the additive genetic variation in fitness. (We will learn about the additive variance in Chapter 6.) As variances are always positive, the mean fitness will always increase when natural selection changes the allele frequency.

The Fundamental Theorem of Natural Selection is undeniably true for theoretical populations with simple selection at a single locus. However, with more loci or if fitness depends on the frequencies of genotypes or if it changes through time, the Fundamental Theorem no longer holds. Thus, it is neither fundamental nor a theorem; some have claimed that it has little to do with natural selection. Its biological significance has always been controversial. Yet, the metaphor suggested by the theorem that natural selection always moves populations upward on the "adaptive landscape" has proven to be a convenient one for simple descriptions of evolution without mathematics or deep understanding. The metaphor is stretched too far when applied to evolution for more than a few generations, as the change in fitness due to environmental change renders the metaphor inappropriate. Imagine climbing a mountain that keeps moving; despite your best efforts, the peak remains about the same distance ahead. That is the proper metaphor for evolution.

Problem 3.9 Graph the mean fitness of the population as a function of p for s = 0.1 and h = -0.5, 0.5, and 1.5. Do the peaks correspond to the outcomes of selection described above?

3.4 Mutation-selection balance

The vast majority of mutations of large effect are deleterious and incompletely dominant. They enter the population by mutation and are removed by directional selection.
A balance is reached where the rate of introduction of mutations is exactly matched by their rate of loss due to selection. The equilibrium number of deleterious mutations is large enough to have a major effect on many evolutionary processes. Among these are the evolution of sex and recombination and the avoidance of inbreeding. Most of these mutations are partially recessive, h < 1/2, so their effects are not always apparent unless the population is made homozygous either by genetic drift or by inbreeding. In this section, we will study the balance between mutation and selection and then go on in the next section to describe the dominance relationships between naturally occurring alleles.

Following our labeling conventions, $A_2$ will represent the deleterious allele whose frequency is increased by mutation and decreased by directional selection. Selection will be assumed to be sufficiently strong so that the frequency of $A_2$ is very small. As a consequence, the most important effect of mutation is to convert $A_1$ alleles into $A_2$ alleles. The reverse happens as well, but has little influence on the dynamics and can be ignored. Suppose, therefore, that there is one-way mutation from allele $A_1$ to allele $A_2$, where u is the mutation rate, the probability that a mutation from $A_1$ to $A_2$ appears in a gamete.

The effects of mutation on p may be described in the same way as was done in the discussion of the balance between mutation and genetic drift. For an allele in the next generation to be $A_1$, it must have been $A_1$ in the current generation and it must not have mutated,

$$p' = p(1 - u).$$

The change in p in a single generation is

$$\Delta_u p = -up. \tag{3.6}$$

Mutation rates are usually very small: $10^{-5}$ for visible mutations at a typical locus in Drosophila to $10^{-9}$ for a typical nucleotide. Thus, the frequency of $A_1$ decreases very slowly while the frequency of $A_2$ increases very slowly.
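To get a feeling for how slowly mutation alone changes allele frequencies, the recursion $p' = p(1 - u)$ can be iterated directly. The Python sketch below is a minimal illustration; the function names and the choice of 1,000 generations are assumptions made for the example. Repeated application of the recursion gives $p_t = p_0 (1 - u)^t$, which is what the second function uses.

```python
import math


def mutate(p, u):
    """One generation of one-way mutation from A1 to A2: p' = p(1 - u)."""
    return p * (1.0 - u)


def generations_to_halve(u):
    """Generations for mutation alone to halve p, from p_t = p_0 (1 - u)^t."""
    return math.log(0.5) / math.log(1.0 - u)


u = 1e-5          # mutation rate quoted in the text for visible mutations
p = 1.0           # start with a population fixed for A1 (illustrative choice)
for _ in range(1000):
    p = mutate(p, u)

print(p)                        # still about 0.99 after 1,000 generations
print(generations_to_halve(u))  # on the order of 70,000 generations
```

With u = $10^{-5}$, tens of thousands of generations pass before mutation pressure by itself halves the frequency of $A_1$.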
If selection against $A_2$ is sufficiently strong, it will keep the frequency of $A_2$ very low, allowing the approximation

$$\Delta_u p = -u + qu \approx -u, \tag{3.7}$$

because $q \approx 0$.

Problem 3.10 Follow the exact and approximate frequencies of the $A_2$ allele for two generations when the initial frequency of $A_2$ is zero and $u = 10^{-5}$. What is the relative error introduced by the approximation 3.7? (The relative error is the difference between the exact and approximate values divided by the exact value.)

From Equation 3.2, we can write the change in the frequency of $A_1$ due to selection acting in isolation, when $q \approx 0$, as

$$\Delta_s p = \frac{pqs\,[ph + q(1 - h)]}{1 - 2pqhs - q^2 s} \approx qhs. \tag{3.8}$$

The approximation is valid when $q \approx 0$, which implies that $p \approx 1$ and $\bar{w} \approx 1$. At equilibrium, the change in the frequency of $A_1$ by mutation must balance the change due to selection,

$$0 = \Delta_u p + \Delta_s p \approx -u + qhs,$$

which gives the equilibrium frequency of $A_2$,

$$\hat{q} \approx \frac{u}{hs}. \tag{3.9}$$

The equilibrium frequency of a deleterious allele is approximately equal to the mutation rate to the allele divided by the selection against the allele in heterozygotes. Recall from Chapter 1 that rare alleles are found mainly in heterozygotes, not homozygotes. Thus, it is not surprising that the equilibrium frequency of deleterious alleles depends on their fitness in heterozygotes rather than homozygotes.

3.5 Genetic load

Deleterious alleles cause problems for populations. One measure of these problems is the genetic load of the population,

$$L = \frac{w_{max} - \bar{w}}{w_{max}}, \tag{3.10}$$

where $w_{max}$ is the fitness of the maximally fit genotype in the population. The closer the mean fitness of the population is to the fitness of the most fit genotype, the lower the genetic load. The mean fitness of a population at equilibrium under the mutation-selection balance is

$$\bar{w} = 1 - 2\hat{p}\hat{q}hs - \hat{q}^2 s \approx 1 - 2\hat{q}hs \approx 1 - 2u.$$
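The two approximations, an equilibrium frequency near u/(hs) and a mean fitness near 1 - 2u, can be checked by iterating selection and mutation until the allele frequency settles. The Python sketch below is an illustration under stated assumptions: selection is applied before mutation within each generation, the function name `next_q` is made up for the example, and the parameter values s = 0.1 and h = 0.3 are hypothetical.

```python
def next_q(q, s, h, u):
    """One generation of selection (fitnesses 1, 1 - hs, 1 - s) then mutation."""
    p = 1.0 - q
    w_bar = 1.0 - 2.0 * p * q * h * s - q * q * s
    # Exact change in p due to selection, before any approximation.
    p_after_selection = p + p * q * s * (p * h + q * (1.0 - h)) / w_bar
    # One-way mutation from A1 to A2.
    p_next = p_after_selection * (1.0 - u)
    return 1.0 - p_next


s, h, u = 0.1, 0.3, 1e-5   # hypothetical values with selection much stronger than mutation
q = 0.0
for _ in range(5000):
    q = next_q(q, s, h, u)

p = 1.0 - q
w_bar = 1.0 - 2.0 * p * q * h * s - q * q * s
print(q, u / (h * s))        # simulated equilibrium versus u/(hs)
print(1.0 - w_bar, 2.0 * u)  # reduction in mean fitness versus 2u
```

Changing h or s moves the equilibrium frequency but leaves the reduction in mean fitness close to 2u, as long as selection remains much stronger than mutation.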
Remarkably, the presence of deleterious mutations decreases the mean fitness by an amount, 2u, that is independent of the strength of selection in heterozygotes. The genetic load in this case is simply

$$L = 1 - (1 - 2u) = 2u, \tag{3.11}$$

which follows from Equation 3.10 with $w_{max} = 1$. When selection is weak, the frequency of the deleterious allele will be higher, but the detrimental effect of each allele on the mean fitness of the population is slight. When selection is strong, the frequency of $A_2$ is less, but the effect is greater. Hence, the independence of the load on the strength of selection follows.

Problem 3.11 Derive the genetic load for an overdominant locus at equilibrium. (Do not include mutation.) Is this greater or less than the load of a population made up entirely of $A_1A_1$ individuals (and for which the $A_2$ allele does not exist, even as a possibility)? What are the implications of your answers on the biological significance of genetic loads?

While the contribution of a single locus to the genetic load of a population is small (because u is small), the cumulative contributions of all loci can lead to a substantial load. This is easy to see if we are willing to accept that loci act independently, i.e., when the mean fitness of the population can be written as the product of the mean fitnesses of individual loci,

$$\bar{w} = \prod_{i=1}^{n} \bar{w}_i,$$

where n is the total number of loci. This is tantamount to assuming a kind of epistasis called multiplicative epistasis, which will be discussed shortly. Using Equation A.4 and $\bar{w}_i \approx 1 - 2u_i$ for the ith locus we get the pleasing result

$$\bar{w} = \prod_{i=1}^{n} (1 - 2u_i) \approx e^{-\sum_{i=1}^{n} 2u_i} = e^{-U},$$

where

$$U = \sum_{i=1}^{n} 2u_i$$

is the genomic diploid mutation rate. Rather than a probability of mutation, U is the mean number of new (deleterious) mutations that appear in a diploid offspring. The genomic genetic load is

$$L = 1 - e^{-U} \tag{3.12}$$

because the most fit genotype is homozygous at each locus for the more fit allele, hence $w_{max} = 1$ in Equation 3.10.
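The step from a product over loci to an exponential can be checked numerically. In the Python sketch below, the function name `genomic_load` and the example of 10,000 loci each with $u = 10^{-5}$ are hypothetical choices made for illustration; the code simply compares the exact product of the per-locus mean fitnesses with the exponential approximation.

```python
import math


def genomic_load(mutation_rates):
    """Return (load from the exact product, load from 1 - exp(-U), U).

    Assumes multiplicative fitness across loci and a mean fitness of
    1 - 2u_i at the ith locus, as in the text.
    """
    w_bar = math.prod(1.0 - 2.0 * u for u in mutation_rates)
    U = sum(2.0 * u for u in mutation_rates)
    return 1.0 - w_bar, 1.0 - math.exp(-U), U


# Hypothetical genome: 10,000 loci, each with mutation rate 1e-5, so U = 0.2.
exact, approx, U = genomic_load([1e-5] * 10_000)
print(U, exact, approx)

# The load implied by Equation 3.12 for a larger genomic rate, U = 4.
print(1.0 - math.exp(-4.0))
```

The two loads agree closely because each $2u_i$ is tiny, which is the condition under which $1 - x \approx e^{-x}$ is accurate.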
If the mean number of deleterious mutations, U, is large (and if all of the assumptions of our calculation are met), the load can be devastating. For example, if U = 4, then 98 percent of the population will die from genetic causes. This calculation suggests at least two important questions: What are typical genomic deleterious mutation rates (to be taken up in Chapter 7)? How valid is the logic that led to the conclusion that $1 - e^{-U}$ of the population is killed by its genes?

The biological significance of genetic load, like that of the Fundamental Theorem of Natural Selection, has been hotly debated over the years. It has played such an important role in the history of population genetics that it is worthwhile spending time discussing its relevance.

One simple way to see that something could be amiss is to consider what happens when we let the selection coefficient approach zero, $s \to 0$. As L in Equation 3.12 is independent of s, the load remains constant all the way until s = 0. How can this be? When s = 0 there is no selection and, as a consequence, no load. Yet Equation 3.12 says there is a load. Mathematically, the problem lies with the approximations: The assumption that q is small breaks down when s is similar in value to u because selection is too weak to counteract mutation. For mutations of large effect, say s = 0.001 or greater, the approximations are excellent and genetic load calculations can give valuable insights into the effects of deleterious mutations on populations. For smaller values of s, not only do the standard mathematics of load break down, but so do the biological underpinnings.

In fact, it was for small values of s that load theory had its most celebrated and controversial application. When the first molecular studies of genetic variation appeared, the unexpectedly high level of polymorphism suggested that the variation could not be maintained by selection because of the consequent genetic load.
A similar argument was made for the molecular differences observed between species. However, it soon became apparent that load theory has little relevance when selection is very weak, for essentially three different but complementary reasons. Our discussion of this will be much more transparent if we assume that alleles are additive (h = 1/2) and that the allele frequency and selection coefficient are the same at each locus.

The first reason has to do with the assignment of fitnesses to genotype and is, in essence, an attack on multiplicative epistasis. The problem may be seen by writing the fitness of a genotype as a function of the total number of deleterious mutations it contains. When h = 1/2, the contribution of the ith locus to the fitness of the genotype is $1 - X_i s/2$, where $X_i$ is the number of the less fit allele at the ith locus in the genotype. For example, at the A locus $X_i$ is 0, 1, or 2 for the genotypes $A_1A_1$, $A_1A_2$, and $A_2A_2$. The fitness of a genotype with n such loci is

$$\prod_{i=1}^{n} (1 - X_i s/2) = e^{\sum_{i=1}^{n} \ln(1 - X_i s/2)} \approx e^{-Ys/2},$$

where $Y = \sum_{i=1}^{n} X_i$ is the total number of deleterious alleles in the genotype.

Figure 3.5: The fitness of a genotype as a function of the numbers of deleterious alleles for three models of epistasis.

The peculiarity of this form of epistasis is illustrated in Figure 3.5, which graphs fitness as a function of the number of deleterious alleles, Y. Notice that the relative fitness of a genotype drops precipitously when it contains relatively few deleterious mutations and remains close to zero from then on. By contrast, synergistic epistasis gives a very different view of gene action. Under this form of epistasis, a genotype accumulates a relatively large number of deleterious mutations before there is a substantial reduction in fitness.
If a genotype falls in the middle of the x axis, its fitness will be close to $w_{max}$ for synergistic epistasis but considerably below $w_{max}$ for multiplicative epistasis. If the distribution of genotypes in a population is concentrated near the middle, the genetic load will be small for synergistic epistasis and large for multiplicative epistasis.

Which model has more to recommend it? Both data, which will be examined in Chapter 7, and biological intuition give the nod to synergistic epistasis. This by itself could be a convincing argument that genetic load is not a serious problem for populations with lots of weakly selected genetic variation. However, there is the nagging concern that we know very little about the epistatic relationships between alleles and, until we do, we should reserve judgment on the relevance of genetic load.

The second reason genetic load is suspect is that the most fit genotype, the one used to establish $w_{max}$, will never occur in a real population. This is because most genotypes in the population have relatively similar numbers of deleterious alleles. To see this, let Y be the total number of deleterious alleles in a genotype, Y = 0, 1, ..., 2n. If the loci evolve independently, then the probability that a genotype has Y = i deleterious alleles is the binomial probability

$$\binom{2n}{i} q^i p^{2n-i}.$$

(See Equation B.4.) Thus, the mean number of deleterious alleles in a genotype is 2nq and the variance in that number is 2npq. The most fit genotype, that with zero deleterious mutations, is

$$\frac{2nq}{\sqrt{2npq}} \propto \sqrt{n}$$

standard deviations below the mean number of deleterious mutations. In a species with $10^9$ nucleotides, this is about 32,000 standard deviations below the mean! Such a genotype is so improbable that it will never occur in a real population, making its use as a benchmark in load theory difficult to understand. Most genotypes will have a number of deleterious mutations that is much closer to the mean number.
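The scaling with $\sqrt{n}$ is easy to explore numerically. The Python sketch below is a small illustration; the function name and the allele frequency q = 1/3 are assumptions made for the example (the text does not state the value of q behind its figure). Since $2nq/\sqrt{2npq} = \sqrt{2nq/p}$, the choice q = 1/3 makes the distance exactly $\sqrt{n}$.

```python
import math


def sds_below_mean(n, q):
    """Distance of the mutation-free genotype below the mean number of
    deleterious alleles, in standard deviations (binomial with 2n trials)."""
    p = 1.0 - q
    mean = 2.0 * n * q
    sd = math.sqrt(2.0 * n * p * q)
    return mean / sd


n = 10**9        # number of nucleotides used in the text's example
q = 1.0 / 3.0    # hypothetical allele frequency, chosen so the ratio equals sqrt(n)
print(sds_below_mean(n, q), math.sqrt(n))

# Quadrupling the number of loci doubles the distance, as expected for sqrt(n).
print(sds_below_mean(4 * n, q) / sds_below_mean(n, q))
```

Other values of q change the constant in front of $\sqrt{n}$ but not the conclusion: with a billion sites, the mutation-free genotype lies tens of thousands of standard deviations from the mean.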
The fact that most genotypes have relatively similar numbers of deleterious mutations implies that the variance in fitness will not be very large for weakly selected alleles. If the variance in fitness is not large, the population will not be experiencing many genetic deaths. The mathematics underlying this statement are much easier to develop if we assume a third kind of epistasis called additive epistasis, which is illustrated in Figure 3.5. Under this form of epistasis, each locus contributes to the overall fitness of a genotype by adding or subtracting a small value according to the number of deleterious alleles at that locus. For example, the contribution to fitness from the ith locus could be written as

$$c - X_i s/2,$$

where, once again, $X_i$ is the number of deleterious alleles at the locus and c is a constant that plays no role in the variance calculation. The variance in fitness at this locus is

$$(s^2/4)\,\mathrm{Var}\{X_i\} = pqs^2/2,$$

because the number of deleterious alleles at this locus is binomially distributed with parameters 2 and q. Under additive epistasis, the variance in fitness of genotypes is the sum of the variances of the individual loci, or $npqs^2/2$. For example, if there are one billion segregating nucleotides and $s = 10^{-5}$, the variance in fitness is less than 0.1 and the standard deviation is less than 0.3. There is nothing in these numbers to cause us any concern about the genetic well-being of the population.

In the variance calculation, all comparisons are made to the mean genotype of the population. We are, in effect, saying that we do not know why the population mean is where it is, but we are comfortable discussing the fitnesses of genotypes that differ from the mean by relatively few nucleotides compared with the large number of nucleotide differences in the comparison that occurs in the load calculation.
In the case of load we expect our model of fitness to extend across all possible genotypes; in the variance view our model of fitness need only extend a short distance from the mean. The former case expects too much of the theory; the latter expects very little.

Load has been used to argue that molecular polymorphisms and substitutions could not be selected. This discussion should cast doubt on that argument. Genetic load does have a role to play in discussions of deleterious alleles of measurable effect, for example, s > 0.001. In this domain the approximations are valid and the insights are valuable. In fact, we will use the language of load theory in the next section when discussing experiments that measure fitness differences between Drosophila genotypes. As the mutations contributing to these differences do have measurable effects, the use of genetic load theory is entirely appropriate. The use of multiplicative epistasis is more suspect, although without it load calculations become unwieldy at best.

3.6 The heterozygous effects of alleles

In 1960, Rayla Greenberg and James F. Crow published a landmark paper reporting the results of a study "undertaken in an attempt to determine whether the effects of recurrent mutation on the population and the deleterious effects of inbreeding are due primarily to a small number of genes of major effect or to the cumulative activity of a number of genes with individually small effects." The study did this and considerably more. Of particular interest was its suggestion that mutations of large effect are almost recessive ($s \approx 1$ and $h \approx 0$) while those of small effect are almost additive ($s \approx 0$ and $h \approx 1/2$). The prevailing view in the late 1950s was that most deleterious mutations are completely recessive (h = 0). This paper is not only historically and scientifically important, but also pedagogically valuable because it uses many of the ideas developed in this and the previous chapters.
In addition, it introduces an experimental methodology that is central to population genetics.

The design of the experimental part of the Greenberg and Crow paper was developed in the late 1930s by Alfred Sturtevant and Theodosius Dobzhansky. Back then, most of the genetic variation affecting fitness was thought to be due to rare, recessive, deleterious alleles. As rare alleles are usually heterozygous, these mutations would not be expressed in wild-caught individuals. An obvious way to study this "hidden variation" is to make individuals homozygous, as this allows the expression of recessive mutations. Drosophila was the only suitable organism for such a study because it alone allowed the experimental manipulation of entire chromosomes on which recombination is suppressed.

Figure 3.6: The Drosophila melanogaster crosses used to uncover hidden variation. In each cross, the male is on the left.

The experimental design is illustrated in Figure 3.6. The purpose of the design is to construct flies that are homozygous for their entire second chromosomes. The viabilities of these flies are then compared to those whose two second chromosomes are drawn independently from nature, thus mimicking random mating. The first class of flies will be called inbred, and the second class will be called outbred.*

* Greenberg and Crow called members of these two classes homozygotes and heterozygotes, respectively. However, while homozygotes is accurate for the first class, the loci of the second class can be either homozygous or heterozygous at each locus as they are in Hardy-Weinberg proportions.

The details of the design are as follows:

P1 In the parental generation, $+_n/+$ represents one male fly, obtained from nature or from an experimental population, that initiates the nth line of crosses. The symbols $+_n$ and $+$ stand for the two second chromosomes found in the male fly. One of the second chromosomes of this fly will ultimately be made homozygous. As there is no recombination in male Drosophila, the chromosomes in this original male remain intact. The male is crossed to a $Cy/cn\ bw$ female. Cy is a dominant second chromosome mutation, Curly wing, that is placed on a chromosome with one paracentric inversion on each arm to block recombination. The other chromosome has two recessive mutations, cinnabar eyes (cn) and brown eyes (bw). This initial cross is repeated 465 times, each repetition using an independently obtained male. Each repetition is called a line; the lines are numbered sequentially from 1 to 465.

F1 A single $Cy/+_n$ male from each line is crossed to a $Cy/bw^D$ female. The female Cy chromosome in this step is a slightly fancier version of the previous Cy chromosome, with a pericentric inversion, SM1, providing extra safeguards against recombination. The homolog to Cy in the female contains a dominant brown-eye mutation, $bw^D$. This is the critical step in the design as it assures that only a single wild-caught second chromosome is used.

F2 Two different crosses occur in this generation. The first is a mating of a $bw^D/+_n$ male to a $Cy/+_n$ female from the same line (a brother-sister mating). The second is a cross of a $bw^D/+_n$ male to a $Cy/+_{n+1}$ female from the (n + 1)st line.

F3 The offspring from the brother-sister F2 cross will fall into four classes: $+_n/+_n$, $Cy/+_n$, $bw^D/+_n$, and $Cy/bw^D$, which are easily recognized because Cy and $bw^D$ are both dominant mutations. According to Mendel's law of segregation, these four classes should be equally frequent. However, as the $Cy/bw^D$ flies are not used in the analysis, they are not included in Figure 3.6.
The $+_n/+_n$ flies are homozygous at every locus on their second chromosome and for this reason are called inbreds. The offspring from the interline cross have the same phenotypic classes as those from the intraline cross, but the wild-type flies will contain two independently [...]

[...]

$$1 - 2pqhs - q^2 s > 1 - qs. \tag{3.13}$$

In sequence, cancel the ones, cancel qs, move q to the right side of the inequality, and clean up to get

$$h < 1/2. \tag{3.14}$$

Thus, inbreeding depression implies that h is less than one-half. Recalling that in the s-h parameter system the $A_1A_1$ genotype is always more fit than $A_2A_2$, we can conclude that inbreeding depression implies that the fitness of the heterozygote is, on average, closer to that of the more fit homozygote. Recessive deleterious mutations have this property, as do overdominant mutations. While intuition may have suggested that inbreeding depression implies overdominance, we now see that it only limits the heterozygous effect to being less than one-half. Of course, this is a tremendous step forward in our quest to learn more about the alleles responsible for genetic variation in fitness. Already we can pay less attention to underdominant and recessive advantageous mutations and focus on those mutations with h < 1/2.

Figure 3.8: The possible states of a typical locus in an inbred fly. A superscript d on an allele indicates that it is a deleterious mutant. A superscript l identifies a lethal mutant.

The next step in Greenberg and Crow's analysis is an indirect but brilliant inference about the relationship between h and s, which begins with some formulae that relate the mean relative viabilities of flies to the frequencies and effects of deleterious mutations. Three viability estimates are required:

A The average relative viability of outbred flies. From Figure 3.7 we have A = 1.008.
B The average relative viability of inbred flies: B = 0.632. When other studies are included, B is found to lie between 0.614 and 0.656.

C The average relative viability of inbred flies without lethal mutations on their second chromosomes: C = 0.842. Among all such studies, C ranges from 0.829 to 0.860.

Each second chromosome is imagined to have an unknown number, n, of loci that are capable of mutating to deleterious and lethal alleles. The frequency of the deleterious allele at the ith locus is called $q_i$ and its selection coefficient, which is thought to be small, $s_i$. Similarly, the frequency of the lethal mutation at the ith locus is called $Q_i$ and its selection coefficient, which is close to one, $S_i$.

Inbred flies have lower viabilities than outbred flies because they are more likely to carry one or more of these mutations in the homozygous state. The probability that a particular inbred fly is homozygous for a deleterious allele at the ith locus is $q_i$. The probability that it dies from this allele, given that it is homozygous, is $s_i$. The probability that it is both homozygous and dies is $q_i s_i$. Finally, the probability that it survives is $1 - q_i s_i$. The situation at a typical locus, the ith locus, is illustrated in Figure 3.8.

If (and this is a big "if") the loci act independently in their effects on the probability of survival, the probability that a particular inbred fly survives to adulthood is

$$B = A \prod_{i=1}^{n} (1 - q_i s_i)(1 - Q_i S_i). \tag{3.15}$$

The factor A represents the probability of survival for the outbred [...]

[...] $\sigma^2/2$, then that allele will fix in the population.

Where exactly did the heterozygote, which is never the most fit genotype in any particular environment, get a geometric mean advantage? Recall that the geometric mean is a decreasing function of the variance of a random variable (Equation B.3).
As the fitness of the heterozygote is the average of the fitnesses of the two homozygotes, $1 + (Y_1^t + Y_2^t)/2$, and as the variance of an average of two random variables is less than the average of the two variances, the geometric mean fitness of the heterozygote will suffer less of a reduction due to the variation in fitnesses than will either homozygote. This is the root cause of the geometric mean fitness advantage of the heterozygote and the consequent balancing selection.

We can find out a bit more about our model by considering the mean and variance in the change in the allele frequency in a single generation. For a fixed $Y_1^t$ and $Y_2^t$, the change in p is

$$\Delta_s p = \frac{pq\,(Y_1^t - Y_2^t)}{2\,(1 + pY_1^t + qY_2^t)}.$$

If the Ys are small, this may be approximated by

$$\Delta_s p \approx \frac{pq}{2}\left[Y_1 - Y_2 - pY_1^2 + (p - q)Y_1Y_2 + qY_2^2\right]$$

when the denominator is expanded as a geometric series and powers greater than two are ignored. (The superscript ts have been removed from the Ys for clarity.) The mean change is

$$m(p) = E\{\Delta_s p\} \approx pq\left[(\mu_1 - \mu_2)/2 + \sigma^2(1/2 - p)\right],$$

where we have assumed that $\mu_i$ and $\sigma^2$ are both very small and of a similar order of magnitude. A consequence of the similarity in magnitudes is that

$$E\{Y_i^2\} = \sigma^2 + \mu_i^2 \approx \sigma^2.$$

In other words, we have assumed that $\mu_i^2 \ll \sigma^2$. For the variance in $\Delta_s p$ we can take the variance of $pq(Y_1 - Y_2)/2$ because the variances of the $Y_i^2$ terms are vanishingly small:

$$v(p) = \mathrm{Var}\{\Delta_s p\} \approx \frac{p^2q^2\sigma^2}{2}.$$

The mean change in p, m(p), has an interesting term on the right that is suggestive of balancing selection as it is positive when p is small and negative when p is close to one. In fact, if the mean change were the whole story, we could use end-point analysis to guess that polymorphism will occur when $|\mu_1 - \mu_2| < \sigma^2$. However, this is not the same as Equation 3.20, which shows that the mean change is not the whole story. When $|\mu_1 - \mu_2| < \sigma^2$, there is a mean push of allele frequencies away from the end-points.
This is opposed by the variance in the change, which causes allele frequencies to disperse, which results in a battle between centrifugal and centripetal forces. There is a cautionary tale here: the mean properties of stochastic processes are not always faithful descriptors of the behavior of those processes.

Random fluctuations in fitness cause random changes in allele frequencies. Genetic drift does this as well. One of these two forces must have a greater impact in most natural populations, but which one? The question may be asked more precisely by comparing the variance in p' for genetic drift,

$$\frac{pq}{2N_e},$$

with that for fluctuating selection,

$$\frac{p^2q^2\sigma^2}{2}.$$

The ratio of the variance due to selection to that due to drift is

$$N_e\,pq\,\sigma^2.$$

This ratio makes it clear that drift will dominate selection when p is close to zero or one. However, for moderate allele frequencies, selection may dominate if $N_e\sigma^2 \gg 1$; this is the case, for example, in a population of effective size $N_e = 10^6$ and $\sigma^2 = 10^{-4}$. Selection of this magnitude is very weak, yet it is sufficient to make selection a much more important stochastic force than drift for common alleles.

In the previous section, we argued that the fitness of a heterozygote is not exactly intermediate between the two homozygotes, but is closer to the more fit homozygote. This form of dominance may be incorporated into the derivations of this section relatively easily using the function, w(x), defined on page 83:

Genotype:           A1A1              A1A2                          A2A2
Enzyme activity:    $1 + Y_1^t$       $1 + (Y_1^t + Y_2^t)/2$       $1 + Y_2^t$
Absolute fitness:   $w(1 + Y_1^t)$    $w(1 + (Y_1^t + Y_2^t)/2)$    $w(1 + Y_2^t)$

Assuming that w(1) = 1, the fitness of a homozygote may be written using a Taylor series expansion around one as

$$w(1 + Y_i) \approx 1 + w'Y_i + \tfrac{1}{2}w''Y_i^2,$$

where the derivatives $w' = w'(1)$ and $w'' = w''(1)$ and the superscript ts have been removed for clarity.
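As a rough numerical illustration of the drift-versus-selection comparison made earlier in this section, the following Python sketch computes the two single-generation variances and their ratio. The function name `variance_ratio` and the sample allele frequencies are illustrative assumptions, not part of the text; the effective size and fitness variance are the example values quoted there.

```python
def variance_ratio(p, n_e, sigma2):
    """Ratio of the variance in the change of p due to fluctuating
    selection to the variance due to genetic drift."""
    q = 1.0 - p
    var_drift = p * q / (2.0 * n_e)               # pq / (2 N_e)
    var_selection = p**2 * q**2 * sigma2 / 2.0    # p^2 q^2 sigma^2 / 2
    return var_selection / var_drift              # simplifies to N_e * p * q * sigma^2


if __name__ == "__main__":
    # Example values from the text: N_e = 10^6, sigma^2 = 10^-4.
    for p in (0.001, 0.1, 0.5):
        ratio = variance_ratio(p, n_e=1e6, sigma2=1e-4)
        print(f"p = {p}: selection/drift variance ratio = {ratio:.4g}")
```

With these values the ratio is below one for the rare allele and well above one at intermediate frequencies, which is the pattern the argument describes.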
The geometric mean fitness for homozygotes is approximately the mean minus one-half the variance,

$$1 + w'\mu_i - (w'^2 - w'')\sigma^2/2.$$

The geometric mean for the heterozygote may be found with a little more work:

$$1 + w'(\mu_1 + \mu_2)/2 - (w'^2 - w'')\sigma^2/4.$$

With these two means, the condition for balancing selection becomes

$$|\mu_1 - \mu_2| < \sigma^2\,\frac{w'^2 - w''}{2w'}.$$

To make this more concrete, use Equation 3.19 for w(x) to obtain

$$|\mu_1 - \mu_2| < \sigma^2\,\frac{2 + a}{2(1 + a)}.$$

When a is small, this condition becomes approximately

$$|\mu_1 - \mu_2| < \sigma^2,$$

which, when compared to Equation 3.20, shows that when h < 1/2 the conditions for polymorphism become easier to satisfy than when h = 1/2. As the heterozygotes are closer to the more fit homozygote in the former case, we shouldn't be surprised at this comparison.

A great many models of selection in a changing environment have been analyzed and, in many cases, the condition for polymorphism may be written in the form

$$|\mu_1 - \mu_2| < c\sigma^2,$$

where c is a constant reflecting such factors as dominance, the spatial structure of the environment, the temporal autocorrelation of fitness, and a plethora of correlations of fitnesses. Thus, our insight that polymorphism will occur if the variance in the fitness is large relative to mean differences in fitness applies in a much wider setting than described here.

Given the ease with which fluctuating environments maintain variation and the fact that temporal fluctuations can cause the fixation of alleles, it is not surprising that an attractive alternative to the neutral theory as an explanation for molecular evolution and polymorphism is based on selection in a random environment (Gillespie 1991). With fluctuating environments, there is balancing selection in nature, yet in the laboratory the experimentalist will see incomplete dominance. Most of the variation that Crow and Greenberg called deleterious could be maintained by fluctuating environments and should be called something else.
At the time of this writing, there is no way to know whether the "deleterious" load is due mainly to alleles held in the population by mutation-selection balance or by balancing selection.

3.8 The stationary distribution

Just as with the balance between drift and mutation described in Section 2.9, we can obtain the stationary distribution for selection in a changing environment using Wright's formula,

$$\phi(p) = \frac{e^{2\int^p (m(x)/v(x))\,dx}}{v(p)}$$

(see Equation B.27). For random environments with additive alleles, the mean change in p is

$$m(p) = pq\left[(\mu_1 - \mu_2)/2 + \sigma^2(1/2 - p)\right],$$

and the variance in the change is

$$v(p) = p^2q^2\sigma^2/2.$$

The argument of e in Wright's formula is

$$2\int^p \frac{m(x)}{v(x)}\,dx = (2\mu/\sigma^2)\int^p \frac{dx}{x(1 - x)} + 2\int^p \frac{(1 - 2x)\,dx}{x(1 - x)} = (2\mu/\sigma^2)\ln\!\left(\frac{p}{1 - p}\right) + 2\ln\left(p(1 - p)\right),$$

where $\mu = \mu_1 - \mu_2$. When this is substituted into Wright's formula, we get

$$\phi(p) \propto p^{2\mu/\sigma^2}(1 - p)^{-2\mu/\sigma^2}.$$

Equation B.19 shows that this is a beta density. The beta distribution only exists if the powers of p and 1 - p are greater than -1. In our case this occurs when

$$2\mu/\sigma^2 > -1 \quad \text{and} \quad -2\mu/\sigma^2 > -1.$$

When these two conditions are combined, we get

$$|\mu_1 - \mu_2| < \sigma^2/2,$$

which is the same as we obtained using the geometric mean approach.

That the stationary distribution for the balance between mutation and genetic drift and for random environments are both beta distributions is a remarkable coincidence. An unfortunate consequence is that we cannot use the distribution of allele frequencies to distinguish between these two models.

3.9 Selection and drift

Our discussion of directional selection left the impression that the most fit allele eventually reaches a frequency of one. This is true for alleles of moderate frequency but is definitely not true for alleles with only one or a few copies in the population. These rare alleles are subject to the vagaries of Mendel's law of segregation and to demographic stochasticity.
It is easy to see that the fate of a single copy of an allele with, say, a 1 percent advantage is determined mostly by chance. If its frequency should become moderate, then its average selective advantage can overcome the effects of genetic drift.

The interaction of drift and selection is more complex than that of mutation and drift because the strength of selection changes with the frequency of the allele. (Recall the factor pq in $\Delta_s p$ or examine Figure 3.3.) Natural selection is always a very weak force for rare alleles and, if $s \gg 1/(2N)$, is a much stronger force than drift for common alleles. Thus, the dynamics of rare alleles should be influenced by both drift and selection, while the dynamics of common alleles should be determined mainly by natural selection (again, if $s \gg 1/(2N)$).

In a finite population, a new advantageous mutation is usually lost. In fact, it is often lost in the first generation of its existence, even if it enjoys a substantial selective advantage. This is easy to prove if we assume that the A1A2 heterozygote has a Poisson distributed number of A1A2 offspring and that the mean of the Poisson is 1 + s/2. The probability that this heterozygote has no A1A2 offspring, hence extinction of the A1 allele, is

$$e^{-(1 + s/2)} \approx e^{-1}(1 - s/2).$$

(Use the Poisson distribution from Appendix B and the Taylor series expansion of $e^{-s/2}$ for this approximation.)

This result is interesting for a number of reasons. First, recall from Problem 2.1 on page 22 that the probability that a new neutral mutation is lost in a single generation is $e^{-1} \approx 0.3679$. Our result shows that a new mutation with, say, s = 0.01, is lost with probability 0.3660, which is not very different from the neutral probability. The 0.5 percent advantage enjoyed by this mutation in the heterozygote does little to decrease its probability of extinction in its first generation of life.
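The first-generation loss probability under the Poisson assumption can be checked with a short Python sketch. The function names here are hypothetical; the only modeling input is the Poisson mean of 1 + s/2 offspring assumed in the text.

```python
import math


def prob_lost_first_generation(s):
    """Exact Poisson probability that a lone heterozygote leaves no
    offspring carrying the new allele, with mean 1 + s/2."""
    mean_offspring = 1.0 + s / 2.0
    return math.exp(-mean_offspring)   # Poisson P(0) = exp(-mean)


def prob_lost_approx(s):
    """First-order Taylor approximation e^{-1} (1 - s/2)."""
    return math.exp(-1.0) * (1.0 - s / 2.0)


if __name__ == "__main__":
    for s in (0.0, 0.01, 0.1):
        exact = prob_lost_first_generation(s)
        approx = prob_lost_approx(s)
        print(f"s = {s}: exact = {exact:.4f}, approximation = {approx:.4f}")
```

Population size appears nowhere in these functions, which mirrors the point that this calculation depends only on the offspring distribution of the single carrier.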
A second interesting aspect of this result is that it is independent of the population size even though we motivated the calculation by claiming that the mutation would be lost because of genetic drift. If drift is the culprit, why doesn't its parameter N appear in the result? The reason, of course, is that our model ignores the total population size because we judge it to be irrelevant to the number of offspring produced by the lone heterozygote. Perhaps we would be better off saying that the A1 allele is lost because of demographic stochasticity or Mendel's law rather than genetic drift. When the A1 allele becomes more common and our interest turns from the number of copies to its frequency, its stochastic dynamics are more correctly said to be governed by genetic drift. This distinction of the role of population size in the dynamics of rare and common alleles is important and should be kept in mind when discussing the fate of newly arising mutations.

We turn now to the fate of the A1 allele with an arbitrary initial frequency. The probability of ultimate fixation of the A1 allele, given that its initial frequency is p and that h = 1/2, is

$$\pi_1(p) = \frac{1 - e^{-2N_e s p}}{1 - e^{-2N_e s}}. \qquad (3.21)$$

(This formula is [...]

Figure 3.10: The fixation probability for advantageous alleles as a function of their initial frequencies. (Curves are shown for 2Ns = 1, 10, and 100; the horizontal axis is the initial allele frequency.)

The association of alleles on chromosomes is described by a quantity called the linkage disequilibrium, whose statistical properties and dynamics under random mating are well understood. The basic equations of multi-locus genetic drift and selection have been known for decades, although their analysis has been impeded by their extraordinary complexity. Already, genomic studies have suggested an exciting new form of evolution in which selection at one locus affects the dynamics of linked loci in a process called hitchhiking.
This chapter will touch on each of these areas.

4.1 Linkage disequilibrium

Linkage disequilibrium is used to describe the associations of alleles on chromosomes. It appears quite naturally when describing the dynamics of gametes with random mating and recombination. The simplest model capable of showing the effects of recombination is of a diploid species with two linked loci, each with two segregating alleles. The left-hand side of Figure 4.1 illustrates the position of the two loci on the chromosome. The probability that a recombinant gamete is produced at meiosis is denoted by r, which is often called the recombination rate. (The genetic or map distance between the loci is always greater than r because it is the average number of recombinational events rather than the probability of producing a recombinant offspring.)

Figure 4.1: The chromosome on the left shows the position of the A and B loci. The right side illustrates the four possible gametes with their frequencies.

The right-hand side of Figure 4.1 shows that there are four gametes in the population, A1B1, A1B2, A2B1, and A2B2, with frequencies $x_1$, $x_2$, $x_3$, and $x_4$, respectively. The frequency of the A1 allele, as a function of the gamete frequencies, is $p_1 = x_1 + x_2$. Similarly, the allele frequency of B1 is $p_2 = x_1 + x_3$.

Recombination changes the frequencies of these gametes in a very simple way. For example, the frequency of the A1B1 gamete after a round of random mating, $x_1'$, is simply

$$x_1' = (1 - r)x_1 + r p_1 p_2. \qquad (4.1)$$

This expression is best understood as a statement about the probability of choosing an A1B1 gamete from the population. A randomly chosen gamete will have had one of two possible histories: Either it will be a recombinant gamete (this occurs with probability r) or it won't be (this occurs with probability 1 - r). If it is not a recombinant, then the probability that it is an A1B1 gamete is $x_1$.
Thus, the probability that the chosen gamete is an unrecombined A1B1 gamete is $(1 - r)x_1$, which is the first term on the right side of Equation 4.1. If the gamete is a recombinant, then the probability that it is A1B1 is the probability that the A locus is A1, which is just the frequency of A1, $p_1$, times the probability that the B locus is B1, $p_2$. The probability of being a recombinant gamete and being A1B1 is $r p_1 p_2$, which is the rightmost term of Equation 4.1. The allele frequencies can be multiplied because the effect of a recombination is to choose the alleles at the A and B loci independently.

Problem 4.1 Derive the three equations for the frequencies of the A1B2, A2B1, and A2B2 gametes after a round of random mating.

The change in the frequency of the A1B1 gamete in a single generation of random mating is, from Equation 4.1,

$$\Delta_r x_1 = -r(x_1 - p_1 p_2). \qquad (4.2)$$

The coefficient of r provides a definition for the linkage disequilibrium,

$$D = x_1 - p_1 p_2,$$

which is a measure of the difference between the frequency of the A1B1 gamete, $x_1$, and the expected frequency if alleles associated randomly on chromosomes, $p_1 p_2$. (If there were no tendency for the A1 allele to be associated with the B1 allele, the probability of choosing an A1B1 gamete from the population would be the product of the frequencies of the A1 and B1 alleles.) The change in $x_1$, written as a function of D, is

$$\Delta_r x_1 = -rD.$$

The equilibrium gamete frequency is obtained by solving $\Delta_r x_1 = -rD = 0$, which gives

$$\hat{x}_1 = p_1 p_2.$$

From this, we conclude that recombination removes associations between alleles on chromosomes. The time scale of change of gamete frequencies due to recombination is roughly the reciprocal of the recombination rate.

The frequency of the A1B1 gamete may be written

$$x_1 = p_1 p_2 + D,$$

which emphasizes that the departure of the gamete frequency from its equilibrium value is determined by D.
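The decay of linkage disequilibrium implied by Equations 4.1 and 4.2 can be sketched by iterating the recursion for the A1B1 gamete frequency. This minimal Python sketch assumes that recombination alone leaves the allele frequencies $p_1$ and $p_2$ unchanged, so that D shrinks by a factor of 1 - r each generation; the function names and starting values are illustrative and do not come from the text.

```python
def iterate_x1(x1, p1, p2, r, generations):
    """Iterate x1' = (1 - r) x1 + r p1 p2 (Equation 4.1), holding the
    allele frequencies p1 and p2 fixed (an assumption of this sketch)."""
    history = [x1]
    for _ in range(generations):
        x1 = (1.0 - r) * x1 + r * p1 * p2
        history.append(x1)
    return history


def disequilibrium(x1, p1, p2):
    """Linkage disequilibrium D = x1 - p1 p2."""
    return x1 - p1 * p2


if __name__ == "__main__":
    p1, p2, r = 0.5, 0.5, 0.1
    # Start with only coupling gametes: x1 = 0.5, so D = 0.25.
    history = iterate_x1(0.5, p1, p2, r, generations=10)
    for t, x1 in enumerate(history):
        print(f"generation {t}: x1 = {x1:.4f}, D = {disequilibrium(x1, p1, p2):.4f}")
```

With r = 0.1, D falls to about 35 percent of its starting value after 10 generations, in line with a time scale of roughly 1/r generations.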
The linkage disequilibrium may also be written in the more conventional form

$$D = x_1 x_4 - x_2 x_3, \qquad (4.3)$$

which leads to the following new expressions for the gamete frequencies:

Gamete:      A1B1              A1B2              A2B1              A2B2
Frequency:   $x_1$             $x_2$             $x_3$             $x_4$
Frequency:   $p_1 p_2 + D$     $p_1 q_2 - D$     $q_1 p_2 - D$     $q_1 q_2 + D$

Problem 4.2 Show that the gamete frequencies as a function of D are correct in the above table. Next, show that the definition of D in Equation 4.3 is consistent with the expressions for gamete frequencies by substituting $p_1 p_2 + D$ for $x_1$ and the equivalent expressions for the other gametes into the right side of Equation 4.3, proceeding with a frenzy of cancellations, and ending with a lone D.

The A1B1 and A2B2 gametes are often called coupling gametes because the same subscript is used for both alleles. The A1B2 and A2B1 gametes are called repulsion gametes. Linkage disequilibrium may be thought of as a measure of the excess of coupling over repulsion gametes. When D is positive, there are more coupling gametes than expected at equilibrium; when negative, there are more repulsion gametes than expected. The value of D aft
