WEEK 3 - LINKAGE, RECOMBINATION AND GENETIC MAPPING

DESCRIBE THE TERMS LINKAGE AND RECOMBINATION

Alleles from linked genes assort together
If two genes are completely linked (on the same chromosome with no recombination between them), the alleles from a parent will assort together 100% of the time.
○ Complete linkage is almost never seen because of recombination: even genes on the same chromosome are rarely completely linked, so they are not always inherited together.
If a dihybrid cross (AaBb x AaBb) with complete linkage (dominant alleles together on one chromosome): 3 A-B- : 1 aabb phenotypic ratio → no independent assortment.
If a testcross (AaBb x aabb) with complete linkage: 1 AaBb : 1 aabb phenotypic ratio → only the parental gametes are produced.
Ratios depend on the degree of linkage, i.e. how closely the genes are linked.
Genetic maps of chromosomes could be produced long before the invention of Sanger DNA sequencing.
If the genes are on the same chromosome, there will not be independent assortment of their alleles.
Result: with complete linkage, the dominant alleles are always inherited together and the recessive alleles together, because of the way the alleles are arranged on the parental chromosomes.

Recombination during meiosis reassorts alleles on homologous chromosomes
In meiosis, homologous chromosomes pair up to form a tetrad (four chromatids) and undergo crossing over, in which genetic material is exchanged between non-sister chromatids at chiasmata.
This exchange means that if one chromosome carries two dominant alleles (A B) and its homologue carries two recessive alleles (a b), crossing over can produce chromatids carrying one of each (A b and a B).
Example: for two genes on the same chromosome, a single crossover between them produces 2 non-crossover and 2 crossover gametes.
Linked alleles can be separated by recombination during meiosis, producing:
○ Parental genotypes (P) = non-crossover gametes
○ Recombinant genotypes = crossover gametes
If a dihybrid cross (AaBb x AaBb):
○ ? A-B- : ? A-bb : ? aaB- : ? aabb phenotypic ratio
○ From the diagram, it may look like a 1:1:1:1 ratio
○ However, this is not accurate, as the proportion of crossover versus non-crossover gametes depends on how closely the genes are linked
If a testcross (AaBb x aabb):
○ ? AaBb : ? Aabb : ? aaBb : ? aabb phenotypic ratio
The phenotypic ratios depend on the 'degree' of linkage, i.e. how closely the genes are linked.
○ Therefore, the ratios cannot be predicted without knowing the degree of linkage
Compare: if two genes are carried on different autosomes (so they assort independently) and there is no interaction between the gene products (e.g. epistasis, duplicate genes etc.), the F2 from a dihybrid cross always produces 9:3:3:1.

The discovery of linkage: testcrosses reveal linked genes
Linkage was studied in Drosophila by Thomas Morgan and his student Alfred H. Sturtevant, using:
○ eye colour: pr+ (wildtype) is dominant to pr (purple)
○ wing length: vg+ (wildtype) is dominant to vg (vestigial)
The more closely linked the genes are, the more likely they are to be inherited in their parental arrangement.
The testcross of the F1 flies was expected to give a 1:1:1:1 ratio, but instead there was a much higher proportion of the parental phenotypes (the phenotype combinations of the original parents, P1 and P2).
○ It was proposed that the two genes were on the same chromosome
Parental genotypes reflect the arrangement of the alleles in the parents.
Recombinant genotypes arise from crossing over between the parental chromosomes.
○ Because the genes are on the same chromosome, recombinant gametes were proposed to occur only if a crossover occurred between them.
○ If crossing over is a rare event, there will be fewer recombinant gametes
○ So instead of the ratio expected from independent assortment (1:1:1:1 in a testcross, 9:3:3:1 in an F2), the parental genotypes are over-represented

Parental and Recombinant chromosome arrangements
It was proposed that the two genes were on the same chromosome:
○ Parental genotypes reflect the arrangement of the alleles in the parents
○ Recombinant genotypes arise from crossing over between the parental chromosomes

Parental allele combinations can be in one of two phases
In the previous example, the two dominant alleles were together on one parental chromosome (and the two recessive alleles on the other). This is not always the case; the parental chromosomes can be in either coupling or repulsion:
○ Coupling: the dominant alleles are 'coupled' on the same parental chromosome (A B / a b)
○ Repulsion: one parent is homozygous recessive for one gene and the other parent for the other gene, so each chromosome of the dihybrid carries one dominant and one recessive allele (A b / a B); the recessive alleles are 'repulsed' onto different chromosomes
Example:

Distance between genes impacts recombination frequency
Recombination between homologous chromosomes is a random event that can happen anywhere along the chromosome.
Recombination frequency describes how often we see recombinant gametes (and thus recombinant progeny) rather than parental gametes.
Recombination frequency is correlated with the physical distance between genes.
○ Therefore, if we know how frequently two genes recombine, we can map them

Genes close together on the same chromosome | Genes far apart on the same chromosome
Less area for recombination to occur between them | More area for recombination to occur between them
Less chance of recombination switching alleles | Greater chance of recombination switching alleles
Smaller proportion of recombinant gametes | Greater proportion of recombinant gametes

Calculating Recombination Frequency
The proportion of recombinant gametes depends on how often crossovers occur between the genes.
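The calculation covered in this section can be sketched in Python. This is a minimal illustration, not part of the original notes: `recombination_frequency` applies RF = recombinant progeny / total progeny x 100, and `chi_squared_unlinked` performs the chi-squared goodness-of-fit test against the 1:1:1:1 testcross ratio expected if two genes are unlinked. The function names and the hard-coded critical value (7.815 for 3 degrees of freedom at p = 0.05, from standard chi-squared tables) are assumptions made for the sketch.

```python
"""Recombination frequency and a chi-squared test for linkage (testcross data).

Illustrative sketch: the function names are hypothetical, not from any library.
"""

# Critical chi-squared value for 3 degrees of freedom (4 classes - 1) at p = 0.05,
# taken from standard chi-squared tables.
CHI2_CRITICAL_DF3_P05 = 7.815


def recombination_frequency(recombinants: int, total: int) -> float:
    """RF (%) = recombinant progeny / total progeny x 100 (1% RF = 1 map unit)."""
    if total <= 0:
        raise ValueError("total progeny must be positive")
    if not 0 <= recombinants <= total:
        raise ValueError("recombinants must lie between 0 and total")
    return recombinants / total * 100


def chi_squared_unlinked(counts: list[int]) -> float:
    """Chi-squared statistic for four testcross phenotypic classes vs. 1:1:1:1."""
    if len(counts) != 4:
        raise ValueError("expected counts for exactly four phenotypic classes")
    expected = sum(counts) / 4  # null hypothesis (unlinked): each class is 1/4
    if expected == 0:
        raise ValueError("no progeny counted")
    return sum((observed - expected) ** 2 / expected for observed in counts)


def linkage_supported(counts: list[int], critical: float = CHI2_CRITICAL_DF3_P05) -> bool:
    """True if the null hypothesis (genes unlinked) is rejected at p = 0.05."""
    return chi_squared_unlinked(counts) > critical


if __name__ == "__main__":
    # 179 b-vg recombinants among 1,005 progeny (the three-point cross in these notes).
    print(f"RF = {recombination_frequency(179, 1005):.1f}%")  # RF = 17.8%
```

Passing the four observed class counts from a testcross to `linkage_supported` returns `True` when the deviation from 1:1:1:1 is too large to attribute to chance, i.e. when the data support linkage.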
Recombination frequency (RF) = (number of recombinant progeny / total progeny) x 100
○ 1% RF corresponds to 1 map unit (mu)
○ If crossovers never occur between two gene loci, RF = 0% → all progeny have the parental phenotypes
○ The further apart two genes are, the higher the RF
○ The maximum RF is 50%

Why the maximum is 50% can be explained through meiosis
A crossover involves only two of the four chromatids in a tetrad. Even if a crossover occurred between the genes in every meiosis, each meiosis would give two crossover and two non-crossover gametes (as shown in the image above), i.e. 50% recombinant and 50% parental.
If no crossover occurs, all offspring look exactly like the parental phenotypes, and the genes are very close to one another on the chromosome (they co-segregate).

∴ Recombination frequency can help define linkage:
○ Genes with RF < 50% are linked
○ If RF reaches 50%, the genes segregate from each other 50% of the time and behave as if they were assorting independently on different chromosomes
Therefore, when RF is high (e.g. > 25%), we need to test whether the genes are likely to be linked or not.
Perform a chi-squared goodness-of-fit test based on the genetic hypothesis that the genes are unlinked (the null hypothesis):
○ If we fail to reject the null, the deviation from the expected ratio is attributed to chance, which supports the genes being unlinked
○ If we reject the null, the deviation is likely due to another reason, which supports the genes being linked
If the test supports the genes being unlinked, they assort independently and are not incorporated into the map distances.

Full Genetic Maps
Because recombination frequencies between distant genes are inaccurate (they cannot exceed 50% and miss multiple crossovers), full maps were built by linking together intermediate genes.
Can this process be carried out in humans?
Yes, but it is difficult because:
○ There is a small number of progeny
○ We can't perform the desired crosses
○ Analysis is restricted to pedigrees

Things are always more complicated in reality
As mentioned before, a single crossover involves only two of the four chromatids; the other two remain non-recombinant, which is why RF tops out at 50%.
In reality, multiple crossovers occur, including double crossovers between different strands, and two, three or even all four chromatids can be involved in a double crossover → **not covered in this unit

Mapping genes with three-point (trihybrid) testcrosses
Previously we were mapping two genes at a time. Mapping three genes at a time has advantages:
○ Faster: three points on the map from one cross, so fewer crosses are needed to map multiple genes
○ More accurate: double crossovers can be taken into account
Drosophila example:
○ The wildtype alleles for body colour, wing length and eye colour are all dominant: an individual carrying at least one wildtype allele has the wildtype body, wings or eyes
○ Recessive alleles result in mutant phenotypes
First consider two genes at a time (ignoring the third gene) and determine the map distance between them:
○ Identify the parental combinations for the two traits and add them together
○ The remaining progeny are all recombinants
○ Use the number of recombinants to calculate the recombination frequency
○ Repeat for the next pair of genes

EXAMPLE: MAP FROM INITIAL RECOMBINATION FREQUENCIES
We can now draw the following map of our three loci:
The map can be drawn in either orientation, i.e. the positions of b and vg can be switched.
Note that the distance between the outer loci is less than the sum of the internal regions.
○ This is due to double crossovers
○ Crossovers can occur on either side of the middle locus
○ Each of these progeny represents two crossovers between the outer loci, but neither crossover was counted, because the outer alleles appear in their parental arrangement
○ Double crossovers appear parental for the outer loci but recombinant for the middle locus
When only the outer loci are scored, the double crossover is missed, because the second crossover restores the parental arrangement of the outer alleles.
Result: one crossover on each side of pr keeps the parental (e.g. wildtype) phenotypes for b and vg but swaps the pr allele, so double recombinants must be included to get the true b-vg distance.
To identify double recombinants, look for progeny with the parental arrangement at the outer loci and a recombinant arrangement at the middle locus.
Identifying double recombinant progeny for a three-point map:
○ They are the smallest classes, as double recombination is rare
○ Once you have established which two loci are furthest apart (the outer loci), look for the classes that are parental for the outer loci and recombinant for the middle locus

TAKE INTO ACCOUNT THE EFFECT OF DOUBLE RECOMBINANTS AND INTERFERENCE IN CHROMOSOME MAPPING
Taking into account double crossovers
Each double recombinant represents two crossovers, so the true RF for b and vg is:
RF(b-vg) = [(original recombinants + 2 x double recombinants) / total] x 100
= [(179 + 2 x 3) / 1005] x 100 = 18.4%
Therefore, b and vg are 18.4 mu apart (the sum of the inner regions).
Three-point crosses are more accurate because they take double crossovers into account.

Interference changes the proportion of double crossovers
Does the occurrence of one crossover have an effect on another nearby?
○ i.e. are there more or fewer double crossovers than we would expect?
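That question, along with the double-crossover correction above, can be worked through numerically. The Python sketch below is illustrative only and uses the figures from this example: 179 b-vg recombinants and 3 double recombinants among 1,005 progeny, and map distances of 6.1 mu (b-pr) and 12.3 mu (pr-vg). The helper names are hypothetical, and the parental haplotypes used to find the middle locus (b pr vg and + + +) are an assumed phase chosen for illustration.

```python
"""Three-point testcross bookkeeping (illustrative sketch; helper names are hypothetical)."""


def middle_locus(loci, parentals, double_recombinant):
    """Return the locus that a double-recombinant (DCO) class has swapped.

    A double crossover keeps the parental arrangement at the two outer loci and
    swaps only the middle allele, so the DCO class differs from one of the two
    parental classes at exactly one position: the middle locus.
    """
    for parental in parentals:
        differing = [name for name, p, d in zip(loci, parental, double_recombinant) if p != d]
        if len(differing) == 1:
            return differing[0]
    raise ValueError("DCO class does not differ from a parental class at exactly one locus")


def corrected_distance(outer_recombinants, double_recombinants, total):
    """Outer-locus distance (mu), counting each double recombinant as two crossovers."""
    return (outer_recombinants + 2 * double_recombinants) / total * 100


def interference(observed_dco, total, distance1_mu, distance2_mu):
    """Return (expected DCOs, coefficient of coincidence C, interference I = 1 - C)."""
    expected = (distance1_mu / 100) * (distance2_mu / 100) * total
    if expected == 0:
        raise ValueError("expected number of double crossovers is zero")
    coincidence = observed_dco / expected
    return expected, coincidence, 1 - coincidence


if __name__ == "__main__":
    # Assumed parental phase for illustration: b pr vg / + + +.
    loci = ("b", "pr", "vg")
    parentals = [("b", "pr", "vg"), ("+", "+", "+")]
    print(middle_locus(loci, parentals, ("b", "+", "vg")))  # pr

    print(f"b-vg = {corrected_distance(179, 3, 1005):.1f} mu")  # b-vg = 18.4 mu

    expected, c, i = interference(3, 1005, 6.1, 12.3)
    print(f"expected DCO = {expected:.1f}, C = {c:.2f}, I = {i:.2f}")
    # expected DCO = 7.5, C = 0.40, I = 0.60
```

The unrounded expected value is about 7.54, so C comes out at about 0.398; it rounds to the 0.4 used in these notes.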
We can calculate the expected number of double crossovers from our map distances:
○ Chance of a DCO = chance of a crossover between b and pr x chance of a crossover between pr and vg = 0.061 x 0.123 = 0.0075
○ Among 1,005 offspring: expected DCOs = 1005 x 0.0075 ≈ 7.5

Coefficient of coincidence and interference
The coefficient of coincidence (C) = observed DCOs / expected DCOs
For our example, C = 3/7.5 = 0.4, or 40%
○ i.e. only 40% of the expected double crossovers occurred, and 60% were interfered with
○ As the number was less than expected, one crossover appears to reduce the chance of another nearby
This can also be expressed as the degree of interference (I), where I = 1 - C.

Complete interference | Positive interference | No interference | Negative interference
C = 0, I = 1 | 0 < C < 1, 0 < I < 1 | C = 1, I = 0 | C > 1, I < 0
No double crossovers occur | A crossover reduces the chance of another crossover nearby | Crossovers occur independently | A crossover increases the chance of another crossover nearby

Example: a coefficient of coincidence of 35% means that 65% of the expected double crossovers did not occur (I = 0.65), i.e. positive interference.
Example: for a trihybrid testcross, if the expected number of double recombinants is 20 and the observed number is 10, C = 10/20 = 0.5 (I = 0.5).

Why generate genetic maps and consider linkage?
Traditional uses:
○ Construct a view of the organism's genome
○ Identify and clone genes using their map position
○ Determine whether mutations affect different genes
Continuing uses:
○ Identify rare disease-causing alleles (in humans, mostly using molecular markers, as this is cheaper)
○ Linkage must be considered in genetic counselling/risk calculations
○ Assist in genome sequence assembly
Most of these uses of genetic maps are no longer needed because of sequencing technology (e.g. whole genome sequencing with next generation and third generation sequencing): there is less reliance on genetic mapping and more on obtaining the DNA sequence and then annotating genes as required.
Mapping remains in use because whole genome sequencing is not yet cheap enough to sequence the entire genomes of multiple individuals in a family with a rare genetic disease → we rely on molecular markers instead.

MAPPING DISEASE LOCI
Markers across the chromosome are used together with pedigree information to identify regions likely to contain pathogenic variants.
We can sequence or genotype markers across the chromosome in affected and unaffected individuals of the same family:
○ Unaffected family members share much of their DNA with affected individuals, but should not share the disease-associated region
○ Therefore, by mapping which parts of the chromosome segregate with the disease, we can narrow in on the part of the chromosome that leads to disease

CONSIDERING LINKAGE IN RISK CALCULATION
When determining the risk of someone inheriting multiple conditions, you may need to consider linkage.
e.g. A female has two autosomal dominant disorders caused by mutations/variants in two different genes, and she is heterozygous for both. What is the chance her children will inherit:
○ both variants?
○ one or the other variant?
○ neither variant?
○ These calculations vary depending on whether the genes are linked
If the genes are unlinked, the two conditions are independent events: each child has a 50% chance of inheriting each variant, so a 25% chance of inheriting both (0.5 x 0.5), a 50% chance of inheriting exactly one, and a 25% chance of inheriting neither (assuming the other parent carries neither variant).
If the genes are linked (nearby on the same chromosome), the variants are not inherited independently (e.g. in coupling they tend to be inherited together) → this changes the risk calculation.

OUTLINE THE MAJOR TYPES OF DNA VARIANTS
The variable region of the genome is up to 0.4% of the human genome, so it is more accurate to say that humans are 99.6-99.9% identical to one another.
Goal: outline what the variation is and the techniques we can use to measure it.

Major Classes of DNA Variation - TYPES OF CHANGES
Small variants
○ Make up the majority of variants: 'single nucleotide variants (SNVs)' or 'single nucleotide polymorphisms (SNPs)'
○ Affect only one to a few nucleotides at each locus, e.g. CG → TA
○ The other type of small variant is the 'indel' (insertion/deletion); insertions and deletions are grouped together because they look similar, being the addition or removal of a few base pairs
Large variants
○ Large stretches of base pairs that carry changes: deletions, duplications, inversions, insertions and translocations

REPEATS
Tandem: repeats of DNA that occur one after the other
○ Microsatellites (STRs): small repeat units
○ Minisatellites (VNTRs): slightly larger repeat units
Interspersed: repeats of DNA that are not adjacent to one another, usually mobilised by an associated enzyme through a cut-and-paste (or copy-and-paste) mechanism
○ Transposons
○ Retroelements

SMALL VARIANTS
Variation can be:
○ Inherited: transmitted from parents
○ De novo: arising in the gamete, zygote or somatic cells
Tandem repeat number is highly variable within a population, so individuals can differ greatly at these loci when genotyped.
Result: a whole range of elements can differ between individuals.

When is something classed as a 'variant'?
Variants are identified by comparison with a reference sequence.
In humans, novel variants are identified by comparison with the 'Human reference genome':
○ Published by the Genome Reference Consortium
○ Currently on build 38 (GRCh38), with the last update in 2022
○ Anything different from the reference genome is called a 'variant'
○ Whether a position appears as a variant depends on ethnic background, family history etc.
○ Hence, using a single reference genome can be biased
The human reference genome establishes the common bases and structural variants at particular locations.
To establish the variant type, compare your sample genome with the reference genome for the region of interest and find the differences.
Variants can be:
○ Pathogenic: cause changes in gene expression or gene product function, leading to conditions
○ Non-pathogenic: no effect on gene function. The majority of variation in humans is non-pathogenic.
Number of variants:
○ Single nucleotide variants: 5 million
○ Insertions/deletions: 600,000 → they can affect the genome more than their number suggests, as each can involve multiple bases
○ Structural variants: 25,000

PROVIDE A DEFINITION FOR GENOTYPING VS. SEQUENCING AND HIGHLIGHT THEIR DIFFERENCES
Sequencing vs. genotyping
Sequencing captures the full picture of bases in a region, which could be part of a gene, an entire gene, multiple genetic regions or even the whole genome.
It produces a complete sequence, revealing all of the variation across that sequence.
Genotyping looks for a specific sequence/variant that we already know exists.

Sequencing methods
First generation - Sanger sequencing
○ Great for a single region
○ Requires a specific primer, so we need to know the sequence of the region we want to sequence
○ The sequence is 'generated' using fluorescently labelled dideoxynucleotides that terminate synthesis
○ Mostly suitable for confirming small variants
○ Looks at a small region at a time, so it is not ideal for looking at many variants; other techniques are better suited
○ It would take too much time to sequence many regions
○ Sequences only one region at a time = not multiplex

Second generation - Massively parallel sequencing
○ Produces many short reads
○ Allows you to target multiple genes in a single run; can sequence anything from many targets to whole genomes in a single run
○ DNA fragments from different regions are annealed to a flow cell, where they form clusters
○ Each cluster is a unique sequence that could be a different portion of the same genome or a different gene of interest
○ Each cluster is read independently
○ Mostly suitable for identifying small variants
○ Sequences multiple sequences at a time = multiplex (due to the number of reads)
○ Problem: short reads mostly capture small variants, as longer variants do not fit within a single read

Third generation - Long read sequencing
○ Produces fewer, long reads
○ Can sequence anything from some targets to a whole genome in a single run
○ Reads large stretches of DNA, around the 50,000-100,000 bp range
○ Identifies large structural variants
○ Can be slightly less accurate, and fewer samples can be run at once
○ Sequences multiple sequences at a time = multiplex

NEXT GENERATION SEQUENCING IN DNA VARIANT IDENTIFICATION, AND WHY IT HAS NOT FULLY REPLACED CONVENTIONAL GENOTYPING TESTING
Next Generation Sequencing in variant identification
Variant identification mostly uses second generation sequencing.
There are multiple ways to conduct the sequencing depending on what we want to achieve.
Targeted gene panel: targets whole genes relevant to a particular group of conditions (e.g. arrhythmias); multiple patients can be run on the same arrhythmia panel in one sequencing run to give more information.

Feature | Genome sequencing | Exome sequencing | Targeted gene panel
Coverage | Entire genome: all genes and noncoding DNA | Exons of all genes (20-25k genes) | 10-500 genes
Accuracy | Low | Good | High
Turnaround time | Longest | Long (few days) | Rapid
Cost | Most expensive | Cost-effective | Most cost-effective
Depth | >30x | >50-100x | >500x
Notes | Effective if you have no idea what variant is causing a disease; accuracy is lower because all the DNA must be sequenced, so each region is read fewer times (low coverage) | Uses capture-based probes to sequence only the exons of genes (the coding regions) | Sequences only the genes responsible for a specific phenotype

Sequencing (NGS) | Genotyping (e.g. microarrays)
Can type up to all 3 billion bp in the genome | Up to ~5 million bp (usually ~500-600k), as you are targeting specific regions
Non-targeted or targeted: gene panel (sequencing targeted to a particular set of genes), exome, whole genome | Targeted only (you need to know what you are looking for): single variant; small to medium scale (dozens to thousands of variants); large scale (thousands to millions of variants)
Slow(er): 1 day to 2-3 weeks | Fast: hours to a few days
Can provide 'complete' information: known and novel variants | Provides partial information: only known/common variants
Varied cost but relatively expensive | Varied cost but usually cheap
Pipelines to analyse data still developing | Pipelines well established

Why still genotype?
NGS can provide more complete information, so why not use it for all genetic tests?
○ Time
○ Cost → NGS is getting cheaper but is still too expensive for many applications
○ New technology adoption
○ Incidental findings → sequencing may reveal issues you were not looking for, which creates ethical complications
Applications of genotyping in humans:
○ Identification of known pathogenic variants
○ Mapping of genetic conditions → rare diseases need to be mapped to locate the disease locus
○ Study of evolution and anthropology

EXPERIMENTAL TECHNIQUES FOR DETECTION OF GENETIC VARIANTS USING GENOTYPING, INCLUDING RFLPS, ASOS AND PCR AMPLIFICATION BASED TESTS
Restriction Fragment Length Polymorphisms (RFLPs)
Small variants may, by chance, change the recognition site for a restriction enzyme → this produces an RFLP.
Example: HindIII cuts at the reference locus, but the variant has changed a C → G, so HindIII no longer cuts there; this gives two DNA morphs with respect to HindIII.
Note: we cannot see an RFLP directly in digested genomic DNA, because there are so many restriction sites across the genome that the bands appear as a smear all the way down the gel.
Goal: find a way of specifically looking at the region of interest.
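The HindIII example above can be sketched in silico for a PCR-amplified region (the PCR approach described next). This Python sketch is illustrative: the 18 bp sequences are hypothetical, the variant is modelled as the C → G change that destroys the HindIII recognition site (AAGCTT), and fragment lengths are counted on the top strand only (HindIII cuts A^AGCTT; the resulting overhangs are ignored).

```python
"""In-silico PCR-RFLP digest (illustrative; sequences are hypothetical)."""

HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence
HINDIII_CUT_OFFSET = 1   # HindIII cuts after the first base: A^AGCTT (top strand)


def digest(amplicon: str, site: str = HINDIII_SITE, offset: int = HINDIII_CUT_OFFSET) -> list[int]:
    """Return top-strand fragment lengths after cutting a linear amplicon at every site."""
    seq = amplicon.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


def gel_bands(allele1: str, allele2: str) -> list[int]:
    """Distinct fragment sizes on the gel for a diploid individual, largest first."""
    return sorted(set(digest(allele1)) | set(digest(allele2)), reverse=True)


if __name__ == "__main__":
    cut_allele = "GATTCAAGCTTGGCATTA"    # contains AAGCTT, so HindIII cuts
    uncut_allele = "GATTCAAGGTTGGCATTA"  # C -> G destroys the site
    print(gel_bands(cut_allele, cut_allele))      # [12, 6]      homozygous, site present
    print(gel_bands(uncut_allele, uncut_allele))  # [18]         homozygous, site absent
    print(gel_bands(cut_allele, uncut_allele))    # [18, 12, 6]  heterozygote
```

The three outputs reproduce the two-band, one-band and three-band patterns expected for the two homozygotes and the heterozygote.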
There are two options to identify RFLPs:

OPTION 1: IDENTIFYING RFLPs USING PCR AMPLIFICATION
From the genome, we amplify the region that contains the (potential) restriction site.
After amplification, one allele will have the restriction site and the other will not.
The amplified products are cut with the restriction enzyme, and the fragments are run on a gel:
○ Individuals homozygous for the morph with the restriction site have two bands on the gel
○ Individuals homozygous for the morph without the restriction site have only one large band
○ Heterozygotes have one copy of each: one chromosome produces two bands and the other produces one, giving three bands
Therefore, this distinguishes the variants and types individuals as homozygous or heterozygous.

OPTION 2: IDENTIFYING RFLPs USING LABELLED PROBES
Digest all the genomic DNA, then use labelled probes that bind only to the region of DNA that contains the restriction site.
If the site is absent, the probe detects one larger fragment; if the site is present, the probe binds the fragments on either side of the cut site, resulting in two bands on the gel.

Probe-based approach: allele-specific oligonucleotide (ASO)
Problems with RFLP genotyping:
○ Many small variants do not change restriction sites; changing a cut site is just random chance
○ Not high-throughput → designing an assay with multiple different restriction enzymes, analysed via PCR amplification or probe binding, is far too complicated
Probe hybridisation-based analyses (ASO) can identify alleles in a high-throughput way:
○ Produce an oligonucleotide (short stretch of DNA) labelled with a fluorescent marker
○ The probe is complementary to a specific target sequence
○ If there is a polymorphism, the probe will not bind perfectly to the strand, so the duplex is unstable when the temperature is raised and the probe is lost
○ Hence, the amount of probe bound indicates which variant is present: a stronger signal is seen for the variant that matches the probe
○ ** This can also work for indels
Two ASO probes are needed to distinguish between the three genotypes at a SNP locus with two alleles.
Result: more labelled probe bound = stronger signal; the signal from the top (reference) probe is compared with the signal from the bottom (variant) probe.

Examples:
1. ASO detecting an SNV for sickle cell anaemia:
- Codon 6 of the beta-globin gene has a reference allele and a pathogenic variant → design a probe with the single nucleotide change in the middle.
- With the oligonucleotide for the reference allele, individuals homozygous for the reference (non-pathogenic) allele show strong binding, and heterozygotes show binding to half their DNA.
- Individuals homozygous for the pathogenic allele ⇒ show NO binding to the reference probe.
2. ASO detecting an indel for cystic fibrosis:
- A three base pair deletion that removes phenylalanine 508 (ΔF508)
- Result: individuals homozygous for the reference allele show very strong binding to the reference probe, indicating they are not carriers of the variant
- The parents are heterozygous for the reference and pathogenic alleles; one child is unaffected and the other is a carrier
- The assay is used to genotype individuals and see whether they are carriers

ASOs can scan for multiple variants in the same region/gene:
○ In principle there is no limit to the number of probes
○ This can be done in one go using a microarray for multi-allele/gene analysis
○ Each signal on the microarray chip indicates binding of a specific probe
○ Identification of a pathogenic variant via microarray should be confirmed by sequencing, as the signal might not come from the exact allele you are looking for → you want multiple pieces of evidence telling you the same thing
○ Sequencing is the gold standard for confirming that a particular pathogenic variant is present
ASOs are the easiest to multiplex, as many different ASOs can be analysed on a single chip using just one sample of patient DNA.
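The two-probe logic above can be sketched in Python. This is a simplified model, not a hybridisation simulation: a probe counts as bound only if it matches an allele sequence exactly (standing in for a perfect duplex surviving a stringent, high-temperature wash), and only one strand is checked. The 15 bp sequences are illustrative, written to mimic the sickle cell change in codon 6 of the beta-globin gene (GAG → GTG, placed in the middle of each 9 bp probe); the helper names are hypothetical.

```python
"""Allele-specific oligonucleotide (ASO) genotype calling (simplified sketch)."""

# Illustrative sequences around codon 6 of beta-globin (sickle cell: GAG -> GTG).
REF_ALLELE = "ACTCCTGAGGAGAAG"
ALT_ALLELE = "ACTCCTGTGGAGAAG"
REF_PROBE = "CCTGAGGAG"  # variant position sits in the middle of the probe
ALT_PROBE = "CCTGTGGAG"


def probe_binds(probe: str, allele: str) -> bool:
    """Model a stringent wash: only a perfectly matching probe stays bound."""
    return probe in allele


def aso_genotype(allele1: str, allele2: str,
                 ref_probe: str = REF_PROBE, alt_probe: str = ALT_PROBE) -> str:
    """Call one of three genotypes from the two probe signals."""
    alleles = (allele1, allele2)
    # Signal strength modelled as the number of allele copies bound (0, 1 or 2).
    ref_signal = sum(probe_binds(ref_probe, a) for a in alleles)
    alt_signal = sum(probe_binds(alt_probe, a) for a in alleles)
    if ref_signal and alt_signal:
        return "heterozygous"
    if ref_signal:
        return "homozygous reference"
    if alt_signal:
        return "homozygous variant"
    return "no call"


if __name__ == "__main__":
    print(aso_genotype(REF_ALLELE, REF_ALLELE))  # homozygous reference
    print(aso_genotype(REF_ALLELE, ALT_ALLELE))  # heterozygous
    print(aso_genotype(ALT_ALLELE, ALT_ALLELE))  # homozygous variant
```

A single probe can only report "bound" or "not bound", which is why the sketch, like the assay, needs both probes to separate all three genotypes.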
Other methods could also be multiplexed, but with ASOs only the probe changes for each condition (rather than PCR primers or restriction enzymes), and no gel electrophoresis is required, which makes ASO analysis easy to multiplex.

Polymerase chain reaction - variant detection
PCR uses specific primers that bind to a target region and undergo cycles of amplification to produce a product specific to that region.
We can manipulate the steps of PCR to detect different variants:
○ In the annealing step, we can add primers that only work if a particular variant is present
○ This in turn determines whether certain products are produced
○ This is particularly effective for single nucleotide variants and indels, as both have a significant impact on primer annealing

EXAMPLE: Allele-specific PCR - ARMS PCR
Primers are used that only anneal successfully to specific variants.
The primers differ at the 3' end, because this is the most important part of the primer for binding: synthesis begins from the 3' end on the template strand.
○ Therefore, no binding of the primer's 3' end = no product extension
e.g. Amplification Refractory Mutation System (ARMS PCR):
○ Outer primers ensure that a product is always made, confirming that the PCR worked
○ Two other primers are added, each specific for one allele
○ In the image above, an individual homozygous for the G allele produces the outer-primer product plus the G-specific product
○ An individual homozygous for the T allele produces the outer-primer product plus the T-specific product
○ A heterozygote produces three bands: the outer product, the G product and the T product
○ The primers need to be complementary for binding and extension to occur
○ Result: different allele-specific products are made depending on the alleles present

Manipulating extension time
The size of the product can differ if the variant changes the length of the region.
The extension step of PCR can be manipulated to analyse micro- and minisatellites.

Micro- and Mini-satellite loci
There are many different short DNA sequences that repeat in tandem across the genome.
Highly polymorphic:
○ Loci have a variable number of repeats between individuals
○ The two homologous chromosomes of an individual usually have different numbers of repeats at a locus (heterozygotes)
Microsatellites, aka Short Tandem Repeats (STRs): repeat units of 2-10 bp
Minisatellites, aka Variable Number of Tandem Repeats (VNTRs): repeat units of 10-100 bp
Used in DNA fingerprinting, as they give the best separation between individuals because of the large variation within the population → individuals are likely to be heterozygous at many loci

PCR amplification of tandem repeat loci
Detection of micro/minisatellites depends on the number of repeats: the more repeats, the larger the PCR product.
EXAMPLE
An individual has two alleles, one with 4 tandem repeats and the other with 8.
The same set of primers binds to either side of the short tandem repeats.
The 4-repeat product is smaller and migrates faster during gel electrophoresis.
Therefore, amplifying the tandem repeats with flanking primers and measuring how far the products migrate reveals the fragment sizes and hence the genotype.
○ Variable in the population = individuals can be distinguished based on fragment sizes
Example: identifying a region linked to a dominant allele
- Different molecular markers for the same region are amplified
- The pathogenic variant is linked to (co-segregates with) the larger marker allele

WEEK 4 MOLECULAR MAPPING AND DNA PROFILING
Micro-/Mini-satellite loci make excellent molecular markers:
○ Highly variable: for many other types of molecular difference, there are very few alleles
○ Easily detectable using gel electrophoresis via:
1. RFLP and probe-based analysis (e.g. Southern hybridisation), using the repeat sequence as a probe (VNTRs)
2. PCR amplification, using sequences on each side of the repeat as primers (STRs)
Result:
○ Each band size corresponds to a different allele
○ People can be genotyped based on the alleles on their chromosomes
○ An individual homozygous for a marker shows only one band

GENOTYPING AND PEDIGREE ANALYSIS TO DETERMINE THE PHASE OF, AND MAP DISTANCES BETWEEN, MOLECULAR LOCI
How to calculate map distances using molecular markers
In Drosophila, we analyse two DNA marker loci:
○ Locus A, with a variable number of tandem repeats
○ Locus B, with a variable number of tandem repeats
The allele combinations in the progeny are not evenly distributed, which suggests that the loci are linked.
More specifically, the higher proportion of A1 B1 suggests that these two alleles are linked; similarly, A2 and B2 are linked (due to their higher proportion).

Mapping of DNA markers using genotyping and pedigree - IF THE PARENTAL GENOTYPES ARE KNOWN
Example: There are three linked DNA marker loci (A, B and C), each with several alleles.
From the triple heterozygote I1, determine the map distances between the three loci.
The easiest way to determine the phase of the alleles is to look at the parents of the individual in whom you are looking for recombination, and see which parent each allele was inherited from
In this example, A1, B1 and C1 must come from the father, as the mother does not carry these alleles; therefore, these alleles are located on the same chromosome
A2, B2 and C2 must come from the mother, as the father does not carry these alleles; therefore, these alleles are located on the same (homologous) chromosome
The combination of alleles at closely linked genes or markers on a single chromosome or gamete is called its haplotype (haploid genotype)
Once we know the phase in the parent, we can look at the offspring:
○ Parental and recombinant gametes can then be determined by genotyping the offspring of I1
○ Some offspring have inherited a complete parental haplotype, while others have inherited a recombinant arrangement
○ Of all the offspring, only II3 shows a recombination between A and B, i.e. 1/8 of the offspring
Therefore, the map distance between A and B is 12.5 map units (1/8 x 100)
○ Of all the offspring, II5 and II8 show a recombination between C and B, so the distance is 25 map units (2/8 x 100)
Therefore, the map of the genes would be:

Mapping of DNA markers using genotyping and pedigree - IF THE PARENTAL GENOTYPES ARE NOT KNOWN
If the parental genotypes are unknown, the parental combinations can be inferred from the offspring
○ You assume that the most common arrangements are the parental combinations and that the least common arrangements are the recombinants
In the example, let's focus on the A and B loci:
○ The possible combinations of A and B are A1 B1, A2 B2, A1 B2 and A2 B1
○ The next step is to count how many offspring inherited each combination
We would therefore assign A1 B1 and A2 B2 as the parental combinations, as they are the most common combinations
These are the following results for A and C / C and B

OUTLINE HOW MOLECULAR LOCI GENOTYPING AND MAPPING ARE USED IN APPLICATIONS LIKE GENETIC COUNSELLING OR AGRICULTURAL MARKER-ASSISTED BREEDING
Molecular Mapping with DNA markers vs. Genes
DNA markers provide major advantages over genes for mapping:
Number detected: genes - fewer; DNA SNPs - very many; DNA satellites - many
Ease of scoring: genes - difficult; DNA SNPs - moderate; DNA satellites - easy
Number of 'alleles': genes - few; DNA SNPs - up to 4; DNA satellites - many
Level of polymorphism: genes - low; DNA SNPs - high; DNA satellites - high

Why use molecular loci to map chromosomes?
Traditional uses:
○ Provide 'high resolution' genetic maps of organisms
○ High resolution maps have >1 marker per map unit
○ Identify and clone genes using their map position
○ Determine if mutations affect different genes
○ Map mutant phenotypes more precisely, based on which molecular markers are co-inherited with them
○ Use this information to identify whether a mutant phenotype is inherited with one molecular marker or another (comparing two mutant phenotypes), as this tells us whether the two mutations are located in the same gene or in different genes
Continuing uses:
○ Used to identify rare disease-causing alleles
○ Allows tagging of desired alleles in plant/animal breeding
○ Linkage with genetic markers can be considered in genetic counselling/risk calculations, giving information on whether an individual is at higher or lower risk
○ Assists genome sequence assembly: contigs can be put together using known linked markers

EXAMPLE: Marker Assisted Breeding
Nearby markers can be used as a 'tag' for a desirable allele/trait, e.g. there is an allele of the 'Bold' gene, b2, that is desirable in plants but is difficult to detect and only appears late in development
The Bold locus is in strong linkage with the A marker on one side and the C marker on the other; together these form a haplotype
○ The b2 allele is in linkage with the A1 marker allele and the C4 marker allele, which we have determined by screening other plants
○ Instead of waiting for the trait to appear later in development, we can genotype the A and C loci and keep only plants that have A1 and C4: because these are in close linkage with the Bold gene, plants with A1 and C4 should also have inherited b2 on the same chromosome
You genotype and select only the seedlings that contain A1 and C4
○ b2 should follow

EXAMPLE - USING MARKERS TO COUNSEL ON GENETIC RISK
This approach involves tracking a pathogenic variant and then assigning genetic risk to individuals who haven't developed the condition yet
e.g. an autosomal dominant condition: Huntington's Disease (HD). The gene involved in HD is known to be closely linked to DNA marker 'A'
Individual II2 inherited A4 and Huntington's disease from their mother. Therefore, we can infer that the A4 allele is in linkage with the pathogenic HD variant (H)
What is the chance that III1 has inherited the pathogenic HD allele?
Without genotypic information = 50% chance
Given they carry the A4 allele, and ignoring recombination = 100%, as we know that A4 is linked with H
However, recombination between the marker and the HD gene can separate them, so that the offspring inherits A4 together with the non-pathogenic h allele
The chance that III1 has inherited the HD allele = 100% - the chance of recombination, which we estimate from the map distance
Assume the A marker and the HD gene are 20 mu apart (20% recombination)
Chance with genotypic information = 100 - 20 = 80% chance
EXAMPLE QUESTION: Why might you want to genotype markers in close linkage on either side of a desired trait, rather than just one on one side?
If you genotype only one side, you will probably inherit the desired trait, but there is always a chance of recombination between the marker and the gene of interest. This could separate the marker allele you selected from the desired allele, moving it onto the homologous chromosome that does not carry the desired allele
Genotyping both sides increases confidence that the desired allele is actually present, as the flanking marker alleles and the desired allele form a haplotype. The only way to lose the desired allele while keeping both marker alleles is a recombination on each side of it (a double crossover), which is rare
What are the ongoing applications of mapping and linkage analysis with molecular markers?
○ Genotyping individuals to locate rare pathogenic genetic variants
○ Tagging desired alleles in animal breeding
○ Not producing high resolution genetic maps: second and third generation sequencing has removed the need to build genetic maps based on linkage to genetic markers

COMPARE AND CONTRAST THE TECHNIQUES OF VNTR DNA FINGERPRINTING WITH STR DNA PROFILING, OUTLINING THEIR STRENGTHS AND/OR LIMITATIONS
The basis of DNA profiling
A number of situations require us to determine either:
○ The identity of someone from the population, or in comparison to another sample
○ Family relationships between individuals
Genomic DNA provides a fantastic tool in these situations because:
○ Genomic DNA is stable during life and the same DNA is found in all cells of the body: the DNA you are born with stays the same until the day you die
○ Excluding rare mutations
○ Each person's genomic DNA is unique due to the diversity of certain genomic regions
○ Related individuals share related DNA sequences via ancestry
Molecular methods are best suited to identify differences in genomic DNA between humans
○ Few differences can be observed at the phenotypic level
○ However, most of the variation is seen at the genomic level, so we need to look in non-coding regions

VNTR-based 'fingerprinting'
The first major method of genome profiling analysis
Genomic DNA is processed to build a 'unique' banding pattern based on minisatellite loci (VNTRs)
Produces a 'fingerprint' of bands that is unique to every individual
○ The bands are created by cutting the DNA at restriction sites and using probes to detect the fragments in the gel
The more loci used, the higher the probability that the pattern is unique
Used less now
PROS AND CONS
Pros:
○ Historically used to convict/exonerate individuals of crimes
Cons:
○ Requires large amounts of DNA
○ DNA must be non-degraded; degraded DNA could give incorrect band lengths
○ Can be hard to interpret with high certainty
○ Are similar bands really the same allele from the same locus?

STR DNA Profiling
Method still in use today
Uses several different, unlinked microsatellite loci (STRs)
○ Small and highly variable in copy number (highly polymorphic)
PCR is used to amplify each locus to identify its number of repeats
○ Alleles are defined unambiguously by the size of the fragment produced (based on the number of repeats)
○ Produces a 'genotype' at each locus
○ Genotypes at multiple loci are combined into a unique 'DNA profile'
PROS AND CONS
Pros:
○ PCR amplification is extremely sensitive, so only small amounts of DNA are needed
○ As only small regions are amplified, it works on highly degraded DNA
○ No ambiguity around alleles
Cons:
○ Small amounts of contaminating DNA will easily amplify

Multiplexing in PCR-based STR profiling
Many different PCRs are run in a single reaction
○ Markers that differ in length are separated by size using electrophoresis
○ Markers of similar size are distinguished by different fluorescent dyes incorporated into the primers
○ The dyes are detected using a laser
○ Two separate peaks = the individual is heterozygous for those two fragment lengths at that locus
○ One large peak = the individual is homozygous for that fragment length at that locus
Automated detection uses capillary-based electrophoresis (peaks) instead of gel bands

Why not profile SNPs/SNVs?
SNPs are highly abundant across the genome
However, each locus has few alleles (up to 4: A, T, G, C; commonly 2), so many more loci need to be tested to build a unique profile
○ Because a SNP is only a single base change, more assays are needed for an accurate DNA profile
Still has applications for highly degraded DNA, and in studies of lineage and evolution (Y chromosome DNA or mitochondrial DNA)

DNA profiles are interpreted using probability
Probability based on the population frequency of alleles is used to determine the 'uniqueness' of a profile
Each locus is independent (unlinked), so the probabilities are multiplied across loci

APPLICATIONS OF DNA PROFILING AND HOW POPULATION GENETICS NEEDS TO BE CONSIDERED WHEN INTERPRETING DNA PROFILES
Forensic Applications - Identifying/ruling out suspects
Commonly used to compare crime scene samples with suspects, to rule out any suspects that don't match the data collected from the crime scene
○ In the example (on the right), the top two STR results represent the sperm fraction and the victim's epithelial cell fraction
○ As expected, the victim's sample matches the epithelial cell fraction
○ The sperm fraction matches the suspect's profile, so the suspect cannot be ruled out. However, this does not prove that they committed the crime
Familial matching can be used
○ e.g. convicted serial killer Lonnie Franklin aka 'The Grim Sleeper' and his son's DNA
○ The son was arrested for other crimes, and his DNA was a close familial match to the 'Grim Sleeper' crime-scene profile; as the son was too young at the time of the crimes, investigators suspected the father.
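The product rule from "DNA profiles are interpreted using probability" can be sketched in a few lines of Python. This is a minimal illustration, not a forensic method: it assumes Hardy-Weinberg proportions (p² for a homozygote, 2pq for a heterozygote) to turn allele frequencies into genotype frequencies, which the notes do not state, and the allele frequencies below are hypothetical. The helper names `genotype_frequency` and `random_match_probability` are illustrative only.

```python
import math
from typing import Optional, Sequence


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Expected frequency of a single-locus genotype in the population.

    Assumes Hardy-Weinberg proportions (an assumption, not from the notes):
    homozygote (one allele frequency given)   -> p * p
    heterozygote (two allele frequencies given) -> 2 * p * q
    """
    if q is None:
        return p * p
    return 2 * p * q


def random_match_probability(loci: Sequence[Sequence[float]]) -> float:
    """Chance that an unrelated person shares the whole profile.

    Multiplying across loci is only valid because the STR loci
    used for profiling are unlinked (inherited independently).
    """
    return math.prod(genotype_frequency(*freqs) for freqs in loci)


# Hypothetical allele frequencies at three unlinked STR loci
profile = [
    (0.10, 0.20),  # heterozygote: 2 * 0.10 * 0.20 = 0.04
    (0.30,),       # homozygote:   0.30 * 0.30     = 0.09
    (0.05, 0.10),  # heterozygote: 2 * 0.05 * 0.10 = 0.01
]
rmp = random_match_probability(profile)
print(f"match probability = {rmp:.2e} (about 1 in {1 / rmp:,.0f})")
```

Each extra unlinked locus multiplies in another small factor, which is why profiles built from many STR loci become effectively unique. The same population frequencies do not apply to relatives, who are more likely to share alleles, which is one source of false inclusion.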
Personal genomics websites have been used to identify suspects in crimes
○ e.g. convicted serial killer Joseph DeAngelo aka 'The Golden State Killer' and the genealogy database GEDmatch

Forensic Applications - Identifying Human Remains
After crimes or large-scale disasters, identifying human remains may be incredibly difficult by other methods
○ As only small amounts of DNA are required, DNA profiling may be the only viable method
People can be identified directly if an existing sample of their cells or their DNA profile is available
A DNA profile can also be compared to family members of a missing person
○ e.g. a large portion of the victims of the Sep 11, 2001 World Trade Center attacks were identified by DNA profiling alone, based on comparison to relatives or to hair/blood samples

Investigation of Relatedness or Paternity
Legal disputes over paternity
Approval of immigration based on relatedness
Theft or unlawful use of agricultural breeding stocks: people may have obtained breeding stock without payment or stolen it
Confirming the pedigree of pets or livestock
Tracking GMO crops
Determining the source of poached wildlife
Analysing environmental populations
○ Analyse inbreeding, diversity

Legal Considerations of DNA profiling - Excluding vs Proving
Exclusion of identity or relatedness is simple
○ If two DNA profiles are different, they cannot have come from the same person
○ DNA profiling has been instrumental in establishing the innocence of those suspected or previously convicted of a crime
Complete proof of identity or relatedness is impossible
○ If two DNA profiles match, it can't be concluded that they came from the same person
Two different people can have the same profile at the tested loci, e.g. through random inheritance of the same alleles
○ Instead, a probability is given that the DNA profile in question arose by chance
It is used as just one line of evidence, among other pieces of evidence
The chance that this individual is the suspect is ……
False inclusion of individuals
○ Relatives are more likely to share alleles
○ Monozygotic twins: 100%, Parent/Child: 50%, Siblings: 40-60% (average 50%)
○ Some alleles are more common in certain populations/ethnic backgrounds
False exclusion of individuals
○ Technical problems with poor quality or low amounts of DNA
○ Contamination or a mixed source
○ Human error

Examples:
1. An individual has the following genotype at three closely linked DNA markers: A1 A2; B1 B2; C1 C2. Their mother has the genotype A2 A4; B2 B4; C1 C4 and their father has the genotype A1 A3; B1 B3; C2 C3. Which one of the following shows this individual's haplotype?
2. You analyse a family with 12 children. You genotype two loci in the parents and the offspring. The genotypes of the parents are P1: E2 E4, F3 F5 and P2: E1 E3, F1 F2. You decide to use the markers from P1 to determine the distance between the two markers. You look at the alleles inherited by the offspring from P1 and count up the combinations. You find the following: E2 & F3: 4 children, E2 & F5: 2 children, E4 & F3: 0 children, E4 & F5: 6 children. Based on these counts, the distance between markers E and F is (give your answer to one decimal place):

MODEL ORGANISMS FOR STUDYING HUMAN GENETICS
OUTLINE THE ASPECTS OF EUKARYOTIC EVOLUTION THAT MAKE MODEL ORGANISMS USEFUL TO STUDY HUMAN BIOLOGY
Phylogeny: the understanding and interpretation of evolutionary relationships between organisms.
It is represented as a phylogenetic tree, in which organisms that are more closely related share a more recent common ancestor
All organisms originate from one common ancestor, the last universal common ancestor
How related two organisms are depends on how recent their common ancestor is

Phylogeny between genes
Orthologues: homologues (genes/proteins) found in different species, derived from a common ancestral gene
○ Most likely to be found in organisms with a recent common ancestor
○ Through evolution, different mutations accumulate in the gene in different organisms, but the orthologues maintain a similar function
○ To study the gene, we can make genetic changes to its orthologue in another organism and see what effect occurs
Paralogues: homologues, often found within the same species, derived from gene duplication
○ A gene duplication in an ancestor produced multiple versions of what was effectively the same gene

Human vs S. cerevisiae
Both eukaryotic (yeast)
Similar organelle composition and cellular metabolism
Study of eukaryotic gene expression:
○ Transcription, translation, post-transcriptional modification, gene regulation
Eukaryotic cell cycle conserved

Humans vs C. elegans and D. melanogaster
All bilaterians: left-right symmetry
All have blastula formation, useful for studying germ layer fate
○ Three germ layers
Motile
Basic muscle, nerve and gastrointestinal similarities
Many aspects of the innate immune system are shared
○ Have phagocytes that recognise and engulf pathogens, which helps in studying the immune system
Cell-cell signalling pathways conserved
○ In Drosophila: a more sophisticated body structure and GI tract; good for studying the olfactory (smell) system
○ In C. elegans: a much simpler body; good for studying cell signalling
Both Drosophila and C. elegans belong to a group of organisms called protostomes
Protostomes and deuterostomes differ in early embryo development: the embryo undergoes rounds of division with characteristic cleavage patterns, and the fate of the blastopore differs
○ In protostomes, cleavage is classically spiral; the blastopore develops into the mouth and the second opening becomes the anus
○ Deuterostomes (which include humans) undergo radial cleavage; the blastopore becomes the anus and the second opening becomes the mouth

Humans vs D. rerio (zebrafish)
Both vertebrates with bone-based skeletons
○ Unlike cartilaginous fishes (e.g. sharks), whose skeletons are made of cartilage
Nervous system (brain, nerve pathways) shows similarities
Share an adaptive immune system
Respiratory/circulatory systems show similarities: closed circulation (zebrafish have gills and a 2-chambered heart)
Have similar organ systems to study: liver, kidneys

Humans vs X. laevis
Both tetrapods: limb and digit development
Also partially terrestrial
○ Respiratory/circulatory system: lungs, 3-chambered heart
Have an adaptive immune response

Humans vs M. musculus (mouse)
Both placental mammals: highly similar embryonic development and reproduction
Body systems are the most similar of the model organisms, including:
○ Adaptive immune system
○ Nervous system
Metabolism highly similar
○ Endothermic, urea excretion
Circulatory system: lungs, 4-chambered heart

JUSTIFY CHARACTERISTICS THAT MAKE ORGANISMS SUITABLE TO BE MODELS FOR GENETIC STUDIES
Considerations for using model organisms
Is the biological question applicable to the organism? e.g. to study skeletal diseases, choose organisms (such as zebrafish) that provide a basis for studying that area of research
Is it feasible?
○ Money: cost of the organism, maintenance and storage
○ Time: generation time (how long it takes for a generation to occur, and so how long the experiment will take) and number of progeny
○ Measurement of phenotype or assay
e.g. to study embryonic development, which is easier: mouse or zebrafish?
Mice have internal embryonic development: embryos develop inside the mother's body
Zebrafish have external embryonic development: embryos develop outside the body and are transparent
Therefore, zebrafish are the better organism
e.g. to study neuronal development using fluorescent markers, which is easier: C. elegans or D. melanogaster?
C. elegans has a completely transparent body, so it is the better organism
Is it ethical?
○ Can you answer your experimental question without using whole organisms? e.g. through:
Biochemical assays
Cell culture
Organoids
○ Ethics approval is needed in Australia for vertebrates, cephalopods and decapods
○ An animal ethics committee will review and monitor your use of animal models with a focus on:
Reducing the numbers of animals or reducing harm
Ensuring alternatives for the research have been considered
Ensuring ongoing compliance with ethical treatment and care of the animals
What are the genetics? Has the genome been sequenced?
○ The genomes of all these model organisms have been sequenced, with dedicated genome databases available for most models
It is important to identify whether homologues of the human gene are found in the organism: even if it has a similar structure or developmental pathway, you need to make sure the same genetic pathway is present for you to study
Work is now focusing on better characterising the pan-genome of these organisms
○ What gene editing/analysis tools have been previously developed?
○ Some unique considerations regarding the genetics/genetic modification of model organisms:
S. cerevisiae has a dominant haploid stage, which makes genetic modification easier: a haploid cell has only one copy of each gene, so to knock out a gene you only need to disrupt one copy. The modified cell then multiplies by budding, so there is no need for sexual reproduction (crosses)
Traditionally, genome editing has been more difficult in some organisms
Zebrafish and C. elegans: mostly used knockdown analysis; improving with CRISPR-Cas
Many fish genomes, including zebrafish, have gone through large duplication events, which complicates analysing orthologues
Fish and amphibians are unusual in the animal world, as they have undergone whole-genome duplication events in which whole regions of chromosomes or entire genomes were duplicated. This leads to paralogues with similar functions, or paralogues that have diverged to take on different functions. You could knock out the closest homologue but not a paralogue that does a similar job, so you may not see the effect you would otherwise see in a human
Xenopus laevis is tetraploid, which makes genetic modification more difficult, as you have to knock out 4 copies of a gene instead of 2
Will my research be funded?
○ Research using mouse models is likely to be funded more, as mice are more closely related to humans
Will the findings ultimately be translatable to human biology?

EXPERIMENTAL ADVANTAGES AND DISADVANTAGES OF KEY MODEL ORGANISMS IN GENETICS
* D. rerio and X. laevis have duplication events, which may make it difficult to target genes as there is genetic redundancy
** Drosophila are difficult to store as they cannot be frozen, so stocks must be maintained by continual breeding (expensive to maintain)

WEEK 5 MUTAGENESIS AND GENETIC ENGINEERING STRATEGIES
OUTLINE CHARACTERISTICS OF FORWARD AND REVERSE GENETIC APPROACHES
Forward and Reverse Genetics
Forward genetics:
○ Involves going from a phenotype through to a genotype
○ You first identify a mutant or variant phenotype in a population, which is either a natural population or a mutagenised population (more common)
○ Once the mutant has been identified, you then find where the mutation occurs in the gene sequence
○ You then have to analyse the molecular function
Reverse genetics:
○ Involves isolating a gene, which may be identified through a homologue
○ We then generate a mutant allele and observe the mutant phenotype
○ This tells us the function of the wild-type gene

FORWARD GENETICS
General Approach to Forward Genetics
Forward genetics strategies differ depending on the organism you are interested in. However, this is the general approach:
1. Mutagenise the germline/gametes of a WT organism: start with your cells of interest and introduce a mutagen, which alters the DNA
○ It can introduce a single base change, an insertion/deletion or a rearrangement
2. Self-fertilise or undertake crosses to produce homozygotes for the mutation
3. Screen for phenotypic changes of interest: look for the mutant phenotypes, which are usually rare, among the F1/F2 population
4. Map/sequence to identify the impacted gene: use linkage mapping to narrow down the region, or sequence to locate the particular mutations/variants
5. Use experiments to determine the molecular role and interactions of the impacted gene, to find out its function in the organism

Mutagenic Agents and Phenotype Screening
Mutants may only appear under certain conditions, so the design of the phenotypic screen is very important
The complexity and expense of mutagenesis and screening increase with the 'complexity' of the organism
There are 3 main classes of mutagenic agents: chemical, radiation and insertional
**Note: N-ethyl-N-nitrosourea (ENU), a chemical mutagen, is used most commonly for vertebrate model organisms (mouse, zebrafish, Xenopus)

Insertional Mutagenesis by transposons
Overall process similar to that outlined above
The delivery method of DNA depends on the organism
Certain transposon mutagenesis systems are more successful in certain organisms
Major advantages over chemical/X-ray mutagenesis:
○ Mutants can be labelled/selected via an inserted resistance gene
○ The inserted gene (and so the disrupted locus) is easily identified
PROCESS:
1. A transposon carrying a selectable marker is introduced; the marker must be detectable in the germline so that you can select for the insertion
2. A transposase is also provided; it recognises specific sites so that the transposon can be cut out and inserted into the genome
3. The transposon is often inserted at a specific recognition (target) site

Strengths and weaknesses of Forward Genetics
Strengths:
○ Genome-wide
○ Unbiased: mutations arise at random through mutagenesis, so the screen is not biased towards particular genes
Weaknesses:
○ Mapping or screening for the impacted gene can be difficult
○ Unfocused (you might be interested in particular genes)

REVERSE GENETICS
Reverse genetic strategies start with a gene of interest: you produce a genotype or manipulate the gene to produce a phenotype, which tells you about the function of the gene or the process it is involved in
Gene manipulation can refer to recombinant DNA technology methods that either:
○ Mutate the gene to alter its function
○ Modify the expression of the gene
○ Introduce the gene into another cell/organism
Reverse genetics requires the introduction of nucleic acids into cells
○ To alter gene expression or structure, the relevant nucleic acids need to be introduced into the cell
○ Introduction may be transient: the nucleic acid persists only for a limited time or number of generations
○ Introduction may be permanent: integration into the cell's DNA via recombination
Usually referred to as a transgene, producing a transgenic organism
Viral methods:
○ Viruses package DNA or RNA into a protein coat and deliver it from cell to cell
○ We can make recombinant viruses that transfect cells and bring the DNA we want into the cell
Non-viral methods:
○ PHYSICAL: shoot/inject the DNA into the cell
○ CHEMICAL: change the chemical nature of the cell membrane so that the DNA can pass through. The cell membrane is negatively charged, so negatively charged (polar) DNA cannot pass through on its own
○ PHYSIOCHEMICAL: pack the DNA into lipid packages, which are then taken up by phagocytosis/endocytosis

TRANSIENT GENE KNOCKDOWN APPROACHES (RNAi, MORPHOLINOS) WITH EXAMPLE
GENE KNOCKOUT APPROACHES (HOMOLOGOUS RECOMBINATION, CRISPR-CAS)
Gene Knockdowns using RNAi
We may target a gene by reducing the expression of its mRNA product
We borrow the RNA interference (RNAi) pathway of eukaryotes for this
○ The pathway is typically used in animal cells for defence against viruses
○ Drosha cleaves the pri-miRNA into a pre-miRNA, which is then recognised by a protein called Dicer
○ Dicer cleaves the pre-miRNA into a form that can be loaded into a complex with RISC and AGO (Argonaute)
○ This complex then binds the target RNA transcript
○ If the miRNA is not fully complementary, translation is inhibited as the ribosome cannot bind, reducing expression of the product
○ If there is full complementarity, it follows the same pathway as siRNA: the RISC/AGO complex binds completely and degrades the mRNA
Referred to as a 'knockdown' because the reduction in expression is variable: it is not a 'knockout', as the product is still expressed, just at a reduced level
○ Never abolished 100%
Introduction of the RNA can be transient
○ Double-stranded RNA (dsRNA) is injected into the cells/organism, or delivered packaged in a virus
Can be performed permanently
○ A dsRNA 'gene' is introduced as a transgene

RNAi Knockdown - Strengths and Weaknesses
Strengths:
○ Provides spatial and temporal control via administration
○ Can target all genes in a genome
○ Works in many eukaryotic organisms, including human cells
○ Particularly potent in C. elegans, because it has an RNA-dependent RNA polymerase that keeps amplifying the dsRNA; therefore, the knockdown can be passed on to offspring and be seen in their phenotype
○ Variable knockdown efficiency: the knockdown can be controlled so that the organism itself does not die
Weaknesses:
○ Not applicable to some eukaryotes: S. cerevisiae lacks the system, and knockdown is poor in fishes and amphibians
○ Variable knockdown efficiency: rarely 100%
○ Off-target effects: even if the RNA constructs are made as complementary as possible to the target RNA, they might bind elsewhere and cause downregulation of other genes

Gene Knockouts via Homologous Recombination
To 'knock out' a gene is to abolish its function completely
Homologous recombination can be used to 'swap' a functional copy for a 'knockout' version
Knockout organisms display the absence of gene function
Note: complete gene knockouts are often lethal, as most genes are quite important; these require alternative methods of mutagenesis or conditional knockouts
PROCESS
1. Design a targeting vector that allows homologous recombination to swap out the gene
2. A large portion of the target gene is removed in the vector
3. A resistance marker is included so that integration is selectable
4. The ends of the vector are homologous to the sequences on either side of the target gene; this allows homologous recombination to occur on both sides, swapping the knockout construct for the target gene
5. The vector is then introduced into embryonic stem (ES) cells, so that the cells can be grown in culture and those that have undergone recombination selected using the selectable marker
6. After selection, the recombinant cells can be put into an embryo (the inner cell mass of a blastocyst) of a mouse
7. Alternatively, the vector can be injected directly into a fertilised egg
○ As the fertilised egg represents a single (eventually diploid) nucleus, any change to those chromosomes will be inherited by the entire organism it develops into.
○ Therefore, if gene knockout occurs, the organism will be heterozygous for the gene knockout.
○ This skips the generation of chimeric mice seen when using ES cells and blastocysts, so a knockout line can be produced more quickly.
Positive and negative selection to avoid random integration
The positive marker can insert randomly anywhere in the genome
Therefore, we can use both positive and negative selection to eliminate cells that have undergone random integration
The positive marker should be located within the region that we want to be integrated (between the homology arms of the vector), so that we can select cells in which this region has been inserted
A negative marker is then placed outside the gene-targeting region (the region that undergoes homologous recombination)
○ A negative marker is a marker that we can select against
If random integration or a single recombination occurs, the negative selectable marker is also inserted - we can then expose the cells to a compound that kills any cell that has taken up the negative selectable marker, leaving only cells that have undergone double recombination
Knock-in organisms display novel gene function/expression
Rather than knocking out a gene, you can ‘knock-in’:
○ An entirely new gene
○ A novel component to an existing gene – a novel regulatory element or tag
○ A change to one or more nucleotides
Allows you to track the fate/expression of certain genes (e.g. GFP in mice)
Allows you to study the effect of overexpression of a certain gene and compare it to humans
Homologous recombination - Strengths and Weaknesses
Strengths:
Directly targets genes via homology
○ Any gene in the genome
Highly efficient in certain organisms
○ e.g. S. cerevisiae and M. musculus
Can be used to ‘knock-in’
Relatively cheap
Weaknesses:
Based on the occurrence of a random event
○ Screening and selection methods aid this
Dependent on rates of homologous recombination in organisms
○ In most higher-order eukaryotes, the rate of successful recombination is low
Off-target effects
Reverse genetics: CRISPR-Cas
Apart from certain animals (e.g. mice), site-directed mutagenesis in animals and plants was difficult/time-consuming
○ Most reverse genetics studies relied on knockdown experiments or curated random mutant libraries
A discovery during fundamental biological research changed this
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) regions were identified as encoding an ‘adaptive immune system for bacteria’
○ Spacer elements are complementary to phage or plasmid DNA
○ Transcribed spacers and repeats are processed into crRNAs, each made up of a repeat region that forms the structure and a unique spacer sequence
○ Upstream of the sequence is the tracrRNA
○ Downstream from the tracrRNA is the Cas operon, which contains a range of genes
○ crRNA interacts with tracrRNA through complementary base pairing to create a structure that Cas proteins can also bind, forming a complex
○ This complex binds DNA complementary to the crRNA and makes a double-strand cut, so the phage or plasmid can no longer replicate in the cell
With a few modifications to streamline it, CRISPR can be used as a genome editing tool
The crRNA (the sequence specific to a region in the genome) is combined with the tracrRNA to form a single guide RNA (sgRNA) - this is expressed from a promoter of interest
Within the Cas operon, the Cas9 protein is the key player in cleavage - therefore all that is required to target and cut a region of DNA is a single guide RNA and a single Cas9 protein
○ If you have designed the guide RNA to be complementary to a region of DNA, the complex will form there and cause a cleavage of the
site within that complementary sequence, just upstream of the PAM
○ Once cleaved, the genome can be modified
PROCESS:
The PAM (protospacer adjacent motif) is a short sequence that must be present in the target DNA, directly next to the region matched by the guide RNA, for Cas9 cleavage to work
Cas9, in complex with the guide RNA (sgRNA), recognises the PAM and cuts both strands of the double-stranded DNA
○ The sgRNA gives the CRISPR-Cas9 system its target specificity, ensuring only the correct genomic location is edited
If the double-strand break is left without a repair template, it is repaired by non-homologous end joining (NHEJ), which is error-prone and can add or remove random bases at the break - this can lead to two different outcomes
1. It might insert something
2. It may just delete that region
Both outcomes will likely disrupt the gene, as they will most likely cause a frameshift and a nonsense product to be made
Another approach is to provide donor DNA as a homologous sequence so that homology-directed repair (HDR) can occur
The donor DNA acts as a template, so we can ‘fix’ or change any base in the sequence when the break made by Cas9 is repaired
Strengths and weaknesses of CRISPR-Cas
Strengths:
Efficiently targets any gene, as targeting depends on the guide RNA that you provide
Quick and relatively cheap
○ Same system to target any gene, just swap the guide RNA
Works in all organisms tested
○ Expanded potential model organisms
Can be modified for novel applications
○ Designed to ‘nick’ only a single strand
○ Modified to alter expression
Weaknesses:
Intended alteration may not be achieved
○ Requires a screening method for mutants
Cleavage displays infidelity
○ Off-target effects – careful design of the guide RNA helps reduce this
PROVIDE EXAMPLES OF CONDITIONAL GENE MANIPULATION TECHNIQUES AND THEIR BENEFITS
The need for conditional gene manipulation
Producing germline constitutive mutations or transgenes may not let you answer your biological question
○ If a gene is required in early embryonic development, a knockout may be embryonic lethal
○ Knockout in some tissues may be lethal - so the roles of some genes in other tissues may never be uncovered because the organism dies first
○ Over-expression or ectopic expression of a gene in early development may be lethal
Therefore, you may want your mutation or transgene expression to only occur:
○ In certain cells or tissues
○ At a certain developmental stage
○ A combination of both
Conditional genetic systems can be introduced into organisms as transgenes via random insertion, homologous recombination or CRISPR-Cas
○ e.g. GAL4-UAS
○ e.g. Cre-Lox
The GAL4/UAS system
A system that drives expression of a transcript of interest
Placing GAL4 expression under a specific promoter makes this conditional
○ GAL4 is placed under the control of a promoter, which is in turn under the control of an enhancer
○ The enhancer can be swapped for whichever enhancer we are interested in
○ E.g. we can choose an enhancer that is only active in a certain area of the body - hence GAL4 would only be expressed in that specific tissue, and all other tissues would not express GAL4
○ Promoter can be cell- or tissue-specific
The other component of the system is a gene of interest (transgene) under the control of the UAS promoter, which is bound by the GAL4 protein
○ Only when GAL4 binds the UAS do we get transcription of the gene of interest
For further control:
Temperature- or inducer-responsive elements provide further control of expression
○ This can be done using GAL80ts, a temperature-sensitive form of GAL80
○ At 18 °C, GAL80ts binds to GAL4, preventing the gene of interest from being expressed
○ At 29 °C, GAL80ts can no longer bind to GAL4, so the gene of interest is expressed
○ We can also use an inducible form of GAL4 that reacts to a particular molecule of interest (e.g. a hormone)
The GAL4/UAS system - targeted knockdown/knockout
Knockdown:
The setup is similar: the enhancer is tissue-specific and GAL4 is still under the control of that enhancer
If GAL4 is expressed, it will bind to the UAS
In the knockdown approach, the UAS drives production of double-stranded RNA, which then triggers the RNA-interference pathway, leading to degradation of the target RNA and hence knockdown of that gene
Knockout:
The approach requires two constructs: GAL4 under the control of the enhancer, and the guide RNA plus the UAS, which now controls Cas9
Therefore, only in the particular cells where GAL4 is expressed does it lead to UAS-driven expression and hence Cas9 expression; Cas9 interacts with the guide RNA and creates a double-stranded break, leading to non-homologous end joining or homologous recombination and allowing specific gene editing within those cells
The Cre-lox system - conditional knockout in mice
A system that conditionally excises genetic material, based on the expression of a recombinase
Because it excises genetic material, it can be used to excise a gene of interest only in the cells where you want that gene knocked out
PROCESS
1. We need to make a ‘floxed’ mouse: the gene of interest is flanked by loxP sites, repeat sequences in the same orientation at each end of the gene
a. This allows the region between them to form a loop
2. The other mouse carries a transgene encoding Cre recombinase under the control of a particular enhancer
3. The two mice are mated to produce F1 progeny that are heterozygous and carry both the loxP-flanked gene and the Cre recombinase gene
4. In cells that express Cre recombinase, the two loxP sites are recombined into one, looping out the gene of interest - e.g. with a liver-specific enhancer, this creates a knockout heterozygous for the gene in the liver
5. Adding an inducible element to Cre makes the system inducible, providing further spatial and temporal control of knockouts
a. We do this by creating a version of Cre fused to a specific receptor. Cre is still expressed, but if the receptor's agonist is not provided it is not functional, so the wild-type gene remains in the cell
b. If the agonist is present, it binds the receptor and Cre can then act on the loxP sites and excise the target gene
Result: rather than the knockout occurring in certain tissues at all times, it occurs in those tissues only when we add the agonist at a particular time
WEEK 6 - COMPLEX DISEASES I
KNOW THE DIFFERENCE BETWEEN MONOGENIC, POLYGENIC AND COMPLEX PHENOTYPES AND DISEASES
Monogenic phenotypes/Polygenic Phenotypes
Mendelian (monogenic) diseases and phenotypes are those where there is a direct relationship between the disease gene and the disease/phenotype status
Genotype and phenotype closely correlate (high penetrance) = Variants CAUSE the disease
The traits presented so far are qualitative (e.g. white eyes vs red eyes, or cystic fibrosis vs no cystic fibrosis)
However, the vast majority of human phenotypes are complex/polygenic - instead of one gene producing one phenotype, several genes combine to produce a phenotype, and different combinations of the same genes can produce different phenotypes
There can be pleiotropy, where one gene variant has several different effects/phenotypes
There can also be epistasis, where one gene masks the effect of another gene
Allelic heterogeneity can also occur, meaning that different alleles of the same gene can cause the same phenotype
Quantitative traits - traits whose variation shows a continuous range of phenotypes, e.g. human height, weight, colour, metabolic rate, behaviour
Polygenic: varying phenotypes result from the input of many genes
Multifactorial/complex traits: result from a combination of several genes and environmental factors
Complex (polygenic) diseases often show genetic predisposition, but individual genes only marginally affect disease status
Genotype and phenotype poorly correlate (low penetrance) = Variants PREDISPOSE
to the disease - not cause it (1 disease, many genes)
EXAMPLE: SKIN COLOUR HAS AN ADDITIVE EFFECT
Several genes act together, each adding to skin colour
The genes also interact with environmental factors that can change it
Single gene versus multifactorial diseases
Single gene / Mendelian (e.g., familial Alzheimer's disease, 1-2% of cases)
○ If a parent is a carrier, there is a 1 in 2 risk for each child
○ If 1 child is already affected, the risk is still 1 in 2
○ The risk remains the same regardless of the number of affected children
Multifactorial (e.g., Alzheimer’s disease, most cases)
○ If 1 child is affected, the recurrence risk is 1 in 25
○ If 2 children are affected, the recurrence risk rises to 1 in 12
○ The recurrence risk increases because multiple affected children indicate that the couple are high risk
Multifactorial disorders and diseases
Display familial clustering with no recognised pattern of Mendelian inheritance
1. Most common cause of congenital malformations
2. Cause of many common acquired diseases
3. More prevalent than single gene disorders
4. Harder to find the genetic factors/causes, because much larger sample sizes are required
Not all polygenic traits show continuous variation
Those that do are measured in a large, representative sample of individuals from the population
The data form a normal distribution: a characteristic bell-shaped curve when plotted as a frequency histogram
UNDERSTAND THE CONCEPTS OF CONTINUOUS, MERISTIC, AND THRESHOLD TRAITS
Types of polygenic traits
Continuous traits - e.g. height, blood pressure
Meristic traits
○ Phenotype can be recorded by counting whole numbers
Threshold traits
○ Polygenic and often multifactorial
○ Have a small number of discrete phenotypic classes
○ An increasing number of diseases show this pattern of polygenic/multifactorial inheritance
Multiple-Gene Hypothesis
Experiments by Herman Nilsson-Ehle in 1909
Cross red grain wheat with white grain wheat
F1 shows an intermediate pink colour
Is this one gene with incomplete dominance?
○ Prediction would be F2 1 red : 2 pink : 1 white
○ But F2 shows: 15/16 varying shades of red (4 shades) and 1/16 white
○ For comparison, incomplete dominance in flowers: crossing BB (blue) with bb (white) gives all Bb (light blue) offspring
Explanation: there are two genes, each with an additive (pigment) allele and a non-additive (no pigment) allele
○ The greater the number of additive alleles in the genotype, the more intense the red colour expressed in the phenotype
Many genes, each individually behaving in a Mendelian fashion, contribute to the phenotype in a cumulative/quantitative way
This relies on several major assumptions:
1. A quantitative trait has continuous variation that can be quantified (measured)
2. Two or more loci scattered throughout the genome account for the hereditary influence on the trait in an additive way
3. Each gene locus is occupied by either an additive allele or a non-additive allele
4. The contribution of each additive allele is approximately equal
5. Together, the additive alleles contributing to a single quantitative character produce substantial phenotypic variation
UNDERSTAND THE MULTIPLE-GENE HYPOTHESIS, CALCULATIONS AND INTERPRETATION
Calculating the number of polygenes
The number of polygenes ($n$) contributing to a quantitative trait is estimated from the ratio of F2 individuals resembling either of the two extreme P phenotypes
$1/4^n$ = ratio of F2 individuals expressing either extreme phenotype
For a low number of polygenes: $2n + 1$ = number of distinct phenotypic categories observed
e.g. in Nilsson-Ehle's cross, 1/16 of the F2 were white, so $1/4^n = 1/16$ gives $n = 2$, and $2n + 1 = 5$ phenotypic categories (white plus 4 shades of red)
Estimates of variation and heritability
Descriptive statistics are used to understand phenotypic variation:
The mean - assumes a normal distribution
Variance - information about the spread of the data around the mean
Standard deviation = square root of the variance
○ For a normal distribution, ~68% of values lie within 1 standard deviation of the mean
○ ~95% of values lie within 2 standard deviations of the mean
Standard error of the mean = standard deviation / $\sqrt{n}$ - indicates how accurately the sample mean estimates the population mean (larger n = smaller standard error)
UNDERSTAND WHAT HERITABILITY IS AND WHAT IT IS NOT, HOW IT IS CALCULATED (BOTH BROAD AND NARROW-SENSE), AND HOW TO INTERPRET IT
The heritability of a quantitative trait
A trait is heritable if some of its variation can be accounted for by genetics
Heritability ($H^2$) can be defined as: the proportion of the total phenotypic variance ($V_P$) within a certain population that is due to genetic variance ($V_G$)
$H^2 = V_G / V_P$
Note: members of human families share the same environment as well as their genes
Familial = a trait shared by a family; they may not share the same genotype
e.g., an adopted child speaks the same language as the rest of the family. This is not heritable, because it is not genetic.
Heritable = a trait shared by people with the same genotype
Because families share environments as well as genes, separating genetic from environmental effects is very difficult for many human quantitative traits
What heritability is
○ The proportion of the total phenotypic variance ($V_P$) within a certain population that is due to genetic variance ($V_G$)
○ Different in different environments
What heritability is not
○ How much of a trait is genetically determined
○ How much of an individual’s phenotype is due to their genotype
○ Fixed for a given trait
EXAMPLE: A mean heritability estimate of 0.65 for human height does not mean that your height is 65% due to your genes, but rather that in the population sampled, on average, 65% of the overall variation in height could be explained by genotypic differences among individuals in that population. If an environmental change affects all individuals in a population equally, the mean height can change, and yet the heritability stays the same.
The forgotten part of Nature vs Nurture: Gene-environment (G x E) interactions
1. Genes play an important role in quantitative traits
2. Environment plays an important role in quantitative traits
3. The interaction between genes and the environment can also play an important role in quantitative traits - however, due to its complexity, it is generally ignored
Broad-sense heritability $H^2$
Measures the proportion of the phenotypic variance in a population within a single generation that is due to genetic factors
Ranges from 0 to 1
○ Low heritability = variation is due mainly to environmental effects
○ High heritability = variation is due mainly to genotypic effects
Ignores genotype-by-environment interactions, as these are too complicated to include in the formula
Includes genetic variance due to dominance and epistasis
Not useful for breeding livestock or crops, because it considers ALL genotypic effects – some of them (such as epistasis) are not transmitted to the next generation in predictable ways
Narrow-sense heritability $h^2$
Full