DNA Fingerprinting PDF
Summary
This chapter covers DNA polymorphisms and their use in human identification. It describes the main types of polymorphisms (RFLPs, VNTRs, STRs, and SNPs), the laboratory methods used to detect them, and their applications in genetic mapping, parentage testing, forensic analysis, and clinical procedures such as bone marrow engraftment monitoring.
Full Transcript
Chapter 10 DNA Polymorphisms and Human Identification

Outline
TYPES OF POLYMORPHISMS
RFLP TYPING
  Genetic Mapping With RFLPs
  RFLP and Parentage Testing
  Human Identification Using RFLPs
  DNA Fingerprinting With RFLP
STR TYPING BY PCR
  STR Analysis
  STR Nomenclature
  Gender Identification
  Analysis of Test Results
  Genotyping
  Matching of Profiles
  Allelic Frequencies in Paternity Testing
  Sibling Tests
Y-STR
  Matching With Y-STRs
LINKAGE ANALYSIS
BONE MARROW ENGRAFTMENT TESTING USING DNA POLYMORPHISMS
  PSTR Testing
  Post-Transplant Engraftment Testing
QUALITY ASSURANCE OF TISSUE SECTIONS USING STR
SINGLE-NUCLEOTIDE POLYMORPHISMS
  The Human Haplotype Mapping (HapMap) Project
MITOCHONDRIAL DNA POLYMORPHISMS
OTHER IDENTIFICATION METHODS
  Protein-Based Identification
  Epigenetic Profiles

Objectives
10.1 Compare and contrast different types of polymorphisms.
10.2 Define restriction fragment length polymorphism (RFLP), and discuss how RFLPs are used in genetic mapping, parentage testing, and human identification.
10.3 Describe short tandem repeat (STR) structure and nomenclature.
10.4 Describe the use of STR in parentage testing and the amelogenin locus for gender identification.
10.5 Explain matching probabilities and the contribution of allele frequencies to the certainty of matching.
10.6 Describe the use of Y-STR in forensic and lineage studies.
10.7 Use STR for linkage analysis.
10.8 Give examples of the use of STR for bone marrow engraftment monitoring.
10.9 Show how STR may be used for quality assurance of histological sections.
10.10 Define single-nucleotide polymorphism (SNP), and explain the potential use of SNPs in disease gene mapping.
10.11 Discuss mitochondrial DNA typing.
10.12 Identify a protein profile from a reference database.
10.13 Predict the effect of aging on epigenetic (DNA methylation) profiles.

Polymorphisms are variations of DNA sequences that are shared by a certain percentage of a population. These sequences range from a single base pair to thousands of base pairs.

TYPES OF POLYMORPHISMS

The probability of finding polymorphic DNA in humans is high because of the relatively large size of the human genome, 98% of which does not code for genes. At the nucleotide-sequence level, it is estimated that genome sequences differ by at least one nucleotide every 1,000 to 1,500 bases. These single-nucleotide differences, or single-nucleotide polymorphisms (SNPs), may occur in gene-coding regions or in intergenic sequences.

Polymorphisms are more frequent in some areas of the genome than in others. The human leukocyte antigen (HLA) locus is a familiar example of a highly polymorphic region of human DNA. The variable nucleotide sequences in this locus code for peptides that establish self-identity of the immune system. The extent of similarity or compatibility between the immune systems of transplant recipients and potential donors can thus be determined by comparing DNA sequences.

Some human sequence polymorphisms affect many base pairs. Large blocks of repeated sequences may be inverted, deleted, or duplicated from one individual to another. Long interspersed nucleotide elements (LINEs) are highly repeated sequences, 6 to 8 kbp in length, that contain RNA polymerase promoters and open reading frames related to the reverse transcriptase of retroviruses. There are more than 500,000 of these LINE-1 (L1) elements, making up more than 15% of the human genome. There are even more short interspersed nucleotide elements (SINEs) scattered over the genome. SINEs, 0.3 kbp in size, are present in over 1,000,000 copies per genome. SINEs include Alu elements, named for harboring recognition sites for the AluI restriction enzyme. There are well in excess of 1 million Alu elements, accounting for almost 11% of the human genome [1]. The majority of transcribed genes contain Alu elements in their introns. Alu elements have cryptic splice and polyadenylation sites, which can become activated through the accumulation of mutations and lead to alternative splicing of RNAs or premature termination of translation. LINEs and SINEs are also known as mobile elements or transposable elements. They are copied and spread by recombination and reverse transcription and may be responsible for the formation of pseudogenes (intronless, nonfunctional copies of active genes) throughout the human genome.

Shorter blocks of repeated sequences also undergo expansion or shrinkage through generations. Examples of the latter are short tandem repeats (STRs) and variable-number tandem repeats (VNTRs).

SNPs, larger sequence variants, and tandem repeats can be detected by observing changes in the restriction map of a DNA region. Analysis of restriction fragments by Southern blot reveals restriction fragment length polymorphisms (RFLPs). Particular types of polymorphisms, specifically SNPs, VNTRs, STRs, and RFLPs, are routinely used in the laboratory (Table 10.1).

Historical Highlights
In the 1920s, scientists realized that blood type (A, B, AB, or O) is inherited and could be used for parentage testing. This limited testing could only exclude a falsely alleged father. But soon after, the use of other proteins on the surface of the red blood cell (Rh, Kell, and Duffy blood group systems) was introduced. The power of these serological tests was only marginally better than that of the ABO system. Forty years later, the polymorphic HLAs were implemented for parentage and identity testing, coupled with the ABO and serological testing.

TABLE 10.1 Types of Useful Polymorphisms and Laboratory Methods
Polymorphism | Structure | Detection Method
RFLP | One or more nucleotide changes that affect the size of restriction enzyme products | Southern blot
VNTR | Repeats of 10–50 base sequences in tandem | Southern blot, PCR
STR | Repeats of 1–10 base sequences in tandem | PCR
SNP | Alterations of a single nucleotide | Sequencing, other

RFLP TYPING

RFLPs were the original DNA targets used for gene mapping, human identification, and parentage testing. The first polymorphic RFLP was described in 1980. RFLPs are observed as differences in the sizes and number of fragments generated by restriction enzyme digestion of DNA (Fig. 10.1). Fragment sizes vary as a result of changes in the nucleotide sequence in or between the recognition sites of a restriction enzyme. Nucleotide changes may also destroy, change, or create restriction enzyme sites, altering the number of fragments.

[FIGURE 10.1 Types of DNA sequence alterations that change restriction fragment lengths. Panels: normal DNA (EcoRI site), point mutations (BalI site), insertions, duplications, and fragment insertion (or deletion). The normal sequence (top) has an EcoRI site (GAATTC). Single-base changes (point mutations, second line) can destroy the EcoRI site or create a new restriction site, as can insertions, duplications, or deletions of any number of bases (third through fifth lines). Insertions, duplications, and deletions between two restriction sites change fragment size without affecting the restriction sites themselves.]

The first step in using RFLPs is to construct a restriction enzyme map of the DNA region under investigation. Once the restriction map is known, the number and sizes of the restriction fragments of a test DNA region cut with restriction enzymes are compared with the number and sizes of fragments expected based on the restriction map. Polymorphisms are detected by observing fragment numbers and sizes different from those expected from the reference restriction map. An example of a polymorphism in a restriction site is shown in Figure 10.2. In a theoretical linear piece of DNA, loss of the recognition site for the enzyme (BglII in the figure) results in alteration of the size and number of bands detected after gel electrophoresis.

[FIGURE 10.2 A linear piece of DNA with two polymorphic BglII restriction enzyme sites, designated as 1 and 2, will yield different fragment sizes, depending on the presence of neither, either, or both of the restriction sites. For instance, a G to T mutation (AGATCT to ATATCT) will change the sequence of the normal site (+) to one not recognized by the enzyme (–). The presence or absence of the polymorphic sites is evident from the number and size of the fragments after cutting the DNA with BglII (bottom right).]
Site 1 | Site 2 | Fragments | Number
+ | + | A, B, C | 3
+ | – | A, B+C | 2
– | + | A+B, C | 2
– | – | A+B+C | 1

RFLP typing in humans required the use of the Southern blot technique. DNA was cut with restriction enzymes, resolved by gel electrophoresis, and blotted to a membrane. Probes to specific regions of DNA containing potential RFLPs were then hybridized to the DNA on the membrane to determine the size of the resulting bands. Figure 10.3 shows the pattern of bands resulting from a Southern blot analysis of the RFLP in the linear fragment from Figure 10.2. Even if the probe does not detect all of the restriction fragments, the polymorphisms can still be identified.

[FIGURE 10.3 Using a Southern blot to probe for RFLP. With the same region shown in Figure 10.2, only the fragments with sequences complementary to a probe to the B region (top) can be visualized. The bottom panel shows a diploid genotype where homologous chromosomes carry different RFLP alleles.]
Site 1 | Site 2 | Fragment visualized
+ | + | B
+ | – | B+C
– | + | A+B
– | – | A+B+C
Genotype | Alleles | Fragments visualized
I | + +/+ – | B, B+C
II | + –/– + | A+B, B+C
III | + +/– – | B, A+B+C

DNA is inherited as one haploid chromosome complement from each parent. Each chromosome carries its polymorphisms so that the offspring inherits a combination of the parental polymorphisms. When visualized as fragments that hybridize to a probe of a polymorphic region, the band patterns represent the combination of RFLPs inherited from each parent. Due to recombination and random assortment, each person has a unique set of RFLPs, half inherited maternally and half paternally. Every genotype will yield a descriptive band pattern, as shown in Figure 10.3.

Over many generations, mutations, intra- and interchromosomal recombination, gene conversion, and other genetic events have increased the diversity of DNA sequences. One consequence of this genetic diversity is that a single locus, that is, a gene or region of DNA, will have several versions, or alleles. Human beings are diploid with two copies of every locus. In other words, each person has two alleles of each locus. If these alleles are the same, the locus is homozygous; if the two alleles are different, the locus is heterozygous.

Depending on the extent of diversity or polymorphism of a locus, any two people can share the same alleles or have different alleles. More closely related individuals are likely to share more alleles than unrelated persons. In the examples shown in Figure 10.3, (+ +), (+ –), (– +), and (– –) describe the presence (+) or absence (–) of BglII sites making up four alleles of the locus detectable by Southern blot. In the illustration, genotypes I and II both have the (+ –) allele on one chromosome, but genotype I has (+ +), and genotype II has (– +) on the other chromosome. This appears in the Southern blot results as one band of equal size between the two genotypes and one band that is a different size. Two individuals can share both alleles at a single locus, but the chances of two individuals, except for identical twins, sharing the same alleles decrease 10-fold with each additional locus tested [2].

More than 2,000 RFLP loci have been described in human DNA. The uniqueness of the collection of polymorphisms in each individual is the basis for human identification at the DNA level. Detection of RFLP by Southern blot made positive paternity testing and human identification possible for the first time.

To optimize the discriminatory capacity of RFLP testing, restriction enzymes that cut human DNA frequently were used for RFLP tests.

Historical Highlights
Mary-Claire King used RFLP to map one of the genes mutated in inherited breast cancer [3,4]. Following extended families with high incidence of breast and ovarian cancer, she found a particular RFLP always present in affected family members. Because the location in the genome of the RFLP was known (17q21), the BRCA1 gene was thereby mapped to this position on the long arm of chromosome 17.
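The band patterns described for Figures 10.2 and 10.3 can be reproduced with a short sketch. It assumes a simplified model in which the linear fragment is three segments (A, B, C) separated by the two polymorphic BglII sites, and in which a probe to the B region reveals only fragments containing B. The function names and the boolean encoding of site presence are illustrative assumptions, not part of the chapter.

```python
# Hypothetical model of the linear fragment in Figure 10.2: segments A, B, C
# separated by two polymorphic BglII sites (site 1 between A and B, site 2
# between B and C). Names and data layout are illustrative only.

def fragments(site1, site2):
    """Return the restriction fragments produced by one allele.

    site1/site2 are True when the BglII site is present (+), False when absent (-).
    """
    pieces = ["A"]
    for present, segment in ((site1, "B"), (site2, "C")):
        if present:
            pieces.append(segment)           # enzyme cuts: a new fragment starts
        else:
            pieces[-1] += "+" + segment      # no cut: segment stays attached
    return pieces


def probed_bands(genotype, probe="B"):
    """Bands seen on a Southern blot: fragments that contain the probed segment."""
    bands = set()
    for allele in genotype:                  # one allele per homologous chromosome
        for frag in fragments(*allele):
            if probe in frag.split("+"):
                bands.add(frag)
    return bands


# Genotypes I, II, and III from Figure 10.3
genotype_I = ((True, True), (True, False))      # + + / + -
genotype_II = ((True, False), (False, True))    # + - / - +
genotype_III = ((True, True), (False, False))   # + + / - -

for name, g in (("I", genotype_I), ("II", genotype_II), ("III", genotype_III)):
    print(name, sorted(probed_bands(g)))
```

Genotypes I and II share the B+C band contributed by their common (+ –) allele and differ in the second band, which matches the description of one band of equal size and one band of different size.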
RFLP protocols for human identification in most North American laboratories used the restriction enzyme HaeIII for fragmentation of genomic DNA. Many European laboratories used the HinfI enzyme. Both enzymes cut DNA frequently enough to reveal polymorphisms in multiple locations throughout the genome. To standardize results from independent laboratories, the Standard Reference Material (SRM) DNA Profiling Standard for RFLP analysis was released in 1992. The SRM supplies cell pellets, genomic DNA, gel standards, precut DNA, electrophoresis materials, molecular-weight markers, and certified values for final analysis. These materials, currently provided by the National Institute of Standards and Technology (NIST), were designed to maintain the reproducibility of the RFLP process across laboratories.

Genetic Mapping With RFLPs

Polymorphisms are inherited in a Mendelian fashion, and the locations of many polymorphisms in the genome are known. Therefore, polymorphisms can be used as landmarks, or markers, in the genome to determine the location of other genes. In addition to showing clear family history or direct identification of a genetic factor, one can confirm that a disease has a genetic component by demonstrating a close genetic association or linkage to a known marker. Formal statistical methods are used to determine the probability that an unknown gene is located close to a known marker in the genome. The more frequently a particular polymorphism is present in persons with a disease phenotype, the more likely an affected gene is located close to the polymorphism. This is the basis for linkage mapping and one of the ways genetic components of disease are identified.

RFLP and Parentage Testing

In diploid organisms, chromosomal content is inherited, half from each parent. This includes the DNA polymorphisms located throughout the genome. Taking advantage of the unique combination of RFLP in each individual, one can infer a parent's contribution of alleles to a child from the combination of alleles in the child and those of the other parent. The fragment sizes of an individual are a combination of those from each parent, as illustrated in Figure 10.4. In a paternity test, the alleles or fragment sizes of the offspring and the mother are analyzed. The remaining fragments (the ones that do not match the mother) have to come from the father. Alleged fathers are identified based on the ability to provide the remaining alleles (inclusion). Aside from possible mutations, a difference in just one allele may exclude paternity.

[FIGURE 10.4 RFLP inheritance. Two different genetic regions, or loci, are shown, locus A and locus B, for the father, the mother, and the child. There are several versions or alleles of each locus. Note that the father is heterozygous at locus A and homozygous at locus B. The alleles in the child will be a combination of one allele from each parent.]

A simplified RFLP paternity test is shown in Figure 10.5. Of the two alleged fathers shown, only one could supply the fragments not supplied by the mother. In this example, only two loci are shown. A parentage test requires analysis of at least eight loci. The more loci tested, the higher the probability of positive identification of the father.

[FIGURE 10.5 Two alleged fathers (AFs) are being tested for paternity of the child whose partial RFLP profile is shown in the bottom gel. The mother's alleles are shown in green. One AF (AF1) is excluded from paternity because he cannot supply the child's paternal allele at locus B.]

Human Identification Using RFLPs

The first genetic tool used for human identification was the ABO blood group antigens. Although this type of analysis could be performed in a few minutes, the discrimination power was low. With only four possible groups, this method was only good for exclusion (elimination) of a person and was informative only in 15% to 20% of cases. Analysis of the polymorphic HLA loci added a higher level of discrimination, with exclusion in 90% of cases. Testing both ABO and HLA did not provide positive identification, however.

The initial use of DNA as an identification tool relied on RFLP detectable by Southern blot. As shown in Figure 10.1, RFLP can arise from a number of genetic events, including point mutations in the restriction site, mutations that create a new restriction site, and insertion or deletion of repeated sequences (tandem repeats). The insertion or deletion of nucleotides occurs frequently in repeated sequences in DNA. Tandem repeats of sequences of all sizes are present in genomic DNA (Fig. 10.6). Repeat units can be large enough so that loss or gain of one repeat is resolved by gel electrophoresis of a restriction enzyme digest. The frequent cutters, HaeIII (recognition site GGCC) or HinfI (recognition site GANTC), generate fragments that are small enough to resolve those that contain different numbers of repeats and thereby give an informative pattern by Southern blot.

[FIGURE 10.6 A tandem repeat is a direct repeat of 1 to more than 100 nucleotides in length. The one shown has a 4-bp repeat unit (AGCT). A gain or loss of repeat units forms a different allele. Different alleles are detected as variations in fragment size on digestion with a restriction enzyme, such as HaeIII (GGCC recognition sites).]
Tandem repeat (4 units), top strand: GTTCTAGCGGCCGTGGCAGCTAGCTAGCTAGCTGCTGGGCCGTGG
Tandem repeat (3 units), top strand: GTTCTAGCGGCCGTGGCAGCTAGCTAGCTGCTGGGCCGTGG

DNA Fingerprinting With RFLP

The first human DNA profiling system was introduced by the United Kingdom Forensic Science Service in 1985 using Sir Alec Jeffreys's Southern blot multiple-locus probe (MLP)-RFLP system [5]. This method utilized three to five probes to analyze three to five loci on the same blot. Results of probing multiple loci at once produced patterns that were highly variable between individuals but that required some expertise to optimize and interpret. In 1990, single-locus probe (SLP) systems were established in Europe and North America [6,7]. Analysis of one locus at a time yielded simpler patterns, which were much easier to interpret, especially in cases where specimens might contain a mixture of DNA from more than one individual (Fig. 10.7).

[FIGURE 10.7 Example of RFLP crime evidence using two single-locus probes. M denotes molecular-weight markers, 1 and 2 are suspects, C is the child victim, and P is the parent of the child victim. E is evidence from the crime scene. For both loci probed, suspect 2 "matches" the evidence found at the crime scene. Positive identification of suspect 2 requires further determination of the frequencies of these specific alleles in the population and the probability of matching them by chance.]

The RFLP Southern blot technique required 100 ng to 1 μg of relatively high-quality DNA, 1 to 20 kbp in size. Furthermore, large, fragile 0.7% gels were required to achieve adequate band resolution, and the 32P-based probe system could take 5 to 7 days to yield clear results. After visually inspecting the band patterns, profiles were subjected to computer analysis to accurately size the restriction fragments and apply the results to an established matching criterion. RFLP is an example of a continuous allele system in which the sizes of the fragments define alleles. Therefore, precise band sizing was critical to the accuracy of the results. A match implied inclusion, which was refined by determination of the genotype frequency of each allele in the general or local population. This process established the likelihood of the same genotype occurring by chance. The probability of two people having the same set of RFLP, or profile, becomes lower and lower as more loci are analyzed.

Historical Highlights
Professor Sir Alec John Jeffreys, a British geneticist, first developed techniques for genetic profiling, or DNA fingerprinting, using RFLP to identify humans. The technique has been used in forensics and law enforcement to resolve paternity and immigration disputes. The method can also be applied to nonhuman species, for example, in wildlife population genetics. The first application of this DNA technique was in a regional screen of human DNA to identify the rapist and killer of two girls in Leicestershire, England, in 1983 and 1986. Colin Pitchfork was identified and convicted of murder after samples taken from him matched semen samples taken from the two victims.

STR TYPING BY PCR

The first commercial and validated typing test based on polymerase chain reaction (PCR) specifically for forensic use was the HLA DQ alpha system, now called DQA1, developed in 1986 [8]. This system could distinguish 28 DQA1 types. With the addition of another commercial system, the Polymarker (PM) system, the analyst could type five additional genetic markers. The PM system is a set of primers complementary to sequences flanking STRs, or microsatellites. STRs are similar to VNTRs (minisatellites) but have repeat units of 1 to 7 bp. (The upper limit of repeat unit size for STR varies from 7 to 10 bp, depending on different texts and reports.) Because of the increased power of discrimination and ease of use of STR, the HLA DQA forensic DNA amplification and typing kit was discontinued in 2002.

The tandem repeat shown in Figure 10.6 is an STR with a 4-bp repeat unit, AGCT. Occasionally, STRs contain repeat units with altered sequences, or microvariants, which are repeat units missing one or more bases of the repeat. These differences have arisen through mutation or recombination events.

In contrast to VNTRs, the smaller STRs are efficiently amplified by PCR, easing specimen demands significantly. Long, intact DNA fragments are not required to detect the STR products; therefore, degraded or otherwise less-than-optimal specimens are potentially informative. The amount of specimen required for STR analysis by PCR is reduced from 1 μg to 10 ng, a key factor for forensic analysis. Furthermore, PCR procedures shorten the analysis time from several weeks to 24 to 48 hours. Careful design of primers and amplifications facilitated multiplexing and automation of the process [9].

Advanced Concepts
Although STRs with 4- and 5-bp repeat units are highly informative and efficiently amplified, they are subject to naturally occurring genetic events. Loss or gain of repeats or parts of repeat units, as well as mutations within repeat units, are very rare occurrences. Because at least 8 to more than 20 loci are included in STR applications, these minor events do not significantly affect the informative power. (Allele population frequency has the most limiting effect on inclusion.) In abnormal cells with genetic instability, such as cancer cells, gain or loss of repeats can occur more frequently, enough to affect the identification of genotypes [10].

STR alleles are identified by PCR product size. Primers are designed to produce amplicons of 100 to 400 bp in which the STRs are embedded (Fig. 10.8). The sizes of the PCR products are influenced by the number of embedded repeats. If one of each primer pair is labeled with a fluorescent marker, the PCR product can be analyzed in fluorescent detection systems. Silver-stained gels may also be used; however, capillary electrophoresis with fluorescent dyes is the preferable method, especially for high-throughput requirements.

[FIGURE 10.8 STR TH01 (repeat unit TCAT) linked to the human tyrosine hydroxylase gene on chromosome 11p15.5. Primers are designed to amplify short regions containing the tandem repeats. Allelic ladders consisting of all alleles in the human population (flanking lanes in the gel shown at bottom right, alleles 5 to 11) are used to determine the number of repeats in the locus by the size of the amplicon. The two alleles shown contain seven and eight repeats (PCR products: allele 1 = 187 bp, 7 repeats; allele 2 = 191 bp, 8 repeats). If these alleles were found in a single individual, that person would be heterozygous for TH01 with a genotype of 7/8. Compare the 7/8 genotype pattern with the 7/10 genotype gel pattern.]

Historical Highlights
At least three to seven RFLP probes were originally required to determine genetic identity. Available probes included G3, MS1, MS8, MS31, and MS43, which were subclones of Jeffreys's multilocus probes 33.6 and 33.15, and pYNH24m, MS205, and MS621. SLPs MS1, MS31, MS43, G3, and YNH24 were used in the O. J. Simpson trial in 1996.

A further development of STR analysis was the design of mini-STR. These STRs are amplified with PCR primers located closer to the tandem repeat than in the standard STR. Compared with standard STR products, the small amplicons are more efficiently produced from such challenging starting material as fixed tissue [11] and degraded specimens [12]. Another specialized system, Y-STR, was developed for surname testing and forensic identification of male offenders or victims. This primer set only amplifies STR located on the Y chromosome. There is only one allele at each locus, and because the Y chromosome is inherited as a single haplotype, paternally related men share all Y loci [13].

STR Analysis

To identify STR alleles, test DNA is mixed with the primer pairs, buffer, and polymerase to amplify the test loci. Primer pairs may be laboratory designed or purchased commercially. A control DNA standard is also amplified, as well as a sensitivity control, if the relative allele percentage in a mixture will be calculated. Following amplification, each sample PCR product is combined with allelic ladders (sets of fragments representing all possible alleles of a repeat locus) and internal size standards (molecular-weight markers) in formamide for electrophoresis. After electrophoresis, detection and analysis software will size and identify the alleles based on comigration with specific alleles in the allelic ladders. In contrast to RFLPs and VNTRs, STRs are discrete allele systems in which a finite number of alleles is defined by the number of repeat units in the tandem repeat (see Fig. 10.8). Several available commercial systems consist of labeled primers for 1 to more than 16 loci. The allelic ladders in these reagent kits allow accurate identification of the sample alleles (Fig. 10.9) [14].

[FIGURE 10.9 Multiple STRs can be resolved on a single gel. Here, four and five different loci are shown on the left and right gels, respectively (loci labeled in the figure: FGA, Penta E, TPOX, D18S51, D8S1179, D21S11, TH01, vWA, D3S1358). The allelic ladders show that the ranges of potential amplicon sizes do not overlap, allowing resolution of multiple loci in the same lane. Two individual genotypes are shown on the second and third lanes of the two gels.]
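Because STRs form a discrete allele system, allele calling reduces to matching a measured amplicon size against the allelic ladder. The sketch below illustrates this with the TH01 values given for Figure 10.8 (187 bp for 7 repeats, 4-bp repeat unit, ladder alleles 5 to 11). The 0.5-bp sizing tolerance, the function names, and the decision to ignore microvariants are assumptions made for illustration; real genotyping software and commercial kits use their own sizing rules.

```python
# Illustrative STR allele calling against an allelic ladder.
# Reference values (187 bp = 7 repeats, 4-bp unit, alleles 5-11) follow the
# TH01 example in Figure 10.8; the tolerance and names are assumptions.

def allelic_ladder(ref_size_bp, ref_repeats, unit_bp, min_repeats, max_repeats):
    """Map repeat number -> expected amplicon size for one primer set."""
    flank_bp = ref_size_bp - ref_repeats * unit_bp   # primer-to-repeat flanks
    return {n: flank_bp + n * unit_bp for n in range(min_repeats, max_repeats + 1)}


def call_allele(peak_bp, ladder, tolerance_bp=0.5):
    """Return the repeat number whose ladder size matches the peak, else None."""
    for repeats, size_bp in ladder.items():
        if abs(peak_bp - size_bp) <= tolerance_bp:
            return repeats
    return None


def genotype(peaks_bp, ladder):
    """Turn observed peak sizes at one locus into a genotype string such as '7/8'."""
    calls = [call_allele(p, ladder) for p in peaks_bp]
    if any(c is None for c in calls):
        raise ValueError("off-ladder peak; cannot assign a discrete allele")
    alleles = sorted(calls)
    if len(alleles) == 1:
        alleles *= 2                                 # single peak: homozygous
    return "/".join(str(a) for a in alleles)


th01 = allelic_ladder(ref_size_bp=187, ref_repeats=7, unit_bp=4,
                      min_repeats=5, max_repeats=11)
print(genotype([187.2, 190.8], th01))   # heterozygote with 7 and 8 repeats
print(genotype([187.0, 199.1], th01))   # heterozygote with 7 and 10 repeats
```

The expected sizes hold only for one primer set. A different primer pair changes the flanking length, so the ladder must be rebuilt from that kit's own reference allele.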
Advanced Concepts

Theoretically, the minimal sample requirement for PCR analysis is a single cell. A single cell has approximately 6 pg of DNA. This number is derived from the molecular weight of A/T and G/C base pairs (617 and 618 g/mol, respectively). There are about 3 billion base pairs in one copy of the human genome; therefore, for one genome copy:

\[ 3 \times 10^{9}\ \text{bp} \times 618\ \frac{\text{g/mol}}{\text{bp}} = 1.85 \times 10^{12}\ \text{g/mol} \]

\[ 1.85 \times 10^{12}\ \frac{\text{g}}{\text{mol}} \times \frac{1\ \text{mol}}{6.023 \times 10^{23}\ \text{molecules}} = 3.07 \times 10^{-12}\ \text{g} \approx 3\ \text{pg} \]

(A diploid cell has two genome copies, or 6 pg of DNA.) One ng (1,000 pg) of DNA should, therefore, contain about 333 copies (1,000 pg divided by 3 pg per genome copy) of each locus.

Advances in fluorescence technology have increased the ease and sensitivity of STR allele identification (Fig. 10.10). Although capillary electrophoresis is faster and more automated than gel electrophoresis, a single run through a capillary of single dye-labeled products can resolve only loci whose allele ranges do not overlap. The number of loci that can be resolved on a single run was increased by the use of multicolor dye labels. Primer sets labeled with dyes that can be distinguished by their emission wavelength generate products that are resolved according to fluorescent color as well as size (Fig. 10.11). Test DNA amplicons, allelic ladders, and size standards for multiple loci are thus run simultaneously through each capillary. Genotyping software provides automated resolution of fluorescent dye colors and genotyping by comparison with the size standards and the allelic ladder.

FIGURE 10.10 STR analysis by capillary gel electrophoresis. Instead of bands on a gel (top), peaks of fluorescence on an electropherogram reveal the PCR product sizes (bottom). Alleles (7, 8, or 10) are determined by comparison with allelic ladders representing all possible alleles (from 5 to 11 repeats) for this locus, run through the capillary simultaneously with the sample amplicons.

FIGURE 10.11 An illustration of the ranges of allele peak locations for selected STRs (size axis from 100 bp to 400 bp). By labeling primers with different fluorescent dye colors, STRs with overlapping size ranges can be resolved by color. The molecular-weight markers (bottom) are labeled with a fluorescent dye distinguishable from those used for the primer labeling.

Advanced Concepts

Commercial primer sets are designed with "stuffer" sequences to modify the size of the PCR products so that the range of alleles for four to five loci can be resolved by electrophoresis. Therefore, the product size for a given allele will not always be the same with primers from different commercial sources.

As in RFLP testing, an STR "match" is made by comparing profiles (alleles at all loci tested) followed by probability calculations. The HLA DQ in conjunction with the PM system generated highly discriminatory allele frequencies. For example, the chance of the same set of alleles or profile occurring in two unrelated individuals at random is 1 in 10^6 to 7 × 10^8 Caucasians or 1 in 3 × 10^6 to 3 × 10^8 African Americans.

STR Nomenclature

The International Society for Forensic Genetics recommended nomenclature for STR loci in 1997 [15]. STRs within genes are designated according to the gene name; for example, the STR TH01 is located in intron 1 of the human tyrosine hydroxylase gene on chromosome 11, and TPOX is located in intron 10 of the human thyroid peroxidase gene on chromosome 2. These STRs do not have any phenotypic effect with respect to these genes. Non-gene-associated STRs are designated by the D#S# system. In this system, the "D" stands for DNA; the following number designates the chromosome where the STR is located (1-22, X, or Y). "S" refers to a unique segment, followed by a number registered in the International Genome Database (GDB). See Table 10.2 for some examples.

STRs are present all over the genome. Some of the STR loci commonly used for laboratory investigation are shown in Table 10.2. A comprehensive collection of STR information is available at http://www.cstl.nist.gov/strbase/.

Gender Identification

The amelogenin locus is a very useful marker often analyzed along with STR. The amelogenin gene, which is not an STR, is located on the X and Y chromosomes. Its encoded protein is required for embryonic development and tooth maturation. A polymorphism is located in the second intron of the amelogenin gene. The Y allele of the gene is 6 bp larger in this region than the X allele. Amplification and electrophoretic resolution reveal two bands or peaks for males (XY) and one band or peak for females (XX; Fig. 10.12). Some commercially available sets contain primers to amplify the amelogenin polymorphism in addition to the STR primer sets. Additional loci are now available for gender testing in cases where amelogenin may be compromised or poorly amplified [16].

Analysis of Test Results

Analysis of polymorphisms at multiple loci results in very high levels of discrimination (Table 10.3). Discovery of the same set of alleles from different sources, or of shared alleles between allegedly related individuals, is strong evidence of identity, paternity, or relatedness. Results from such studies, however, must be expressed in terms of the background probability of chance matches.

Historical Highlights

The GDB is overseen by the Human Genome Nomenclature Committee, a part of the Human Genome Organization (HUGO) located at University College, London. HUGO was established in 1989 as an international association of scientists involved in human genetics. The goal of HUGO is to promote and sustain international collaboration in the field of human genetics. The GDB was originally used to organize mapping data during the earliest days of the Human Genome Project. With the release of the human genome sequences and the development of PCR, the number of laboratories doing genetic testing grew significantly. The GDB is still widely used as a source of information about PCR primers, PCR products, polymorphisms, and genetic testing. Information from the GDB is available at http://www.ncbi.nlm.nih.gov/sites/genome.

FIGURE 10.12 Amplification of amelogenin will produce a male-specific 218-bp product (Y allele) in addition to the 212-bp product found on the X chromosome (X allele). Males are heterozygous for the amelogenin locus (XY, top), and females are homozygous for this locus (XX, bottom).
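
The amelogenin peak pattern described for Figure 10.12 can be expressed as a small decision rule. The following is a minimal sketch, not a validated genotyping routine: the function name `call_amelogenin` and the ±1 bp tolerance are assumptions made for illustration, and the 212-bp and 218-bp sizes are the ones quoted in the figure caption (other primer sets give different product sizes).

```python
# Hypothetical helper for illustration only.
# Product sizes are taken from the Figure 10.12 caption; they differ between primer sets.
X_ALLELE_BP = 212.0
Y_ALLELE_BP = 218.0


def call_amelogenin(peak_sizes_bp, tolerance_bp=1.0):
    """Return 'XY', 'XX', or 'inconclusive' from observed amelogenin peak sizes (bp).

    tolerance_bp is an assumed sizing window, not a value from the text.
    """
    has_x = any(abs(size - X_ALLELE_BP) <= tolerance_bp for size in peak_sizes_bp)
    has_y = any(abs(size - Y_ALLELE_BP) <= tolerance_bp for size in peak_sizes_bp)
    if has_x and has_y:
        return "XY"  # two peaks: male
    if has_x:
        return "XX"  # single X peak: female
    # A Y peak without an X peak, or no peak at all, is left uninterpreted.
    return "inconclusive"


if __name__ == "__main__":
    print(call_amelogenin([212.1, 217.8]))
    print(call_amelogenin([211.9]))
```

The "inconclusive" branch reflects the remark in the text that amelogenin may be compromised or poorly amplified, in which case additional loci are used.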
TABLE 10.2 STR Locus Information* [71]

STR | Locus | Chromosome | Repeat Sequence | Alleles†
CD4 | Locus between CD4 and triosephosphate isomerase | 12p | AAAAG‡ | 4, 6, 7, 8, 8’, 9, 10, 11, 12, 13, 14, 15
CSF1PO | c-fms protooncogene for CSF-1 receptor | 5q | TAGA | 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
D3S1358 | | 3p | TCTA§ | 8, 9, 10, 11, 12, 13, 14, 15, 15’, 15.2, 16, 16’, 16.2, 17, 17’, 17.1, 18, 18.3, 19, 20
D5S818 | | 5q | AGAT | 7, 8, 9, 10, 11, 12, 13, 14
D7S829 | | 7q | GATA | 7, 8, 9, 10, 11, 12, 13, 14, 15
D8S1179 | Sequence tagged site | 8q | TCTA | 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
D13S317 | | 13q | TATC | 7, 8, 9, 10, 11, 12, 13, 14, 15
D16S539 | | 16q | GATA | 5, 8, 9, 10, 11, 12, 13, 14, 15
D18S51 | Sequence tagged site | 18q | GAAA | 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
D21S11|| | | 21q | TCTG | 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38
F13A01 | Coagulation factor XIII a | 6p | GAAA | 3.2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
F13B | Factor XIII b | 1q | TTTA | 6, 7, 8, 9, 10, 11, 12
FESFPS | c-fes/fps protooncogene | 15q | ATTT | 7, 8, 9, 10, 11, 12, 13, 14
HPRTB | Hypoxanthine phosphoribosyl-transferase | Xq | TCTA | 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
LPL | Lipoprotein lipase | 8p | TTTA | 7, 8, 9, 10, 11, 12, 13, 14
TH01 | Tyrosine hydroxylase | 11p | TCAT | 5, 6, 7, 8, 9, 9.3, 10, 11
TPOX | Thyroid peroxidase | 2p | TGAA | 6, 7, 8, 9, 10, 11, 12, 13
vWA | Von Willebrand factor | 12p | TCTA | 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
Penta D | | 21q | AAAGA | 2.2, 3.2, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
Penta E | | 15q | AAAGA | 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20.3, 21, 22, 23, 24

* http://www.cstl.nist.gov/
† Some alleles have units with one, two, or three missing bases.
‡ In an alternate 8-repeat allele, one repeat sequence is AAAGG.
§ In alternate 15-, 16-, or 17-repeat alleles, one repeat sequence is TCTG.
|| D21S11 has multiple alternate alleles.
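
The decimal allele designations in Table 10.2 (for example, TH01 9.3) encode the number of complete repeats, then the number of bases in a partial repeat. A minimal sketch of that convention follows; the function name `repeat_region_length_bp` is hypothetical, and the result covers only the repeat region, not the full PCR product, whose size also depends on the primers used. Designations carrying a prime mark (such as 8’) denote sequence variants and are deliberately not handled here.

```python
def repeat_region_length_bp(allele, repeat_unit_bp):
    """Length (bp) of the repeat region for a designation such as '10' or '9.3'.

    allele:          allele designation as text, e.g. '9.3' (9 full repeats + 3 bases)
    repeat_unit_bp:  length of one full repeat unit, e.g. 4 for TCAT
    """
    full_text, _, partial_text = allele.partition(".")
    full_repeats = int(full_text)  # raises ValueError for designations such as "8’"
    partial_bases = int(partial_text) if partial_text else 0
    if not 0 <= partial_bases < repeat_unit_bp:
        raise ValueError("partial repeat must be shorter than one repeat unit")
    return full_repeats * repeat_unit_bp + partial_bases


if __name__ == "__main__":
    # TH01 has a 4-bp repeat unit: allele 9.3 is one base shorter than allele 10.
    print(repeat_region_length_bp("9.3", 4), repeat_region_length_bp("10", 4))
```

The one-base difference between 9.3 and 10 is why single-base sizing resolution matters when alleles are called.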
TABLE 10.3 Matching Probability of STR Genotypes in Different Subpopulations

Loci tested | African American | White American | Hispanic American
8 loci | 1/274,000,000 | 1/114,000,000 | 1/145,000,000
9 loci | 1/5.18 × 10^9 | 1/1.03 × 10^9 | 1/1.84 × 10^9
10 loci | 1/6.76 × 10^10 | 1/9.61 × 10^10 |
12 loci | 1/4.61 × 10^12 | |
14 loci | 1/6.11 × 10^17 | 1/9.96 × 10^17 | 1/1.31 × 10^17
16 loci | 1/7.64 × 10^17 | 1/9.96 × 10^17 | 1/1.31 × 10^17

Advanced Concepts

In 1997 the Federal Bureau of Investigation adopted 13 "core" loci as the Combined DNA Index System (CODIS). The loci are TPOX on chromosome 2, D3S1358 on chromosome 3, FGA on chromosome 4, D5S818 and CSF1PO on chromosome 5, D7S820 on chromosome 7, D8S1179 on chromosome 8, TH01 on chromosome 11, vWA on chromosome 12, D13S317 on chromosome 13, D16S539 on chromosome 16, D18S51 on chromosome 18, D21S11 on chromosome 21, and the amelogenin locus on the X and Y chromosomes. The National Institute of Standards and Technology supplies Standard Reference Material that certifies values for 22 STR loci, including CODIS and markers used by European forensic laboratories.

Genotyping

DNA testing results in peak or band patterns that must be converted to genotype (allele identification). As described previously, an STR locus genotype is defined by the number of repeats in its alleles. For instance, if the locus genotype in Figure 10.8 represented homologous chromosomes from an individual, the locus would be heterozygous, with 7 repeats on one chromosome and 8 repeats on the other. This locus would thus be designated 7/8 or 7,8. A homozygous locus (where both homologous chromosomes carry the same allele) is designated 7/7 or 7,7. Some reports use a single number, such as 6 or 7, to designate a homozygous locus. Microvariant alleles containing partial repeat units are indicated by the number of complete repeats followed by a decimal point and then the number of bases in the partial repeat. For example, the 9.3 allele of the TH01 locus has 10 repeats: 9 full 4-bp repeat units and 1 repeat unit with 3 bp. Microvariants are detected as bands or peaks very close to the full-length allele (Fig. 10.13).

FIGURE 10.13 A microvariant allele (15.2, top) migrates between the full-length alleles (15 and 16 on the allelic ladder, bottom).

The genotype, or profile, of a specimen is the collection of alleles in all the loci tested. To determine the extent of certainty that one profile matches another, the occurrence of the detected genotype in the general or a defined population must be assessed.

A matching genotype is not necessarily an absolute determination of the identity of an individual. Genetic concordance is a term used to express the situation where all locus genotypes (alleles) from two sources are the same. Concordance is interpreted as inclusion of a single individual as the donor of both genotypes. Two samples are considered different if at least one locus genotype differs (exclusion). An exception is paternity testing, in which mutational events may generate a new allele in the offspring at one locus, and this difference may not rule out paternity.

Technical artifacts such as air bubbles, crystals, and dye blobs, as well as sample contaminants, temperature variations, and voltage spikes, can interfere with consistent band migration during electrophoresis. In addition, amplification artifacts occur during PCR. Some polymerases add an additional non-template adenine residue to the 3′ end of the PCR product. If this 3′ nucleotide addition does not include all the amplified products, a mixed set of amplicons will result in extra bands or peaks located very close together.

Stutter is another anomaly of PCR amplification, in which the polymerase may miss a repeat during the replication process, resulting in two or more different species in the amplified product. These also appear as extra bands or peaks. Generally, the larger the repeat unit length, the less stutter is observed. These or other aberrant band patterns confuse the analysis software and can result in the miscalling of alleles.

Matching requires clear and unambiguous laboratory results. As alleles are identified by gel resolution, good intragel precision (comparing bands or peaks on the same gel or capillary) and intergel precision (comparing bands or peaks of separate gels or capillaries) are important. In general, intergel precision is less stringent than intragel precision. This is not unexpected because the same samples may run with slightly different migration speeds on different gels. Because some microvariant alleles differ by only a single base pair (see Fig. 10.13), sizes must be resolved to within ±0.5 bp. The TH01 9.3 allele described earlier is an example. This allele must be distinguished from the 10 allele, which is a single base pair larger than the 9.3 allele.

To establish the identity of peaks from capillary electrophoresis (or peaks from densitometry tracings of gel data), the peak is assigned a position relative to some landmark within the gel lane or capillary, such as the loading well or the start of migration. Upon replicate resolutions of a band or peak, electrophoretic variations from capillary to capillary, lane to lane, or gel to gel may occur. Normalization of migration is achieved by relating the migration of the test peaks to the simultaneous migration of size standards. Size standards can be internal (in the same gel lane or capillary) or external (in a separate gel lane). Even with normalization, however, tiny variations in position, height, and area of peaks or gel bands may persist. If the same fragments are run repeatedly, a distribution of observed sizes can be established. An acceptable range of sizes in this distribution is a bin. A bin can be thought of as an uncertainty window surrounding the mean position (size) of multiple runs of each peak or band. All bands or peaks, therefore, that fall within this window are considered identical. Collection of all peaks or bands within a characteristic distribution of positions and areas is called binning. Bins for each allele can be established manually in the laboratory. Alternatively, commercially available software has been designed to automatically bin and identify alleles [17].

All peaks within a bin are interpreted as representative of the same allele of a locus. Each band or peak in a genotype is binned and identified according to its migration characteristics. The group of bands or peaks makes up the characteristic pattern or profile of the specimen.

Advanced Concepts

Binning may be performed in different ways using replicate peak heights and positions. To calculate the probability that two peaks are representative of the same allele, the proportion of alleles that fall within the uncertainty window (bin) must be determined. This proportion is represented exactly in fixed bins and approximated in floating bins. The fixed-bin approach is an approximation of the more conservative floating-bin approach [18]. An alternative assessment of allele certainty is the use of locus-specific brackets. In this approach, artificial "alleles" are designed to run at the high and low limit of the expected allele size. Identical alleles are expected to fall within this defined bracket [17].

Matching of Profiles

The number of loci tested must be taken into consideration in genotyping analysis. The more loci analyzed, the higher the probability that the locus genotype positively identifies an individual (match probability; see Table 10.3). Degraded, compromised, or mixed samples will affect the match probability because all loci may not yield clear, informative results. Criteria for interpretation of results and determination of a true allele are established by each laboratory. These criteria are based on validation studies and results reported from other laboratories. Periodic external proficiency testing is performed to confirm the accuracy of test performance.

Results from the analysis of polymorphisms are used to determine the probability of identity or inheritance of genetic markers or to match a particular marker or marker pattern. To establish the identity of an individual by an allele of a locus, the chance that the same allele could arise in the population randomly is taken into account.

Advanced Concepts

The certainty of a matching pattern increases with decreased frequency of alleles in the general population. Under defined conditions, the relative frequency of two alleles in a population remains constant. This is Hardy–Weinberg equilibrium.

Individual allele frequencies are determined by data collected from testing many individuals in general and defined populations. For example, at locus Penta D on chromosome 21, the 5 allele has been previously determined to occur in 1 of 10 people in a theoretical population. At locus D7S829 on chromosome 7, the 8 allele has been previously observed in 1 of 50 people in the same population. The overall frequency of the profile containing the Penta D 5 allele and the D7S829 8 allele would be 1/10 × 1/50 = 1/500. That is, a genotype or profile containing D7S829 8 and Penta D 5 alleles would be expected to occur in 1 out of every 500 randomly chosen members of that population. As should be apparent, the more loci tested, the greater the certainty that the profile is unique to a single individual in that population; that is, the overall frequency of the profile is very low. The overall frequencies in Table 10.3 illustrate this point. Allele frequencies differ between subpopulations or ethnic groups.
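
The Penta D and D7S829 example above multiplies one allele frequency per locus. A minimal sketch of that arithmetic follows, using exact fractions so the 1/500 result is reproduced without rounding. The function names `profile_frequency` and `expected_matches` are hypothetical, and the calculation assumes, as the text's simplified example does, that the loci are independent and that a single frequency per locus is supplied.

```python
from fractions import Fraction
from math import prod


def profile_frequency(allele_frequencies):
    """Multiply the per-locus allele frequencies to get the overall profile frequency."""
    return prod(allele_frequencies, start=Fraction(1))


def expected_matches(profile_freq, population_size):
    """Expected number of people carrying the profile in a population of the given size."""
    return profile_freq * population_size


if __name__ == "__main__":
    # Penta D allele 5 (1 in 10) and D7S829 allele 8 (1 in 50), as in the text.
    freq = profile_frequency([Fraction(1, 10), Fraction(1, 50)])
    print(freq)                            # 1/500
    print(expected_matches(freq, 100_000)) # 200
```

Adding further loci only appends more factors to the list, which is why the overall frequency falls so quickly as the number of tested loci grows.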
Hardy–Weinberg equilibrium is also known as the Hardy–Weinberg law [19]. The population frequency of two alleles, p and q, can be expressed mathematically as

\[ p^{2} + 2pq + q^{2} = 1.0 \]

This equilibrium assumes a large population with random mating and no immigration, emigration, mutation, or natural selection. Under these circumstances, if enough individuals are assessed, a close approximation of the true allele frequency in the population can be determined.

The frequency of a set of alleles or a genotype in a population is the product of the frequency of each allele separately (the product rule). The product rule can be applied because of linkage equilibrium. Linkage equilibrium assumes that the observed frequencies of haplotypes in a population are the same as haplotype frequencies predicted by multiplying together the frequency of individual genetic markers in each haplotype and that loci are not genetically linked (located close to one another) in the genome. The overall frequency (OF) of a locus genotype consisting of n loci can be calculated as

\[ \mathrm{OF} = F_{1} \times F_{2} \times F_{3} \times \cdots \times F_{n} \]

where F_1 through F_n represent the frequency of each individual allele in the population.

Different allele frequencies in subpopulations are determined through the study of each ethnic group [20]. The data in Table 10.3 illustrate differences in the polymorphic nature of alleles in different subpopulations.

When profile identification requires comparing the genotype of an unknown specimen with a known reference sample (for example, the genotype of evidence from a crime and the genotype of an individual from a database), the determination that the two genotypes match (are from the same person) is expressed in terms of a likelihood ratio. The likelihood ratio is the comparison of the probability that the two genotypes came from the same person with the probability that the two genotypes came from different persons, taking into account allele frequencies and linkage equilibrium in the population. A high likelihood ratio indicates that it is more likely that the two genotypes came from the same person, whereas a likelihood ratio of less than 1 indicates that this is less likely. If a likelihood ratio is [1/(1/1,000)] = 1,000, the tested genotypes are 1,000 times more likely to have come from the same person than from two randomly chosen members of the population, where the profile occurs in 1/1,000 people. In a random sampling of 100,000 members of a population, 100 people (100,000/1,000) with the same genotype might be found.

A simplified illustration can be made from the previous Penta D and D7S829 example. Suppose the Penta D 5 (1/10 allele frequency) and D7S829 8 (1/50 allele frequency) profiles were discovered in a specimen from an independent source. The likelihood that the Penta D 5/D7S829 8 profile came from the tested individual is 1, having been directly determined. The likelihood that the same profile could come from someone else in the population is 1/500. The likelihood ratio is 1/(1/500), or 500. The specimen material is 500 times more likely to have come from the tested individual than from some other person in the population.

Historical Highlights

Sir Alec Jeffreys’ DNA profiling was the basis for the National DNA Database (NDNAD) launched in Britain in 1995. Under British law, the DNA profile of anyone convicted of a serious crime is stored on a database. This database includes 6% of the UK population (compared with 0.5% of the population in the U.S. database). Over 5.9 million DNA profiles are held in the database, accounting for a majority of the known active-offender population. Nearly 60% of crime-scene profiles submitted to the NDNAD were matched to a subject profile between 2008 and 2009.

The National DNA Index System (NDIS) is the federal level of the CODIS used in the United States. There are three levels of CODIS: Local DNA Index System (LDIS), State DNA Index System (SDIS), and NDIS. At the local level, CODIS software maintained by the Federal Bureau of Investigation (FBI) is used in sizing alleles as the assay is performed. This information may be compiled locally and/or submitted to the SDIS. The state data may be sent to the NDIS. The SDIS and NDIS must adhere to the quality assurance standards recommended by the FBI. The original entries to these databases were RFLP profiles; later entries have been STR profiles. As of October 2018, the NDIS contained over 13 million offender profiles.

When comparing genotypes with those in a database looking for a match, it is important to consider whether the database is representative of a population or subpopulation because allele frequencies will be different in differently defined groups. It is also important to consider whether the population is homogeneous (a random mixture) with respect to the alleles tested. Familial searches and forensic applications involving mass disasters or other complex mixtures of DNA involve analysis of partial or uncertain DNA genotypes. More advanced approaches, such as "wild card" designations for missing alleles, are required for defining the likelihood in these cases [21].

Allelic Frequencies in Paternity Testing

A paternity test is designed to choose between two hypotheses: the test subject is not the father of the tested child (H0), or the test subject is the father of the tested child (H1). Paternity is first assessed by observation of shared alleles between the alleged father and the child (Fig. 10.14). The identity of shared alleles is a process of matching, as described previously for identity testing.

Historical Highlights

Peter Gill developed a forensic DNA identification method for minimal samples called low-copy-number analysis [22]. In contrast to standard DNA analysis that requires approximately 200 pg DNA, low-copy-number analysis is reportedly performed on less than 100 pg DNA (about 16 diploid cells). The method involves increasing the number of PCR cycles for amplification and fastidiously cleaned work areas. Although used in over 21,000 serious criminal cases since the 1990s, the validity of this technique has been questioned in appeal cases [23]. Assaying limited amounts of starting material may result in peak dropouts (failed amplification of an allele) or drop-ins (mis-priming of an allele). Furthermore, the heightened detection sensitivity required for low amounts of target DNA raises the risk of contamination.

FIGURE 10.14 Electropherogram showing results from five STR loci and the amelogenin locus for a child (C), mother (M), and father (F). Note how the child has inherited one allele at each locus from the mother (black dots) and one from the father.

Because each allele of a genotype is inherited from one parent, a child will share one allele of every locus with the paternal parent. A paternity index, or likelihood of paternity, is calculated for each locus in which the alleged father and the child share an allele. The paternity index is an expression of how many times more likely the child’s allele is to have been inherited from the alleged father than from another man in the general population. An allele that occurs frequently in the population has a low paternity index; a rare allele has a high paternity index. Table 10.4 shows the paternity index for each of four loci. The FESFPS 13 allele is rarer than the D16S539 9 allele. In this example, the child is 5.719 times more likely to have inherited the 9 allele of locus D16S539 from the alleged father than from another random man in the population. Similarly, the child is 15.41 times more likely to have inherited the 13 allele of FESFPS from the alleged father than by random occurrence. When each tested locus is on a different chromosome (not linked), the inheritance or occurrence of each allele can be considered an independent event. The paternity indices for the individual loci, therefore, can be multiplied together to calculate the combined paternity index (CPI), which summarizes and evaluates the genotype information. The CPI for the data shown in Table 10.4 is

\[ \mathrm{CPI} = 5.719 \times 8.932 \times 15.41 \times 10.22 = 8{,}044.931 \]

These data indicate that the child is 8,045 times more likely to have inherited the four observed alleles from the alleged father than from another man in the population.

TABLE 10.4 Example Data From a Paternity Test Showing Inclusion

Locus | Alleged Father | Child | Shared Allele | Paternity Index
D16S539 | 8, 9 | 9, 10 | 9 | 5.719
D5S818 | 10, 12 | 7, 12 | 12 | 8.932
FESFPS | 9, 13 | 13, 14 | 13 | 15.41
F13A01 | 4, 5 | 5, 7 | 5 | 10.22

If a paternal allele does not match between the alleged father and the child, H1 for that allele is 0. One might assume, therefore, that the nonmatching allele paternity index of 0 would make the CPI 0. This is not the case. A nonmatching allele between the alleged father and the child found at one locus (exclusion) is traditionally not regarded as a demonstration of non-paternity because of the possibility of mutation. Although mutations were quite rare in the traditional RFLP systems, analysis of 12 or more STR loci may occasionally reveal one or two mutations resulting in nonmatching alleles even if the man is the father. To account for mutations, the paternity index for nonmatching alleles is calculated as

\[ \text{paternity index for a mutant allele} = \mu \]

where μ is the observed mutation rate (mutations/meiosis) of the locus. The American Association of Blood Banks (AABB) has collected data on mutation rates in STR loci (Table 10.5). Using these data, in the case of a nonmatching allele, H1 is not 0 but μ. A high mutation rate (close to 1) would not lower the CPI, whereas a very low mutation rate (closer to 0) would do so.

TABLE 10.5 Observed Mutation Rates in Paternity Tests Using STR Loci

STR Locus | Mutation Rate (%)
D1S1338 | 0.09
D3S1358 | 0.13
D5S818 | 0.12
D7S820 | 0.10
D8S1179 | 0.13
D13S317 | 0.15
D16S539 | 0.11
D18S51 | 0.25
D19S433 | 0.11
D21S11 | 0.21
CSF1PO | 0.16
FGA | 0.30
TH01 | 0.01
TPOX | 0.01
VWA | 0.16
F13A01 | 0.05
FESFPS | 0.05
F13B | 0.03
LPL | 0.05
Penta D | 0.13
Penta E | 0.16

In a paternity report, the combined paternity index is accompanied by the probability of paternity, a number calculated from the combined paternity index (genetic evidence) and prior odds (nongenetic evidence). For the prior odds, the laboratory as a neutral party assumes a 50/50 chance that the test subject is the father. Therefore, the probability of paternity is

\[ \frac{\mathrm{CPI} \times \text{prior odds}}{(\mathrm{CPI} \times \text{prior odds}) + (1 - \text{prior odds})} = \frac{\mathrm{CPI} \times 0.50}{(\mathrm{CPI} \times 0.50) + (1 - 0.50)} \]

In the example illustrated previously, the CPI is 8,044.931. The probability of paternity is

\[ \frac{8{,}044.931 \times 0.50}{(8{,}044.931 \times 0.50) + 0.50} = 0.999876 \]

The genetic evidence (CPI) has changed the probability of paternity from the prior odds of 50% to 99.9876%.

There is some disagreement about the assumption of 50% prior odds. Using different prior odds assumptions changes the final probability of paternity (Table 10.6). As can be observed from the table, however, at a CPI over 100, the differences have less effect.

TABLE 10.6 Odds of Paternity Using Different Prior Odds Assumptions

CPI | Prior odds 10% | Prior odds 25% | Prior odds 50% | Prior odds 75% | Prior odds 90%
5 | 0.36 | 0.63 | 0.83 | 0.94 | 0.98
9 | 0.50 | 0.75 | 0.90 | 0.96 | 0.98
19 | 0.68 | 0.86 | 0.95 | 0.98 | 0.994
99 | 0.92 | 0.97 | 0.99 | 0.997 | 0.999
999 | 0.99 | 0.997 | 0.999 | 0.9997 | 0.9999

Sibling Tests

Polymorphisms are also used to generate a probability of siblings or other blood relationships (familial searches) [24]. A sibling test is a more complicated statistical analysis than a paternity test. Mutations and allele frequencies further complicate analysis [25]. More confident conclusions can be made with multiple siblings. Methods involving parental genotype reconstruction have been proposed [26]. A full sibling test is a determination of the likelihood that two people tested share a common mother and father. A half-sibling test is a determination of the likelihood that two people tested share one common parent (mother or father). The likelihood ratio generated by a sibling test is sometimes called a kinship index, sibling index, or combined sibling index.

Another type of relationship analysis is avuncular testing, which measures the probabilities that two alleged relatives are related as either an aunt or an uncle of a niece or nephew. The probability of relatedness is based on the number of shared alleles between the tested individuals. As with paternity and identity testing, allele frequency in the population will affect the significance of the final results. The probabilities can be increased greatly if other known relatives, such as a parent of the niece or nephew, are available for testing. Determination of first- and second-degree relationships is important for genetic studies because linkage mapping of disease genes in populations can be affected by undetected familial relationships [27].

Y-STR

Unlike conventional STRs (autosomal STRs), where each locus is defined by two alleles, one from each parent, Y-STRs are represented only once per genome and only in males (Fig. 10.15). A set of Y-STR alleles comprises a haplotype, or series of linked alleles always inherited together. This is because the Y chromosome cannot extensively exchange information (recombine) with the X chromosome or another Y chromosome. Thus, marker alleles on the Y chromosome are inherited from generation to generation in a single block. This means that the frequency of entire Y-STR profiles (haplotypes) in a given population can be determined by empirical studies. For example, if a combination of alleles (haplotype) was observed only two times in a test of 200 unrelated males, that haplotype is expected to occur with a frequency of approximately 1 in 100 males tested in the future. The discrimination power of Y-haplotype testing will depend on the number of subjects tested and will be less definitive than that of autosomal STR.

FIGURE 10.15 Electropherogram showing allelic ladders for six STR loci in the Y-Plex 6 system (top panel) and a single haplotype (bottom panel). Molecular-weight standards are shown at the bottom of each.

Despite being a less powerful system for identification, STR polymorphisms on the Y chromosome have unique characteristics that have been exploited for forensic, lineage, and population studies as well as kinship testing [28]. Except for rare mutation events, every male member of a family (brothers, uncles, cousins, and grandfathers) will have the same Y-chromosome haplotype. Thus, Y-chromosome inheritance can be applied to lineage, population, and human migration studies. The Y-STR/paternal lineage test can determine whether two or more males have a common paternal ancestor. In addition to family history studies, the results of a paternal lineage test serve as supportive evidence for adoptees and their biological relatives or for individuals filing inheritance and benefit claims. Because Y chromosomes are inherited intact, spontaneous mutations in the DNA sequence of the Y chromosome are used to follow human migration patterns and historical lineages. Y-chromosome genotyping has been used to locate the geographical origin of populations.

Because all male relatives in a family will share the same allele combination or profile, the statistical significance of a Y-STR DNA match cannot be assessed by multiplying likelihood ratios as was described previously for autosomal STR. Instead of the allele frequency used in autosomal STR match calculations, haplotype frequencies are used. Estimation of haplotype frequencies, however, is limited by the number of known Y haplotypes. This smaller data set accounts for the reduced inclusion probabilities and a discrimination rate that is significantly lower than that for autosomal STR polymorphisms. Traditional STR loci are, therefore, preferred for identity or relationship analyses, and the Y-STRs are used to aid in special situations, for instance, in confirming sibship between males who share commonly occurring alleles, that is, who have a low likelihood ratio based on traditional STRs.

Y-STRs have been utilized in forensic tests where the evidence consists of a mixture of male and female DNA, such as semen, saliva, other body secretions, or fingernail scrapings. For instance, in specimens from the evidence of sexual assault, the female DNA may be in vast excess (more than 100-fold) compared to the male DNA in the sample. Autosomal STRs are not consistently informative under these circumstances. Using Y-specific primers, Y-STR can be specifically amplified by PCR from the male–female mixture, resulting in an analyzable marker that has no female background. This affords a more accurate identification of the male donor.

The Y chromosome has a low mutation rate. The overall mutation rate for Y chromosome loci is estimated at 7.4 × 10^−10 mutations per position per year [29]. Assuming that Y-chromosome mutations generally occur once every 500 generations/locus, for 25 loci, 1 locus should have a mutation every 20 generations (500 generations/25 markers = 20 generations). Lineage testing over several generations is made possible by this low mutation rate. It is also useful for missing persons’ cases in which reference samples can be obtained from paternally related males.

A list of informative Y-STRs is shown in Table 10.7. Several Y-STRs are located in regions that are duplicated on the Y chromosome. DYS389I and DYS389II are examples of duplicated loci.

Like autosomal STRs, Y-STRs have microvariant alleles containing incomplete repeats and alleles containing repeat sequence differences. Reagent systems consisting of multiplexed primers for identification of 6 to 17 Y-STR loci are available commercially.

Matching With Y-STRs

Matching probabilities from Y-STR data are determined differently than for the autosomal STR. Haplotype diversity (HD) is calculated from the frequency of occurrence of a given haplotype in a tested population. The probability of two random males sharing the same haplotype is estimated at 1 − HD; that is, if the haplotype diversity is high, the probability of two random males in the population having the same haplotype is low. Another measure of profile uniqueness, the discriminatory capacity (DC), is determined by the number of different haplotypes seen in the tested population and the total number of samples in the population. DC expresses the percentage of males in a population who can be identified by a given haplotype. Just as the number of loci included in an autosomal STR genotype increases the power of discrimination, DC is increased by increasing the number of loci defining a haplotype. For instance, six loci tested can distinguish 82% of African American males. Using 22 loci raises the DC to almost 99% (Table 10.8).

TABLE 10.7 Y-STR Locus Information* [72-74]

Y-STR | Repeat Sequence† | Alleles
DYS19 | [TAGA]3TAGG[TAGA]n | 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
DYS385 | [GAAA]n | 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 16.3, 17, 17.2, 17.3, 18, 19, 20, 21, 22, 23, 24, 28
DYS388 | [CAA]n | 10, 11, 12, 13, 14, 15, 16, 17, 18
DYS389 I‡ | [TCTG]q[TCTA]r | 9, 10, 11, 12, 13, 14, 15, 16, 17
DYS389 II‡ | [TCTG]n[TCTA]p[TCTG]q[TCTA]r | 26, 27, 28, 28’, 29, 29’, 29’’, 29’’’, 30, 30’, 30’’, 30’’’, 31, 31’, 31’’, 32, 32’, 33, 34
DYS390 | [TCTG]n[TCTA]m[TCTG]p[TCTA] | 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
DYS391 | [TCTA]n | 6, 7, 8, 9, 10, 11, 12, 13, 14
DYS392 | [TAT]n | 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17
DYS393 | [AGAT]n | 9, 10, 11, 12, 13, 14
DYS426 | [CAA]n | 6.2, 9, 10, 11, 12, 13, 14
DYS434 | [CTAT]n | 8, 9, 10, 11
DYS437 | [TCTA]n[TCTG]2[TCTA]4 | 13, 14, 15, 16, 17
DYS438 | [TTTTC]n | 6, 7, 8, 9, 10, 11, 12, 13, 14
DYS439 (Y-GATA-A4) | [GATA]n | 9, 10, 11, 12, 13, 14
DYS441 | [CCTT]n | 8, 10.1, 11, 11.1, 12, 13, 13.1, 14, 14.3, 15, 16, 17, 18, 19, 20
DYS442 | [TATC]n | 8, 9, 10, 11, 12, 12.1, 13, 14, 15
DYS444 | [TAGA]n | 9, 10, 11, 12, 13, 14, 15, 16
DYS445 | [TTTA]n | 6, 7, 8, 9, 10, 10.1, 11, 12, 13, 14
DYS446 | [AGAGA]n | 8, 9, 10, 11, 12, 13, 14, 15, 15.1, 16, 17, 18, 19, 19.1, 20, 21, 22, 23
DYS447 | [TTATA]n | 15, 16, 17, 18, 19, 19.1, 20, 21, 22, 22.2, 22.4, 23, 24, 25, 26, 26.2, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36
DYS448 | [AGAGAT]n | 17, 19, 19.2, 20, 20.2, 20.4, 21, 21.2, 21.4, 22, 22.2, 23, 23.4, 24, 24.5, 25, 26, 27
DYS449 | [GAAA]n | 23, 23.4, 24, 24.5, 25, 26, 27, 27.2, 28, 28.2, 29, 29.2, 30, 30.2, 31, 32, 32.2, 33, 33.2, 34, 35, 36, 37, 37.3, 38
DYS452 | [TATAC]n | 24, 
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 DYS454 [TTAT]n 6, 7, 8, 9, 10, 11, 12, 13 DYS455 [TTAT]n 7, 8, 9, 10, 11, 12, 13 Chapter 10 DNA Polymorphisms and Human Identification 281 TABLE 1.7 Y-STR Locus Information (Continued) Y-STR Repeat Sequence† Alleles DYS456 [AGAT]n 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 DYS458 [CTTT]n 12, 12.2, 13, 14, 15, 15.2, 16, 16.1, 16.2, 17, 17.2, 18, 18.2, 19, 19.2, 20, 20.2, 21 DYS460 [ATAG]n 7, 8, 9, 10, 10.1, 11, 12, 13 (Y-GATA-A7.1) *http://www.cstl.nist.gov † Some alleles contain repeats with one, two, or three bases missing. ‡ DYS389 I and II is a duplicated locus. Because there is no recombination between loci on the TABLE 10.8 Discriminatory Capacity of Y-STR Y chromosome, the product rule cannot be applied. The Genotypes in Different Subpopulations75 results of a Y typing might be reported accompanied by the number of observations or frequency of the analyzed African White Hispanic American American American haplotype in a database of adequate size. Suppose a (%) (%) (%) haplotype containing the 17 allele of DYS390 occurs in only 23% of men in a database of 12,400. However, if 6 loci* 82.3 68.9 78.3 that same haplotype contains the 21 allele of DYS446, 9 loci† 84.6 74.8 85.1