Genomics and Bioinformatics: An Overview

Questions and Answers

Which of the following best describes genomics?

  • The analysis of protein structure and function.
  • The application of computational tools to analyze large biological datasets.
  • The study of all the information within the entire genome. (correct)
  • The study of individual genes and their functions.

Bioinformatics is primarily focused on experimental laboratory techniques rather than computational analysis.

False (B)

Which of the following is a common application of bioinformatics?

  • Comparing gene sequences between different species. (correct)
  • Synthesizing new DNA molecules in a laboratory.
  • Performing surgical procedures.
  • Developing new antibiotics.

The process of 'stitching together' genomic sequences to create a complete genome sequence is facilitated by the field of ___________.

bioinformatics

Match the sequencing method with its description:

  • Sanger Sequencing = Traditional method involving chain termination.
  • Next-Generation Sequencing (NGS) = High-throughput methods that sequence millions of DNA molecules simultaneously.
  • Third-Generation Sequencing = Methods that can read sequence from single DNA molecules.

What is the primary goal of performing a BLAST search?

To compare a nucleotide or protein sequence to database sequences. (A)

The ENCODE project revealed that all RNA transcripts encode proteins.

False (B)

Which of the following databases houses a large collection of genomic DNA sequences, identified genes, and proteins?

National Center for Biotechnology Information (NCBI) (A)

__________ splicing allows a single gene to code for multiple proteins.

alternative

Match the 'omics' field with its description:

  • Genomics = Study of all genes in the genome
  • Transcriptomics = Study of all expressed genes in a cell or tissue
  • Proteomics = Study of all proteins in a cell or tissue
  • Metabolomics = Study of all metabolites (small molecules) in a cell or tissue

What percentage of the human genome is estimated to consist of protein-coding sequences?

2% (D)

Personalized medicine involves treating patients based on a population average rather than their individual DNA sequence.

False (B)

Which of the following is a primary application of the 1000 Genomes Project?

Identifying genetic differences among different human populations. (D)

Sequences of DNA that are similar across species are said to be __________.

conserved

Match the sequencing technology with its generation.

  • Sanger sequencing = First-generation sequencing
  • 454 sequencing = Second-generation sequencing
  • Pacific Biosciences sequencing = Third-generation sequencing

What is a key challenge in predicting protein structure from its amino acid sequence?

The complexity of three-dimensional folding patterns. (B)

ENCODE project primarily focuses on identifying protein-coding genes within the genome.

False (B)

Which of the following represents a significant advancement in protein structure prediction?

AlphaFold (A)

__________ are variations in a single nucleotide that are spread across the genome and contribute to individual differences.

SNPs

Match the application with the genomic technology.

  • Identifying protein-coding genes = Genome analysis
  • Identifying promoter and enhancer sequences = ENCODE project
  • Finding genetic differences among human populations = 1000 Genomes Project

Which of the following is a key characteristic of third-generation sequencing technologies?

The ability to sequence single molecules of DNA in real-time. (B)

The human microbiome consists of only bacteria.

False (B)

In the context of genomics, what is a 'contig'?

A contiguous sequence of DNA assembled from overlapping sequence reads. (D)

The area of genomics that allows the study of how protein sequences encoded by conserved genes have changed during evolution is known as __________.

phylogenomics

Match the genomic terms with their definitions.

  • Genomics = The study of all the genes in the genome
  • Bioinformatics = The application of computational tools to analyze biological data
  • Transcriptomics = The study of all expressed genes in a cell or tissue

What is the primary purpose of microbiome transplants in medicine?

To introduce beneficial microorganisms to treat certain illnesses. (C)

Synthetic genomes have enabled scientists to create entirely new organisms with no ancestral relationship to existing life forms.

False (B)

Which of the following is the correct order of steps in shotgun sequencing?

Fragment DNA, sequence fragments, align contiguous sequences, generate finished sequence. (D)

A major aim of genomics is to identify the __________ coding genes that are present in the genome.

protein

Match the type of sequencing with its description.

  • Whole genome sequencing = Sequencing of the entire genome of an organism.
  • Exome sequencing = Sequencing of only the protein-coding regions of the genome.
  • RNA sequencing = Sequencing of all RNA molecules in a sample.

What is a common application of analyzing Neanderthal DNA?

Studying how modern humans evolved and adapted. (B)

What does the acronym SNP stand for in the context of genomics, and why are they important?

Single Nucleotide Polymorphism. SNPs contribute to individual genetic variation and can be associated with disease susceptibility.

Describe the process of identifying protein-coding genes using cDNA sequencing and its advantages over genomic DNA sequencing.

cDNA sequencing involves reverse transcribing mRNA into DNA, which is then sequenced. Since cDNA lacks introns, it provides a more direct representation of expressed genes.

Distinguish between genomics and transcriptomics, and briefly explain how they complement each other in systems biology. Genomics studies the entire genome, while transcriptomics focuses on the __________.

transcriptome

Describe how advances in sequencing technology have enabled rapid genome sequencing and led to the development of the new area of genetics called _________.

genomics

Flashcards

What is Genomics?

The study of all the information within an organism's DNA.

What is Bioinformatics?

A field using computational techniques to organize, share, and analyze biological data.

Genome Sequencing

The process of assembling many DNA fragments to determine the entire genome sequence.

What is Whole Genome Shotgun Sequencing?

Fragmenting the whole genome, sequencing every fragment, then assembling the entire sequence from the overlaps between fragments.

What is Gene Identification?

An application of genomics to identify the protein-coding regions in a genome.

What is BLAST?

A tool (Basic Local Alignment Search Tool) for comparing sequences to find similarities.

What are regulatory elements?

Specific DNA sequences associated with genes, such as promoters and enhancers, that control gene expression.

What are genome databases?

Databases holding genomic and protein sequence information.

What does the NCBI Genome Browser do?

Allows viewing gene organization and alternative splicing.

What is the ENCODE project?

Project to identify all functional elements in the human genome.

What are SNPs (Single Nucleotide Polymorphisms)?

Differences at single nucleotides spread across the genome.

What is the 100,000 Genomes Project?

A project sequencing patient genomes to identify disease-causing sequence differences.

What is the Microbiome?

Collection of all microorganisms living in the human body.

What is Transcriptomics?

A large-scale biological study focusing on all expressed genes in a cell.

What is Proteomics?

The area of study concerned with the analysis of all proteins in a cell.

What are Microbiome Transplants?

Therapies that transfer microorganisms from a healthy donor to a patient in order to treat illness.

What are Synthetic Genomes?

Constructing new genes or genomes using DNA synthesis chemistry.

What is Transcriptomics?

Analysing how gene expression changes across cell types and under different conditions.

What is Personalized Genomics?

Services that analyse an individual's genome, for example to report on that person's ancestry.

Ancient DNA Analysis

Sequencing extinct species to find out more about them.

What is Metagenomics?

Analysis of all the DNA from the community of organisms found in an environmental sample.

What does 'synthetic biology' mean?

Construction of new genes for microbes to perform specific functions.

What is CLUSTAL-W?

A multiple sequence alignment program that allows the identification of predicted proteins that contain similar sequences.

Genes changing during evolution

Allows study of how protein sequences encoded by conserved genes have changed during evolution.

Study Notes

Genomics and Bioinformatics Overview

  • Genomics involves studying all information within a genome.
  • Bioinformatics is a new area of genetics developed to analyze sequence information from genomics.
  • Advances in sequencing technology enable the rapid sequencing of genomes.
  • Bioinformatics uses computational techniques to organize, share, and analyze genomic information.

Genomics Focus

  • Seeks to identify genome organization, including the number and arrangement of genes, and the role of non-coding DNA.
  • Aims to identify similarities and differences between genomes of various species and individual humans.
  • Led to the development of computational tools for analyzing large amounts of information, i.e., bioinformatics.

Bioinformatics Applications

  • Involves the compilation and stitching together of genomic sequences to create complete genome sequences.
  • Used for comparing gene sequences between species and identifying genes in genomic sequences.
  • Aids in predicting amino acid sequences of potential proteins encoded by genes.
  • Enables analysis of protein structure and prediction of protein functions.
  • Helps in finding gene regulatory regions such as promoters and enhancers.
  • Used to deduce evolutionary relationships between genes and organisms, and to identify where and when genes are expressed.

Genome Sequencing

  • Genomic DNA is cut with different restriction enzymes to create overlapping fragments.
  • Computer programs align overlapping sequenced fragments to assemble an entire chromosome.
  • Alignment of fragments based on identical DNA sequences creates contigs.
  • Software is used to find sequence overlaps in the fragments, and this is used to generate a full sequence from all fragments.

Shotgun Sequencing

  • Whole genome shotgun sequencing proceeds in four steps:
  • Genomic DNA is fragmented.
  • Each fragment is sequenced.
  • Overlapping fragment sequences are aligned into contiguous sequences (contigs).
  • A finished sequence is generated.
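
The assembly steps can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the reads are error-free, the toy genome has no repeated 4-base stretches, and the helper names `overlap` and `greedy_assemble` are hypothetical. Real assemblers use far more robust methods.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 4) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Toy example: four overlapping 10-base reads from a 25-base genome,
# supplied in scrambled order.
genome = "ATGGCGTACGTTAGCCTAGGCTTAA"
reads = [genome[10:20], genome[0:10], genome[15:25], genome[5:15]]
contigs = greedy_assemble(reads)
print(contigs)
```

With these reads the overlaps are unambiguous, so the reads collapse into a single contig. Repeated sequences in real genomes are what make assembly hard, and a greedy strategy like this one can misassemble them.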

Genome Analysis

  • A major aim of genomics is to identify the protein coding genes present in the genome.
  • Genes can be identified by comparing sequences between species.
  • BLAST (Basic Local Alignment Search Tool) can be used to perform this comparison.
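
BLAST itself relies on fast heuristics to search large databases, but the idea of a local alignment can be illustrated with the classic Smith-Waterman dynamic programming method. The sketch below is not BLAST's algorithm, and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def smith_waterman(query: str, subject: str,
                   match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best score, aligned query, aligned subject) for the best local alignment."""
    rows, cols = len(query) + 1, len(subject) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if query[i - 1] == subject[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    aligned_q, aligned_s = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if query[i - 1] == subject[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            aligned_q.append(query[i - 1])
            aligned_s.append(subject[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_s.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_s.append(subject[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_s))


result = smith_waterman("ACGT", "TTACGTTT")
print(result)
```

A database search would run a comparison like this (or a faster approximation of it) between the query and every database sequence, then rank the hits by score.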

Annotation of the Genome

  • Programs have been developed to locate protein coding genes within genomes.
  • Specific DNA sequences associated with genes include the TATA box (TATA(A/T)A(A/T)), the CAAT box, translation initiation sites (start codon ATG), splice sites, exons, introns, stop codons (TAA, TAG, TGA) and poly(A) addition sites (AATAAA).
  • Sequencing of cDNAs helps identify protein-coding genes and the location of exons.
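
As a rough illustration of how annotation software can scan for such signals, the sketch below searches a DNA string for the TATA box consensus, the poly(A) addition signal and the start codon, then reads the first open reading frame up to a stop codon (TAA, TAG or TGA in the standard genetic code). The function names and the toy sequence are hypothetical, and real gene finders combine many more signals with statistical models.

```python
import re

# Lookahead patterns so that overlapping matches are also reported.
FEATURE_PATTERNS = {
    "TATA box": re.compile(r"(?=(TATA[AT]A[AT]))"),
    "poly(A) signal": re.compile(r"(?=(AATAAA))"),
    "start codon": re.compile(r"(?=(ATG))"),
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_features(dna: str) -> list[tuple[str, int, str]]:
    """Return (feature name, 0-based position, matched sequence), sorted by position."""
    dna = dna.upper()
    hits = []
    for name, pattern in FEATURE_PATTERNS.items():
        for m in pattern.finditer(dna):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


def first_orf(dna: str):
    """Return the sequence from the first ATG to the first in-frame stop codon, or None."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return dna[start:i + 3]
    return None


# Hypothetical toy sequence: TATA box, short coding region, poly(A) signal.
toy = "GGCTATAAAAGGCCATGGCCTTTTAAGGAATAAACC"
features = find_features(toy)
orf = first_orf(toy)
print(features)
print(orf)
```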

Genomic Sequences and Comparisons

  • A wide variety of genomes have been sequenced, and the sequence information is available in public databases.
  • This allows for comparisons of genome size, number of genes and similarity to human genomes.
  • Conserved genes can be identified and their evolution examined.
  • Homologues of human genetic disease genes can be identified in other species.
  • Roughly 50,000 species have been sequenced (source: https://www.ensembl.org/info/about/species.html).

Protein Sequence Prediction

  • The amino acid sequence of proteins can be predicted when protein coding genes have been identified, because the triplet code is known.
  • For example, the order of amino acids in human growth hormone can be predicted in this way.
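
Because the triplet code is known, translating a coding sequence is a table lookup. The sketch below builds the standard genetic code table and translates a short coding sequence; the function name `translate` and the toy sequence are illustrative assumptions, and the sequence is not a real growth hormone gene.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy coding sequence: ATG GCC TTT TAA
peptide = translate("ATGGCCTTTTAA")
print(peptide)
```

The one-letter output uses the usual amino acid abbreviations (M for methionine, A for alanine, F for phenylalanine).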

Prediction of Protein Function

  • The function of a protein can be predicted once related protein-coding genes have been identified.
  • Previous work has identified proteins with particular functions, e.g., kinases and transcription factors.
  • Knowing the sequences of these genes has allowed the identification of amino acid patterns characteristic of each function.
  • New protein sequences can then be searched for these characteristic features.
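
A characteristic feature can be written as a sequence pattern and searched for across predicted proteins. The sketch below uses the P-loop (Walker A) nucleotide-binding motif, written in PROSITE style as [AG]-x(4)-G-K-[ST], which occurs in many ATP- and GTP-binding proteins. The protein names and sequences are made up for illustration, and a motif hit only suggests a function; it does not prove one.

```python
import re

# P-loop / Walker A motif: [AG], any four residues, then G, K and S or T.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_motif(proteins: dict[str, str], pattern: re.Pattern = P_LOOP) -> dict[str, list[int]]:
    """Map each protein containing the motif to the 0-based positions of its matches."""
    hits = {}
    for name, sequence in proteins.items():
        positions = [m.start() for m in pattern.finditer(sequence)]
        if positions:
            hits[name] = positions
    return hits


# Hypothetical predicted proteins (not real sequences).
predicted = {
    "toy_nucleotide_binder": "MKLGSPGAGKSTLLR",
    "toy_other": "MKLVVPPLLEEDDR",
}
motif_hits = find_motif(predicted)
print(motif_hits)
```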

Gene Families

  • CLUSTAL-W allows the identification of predicted proteins that contain similar sequences, identifying gene families within species that have similar functions.
  • Some genes are present in multiple copies within a species.
  • Multiple sodium channel (SCN) proteins are encoded in the human genome.

Conserved Genes

  • Once genes and predicted protein sequences have been identified, it is possible to investigate whether protein sequences are conserved between species.
  • This can be done with CLUSTAL-W, which aligns the sequences and so helps identify functional regions in proteins.
  • It also allows study of how protein sequences encoded by conserved genes have changed during evolution.
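
Once sequences have been aligned by a program such as CLUSTAL-W, conservation can be read column by column. The sketch below assumes the alignment has already been produced (gaps shown as "-") and only marks fully conserved columns with "*" and computes pairwise percent identity. The function names and toy sequences are hypothetical, and this is not the CLUSTAL-W algorithm itself.

```python
def conservation_line(aligned: list[str]) -> str:
    """Mark columns where every sequence has the same residue with '*', others with ' '."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical pre-aligned protein fragments from three species.
alignment = ["MKT-LLV", "MKTALLI", "MRT-LLV"]
line = conservation_line(alignment)
identity = percent_identity(alignment[0], alignment[1])
for seq in alignment:
    print(seq)
print(line)
```

Runs of conserved columns are candidates for functionally important regions, while the variable columns show where the protein has changed during evolution.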

Proteins Encoded by Genes

  • Comparison of sequences allows prediction of the function for the majority of the proteins encoded by human genes.
  • However, the function of just over 40% of human genes is still unknown.

Mapping Genes

  • Characterization of the human genome sequence allows the mapping of genes to each of the chromosomes.
  • It is possible to locate the position of genes coding for specific protein sequences and of genes associated with human genetic disease.

Protein Structure

  • Protein structure can be determined from the X-ray diffraction pattern of protein crystals, or predicted using software.
  • Prediction software aims to derive a protein's 3D structure from its amino acid sequence.
  • The chain of inference is: sequence -> secondary structure -> 3D structure -> function.

DeepMind - AlphaFold

  • Biochemists teamed up with computational scientists to improve protein structure prediction; a leading effort came from DeepMind, an AI company owned by Google's parent company, Alphabet.
  • DeepMind used AI and machine learning to develop a program, AlphaFold, that can predict protein structures.
  • This is a major advance, since predicted structures can help identify sites in proteins where drugs could bind.

Human Genome Sequence

  • The human genome sequence was completed in 2003.
  • The original human genome was sequenced from samples combined from a number of individuals.
  • It was sequenced both by a publicly funded consortium and by a private company, Celera Genomics.
  • The human genome contains about 3 billion nucleotides.
  • About 2% is protein-coding sequence, while the other 98% is non-coding.
  • The human genome contains an estimated 20,000 - 30,000 protein-coding genes.
  • A gene can often undergo alternative splicing and therefore produce more than one protein.

Databases

  • A massive amount of sequence information is held in public databases.
  • The largest set of databases is held at the National Center for Biotechnology Information (NCBI).
  • NCBI holds databases of genomic DNA sequences, identified genes and proteins, and genes associated with human disease.
  • NCBI also holds a database of cDNA sequences.
  • By comparing cDNA sequences with the genomic sequence, it is possible to identify the location of exons and introns.
  • Because cDNA sequences contain only exons, comparing different cDNAs from the same gene reveals regions that undergo alternative splicing.
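
The cDNA-to-genome comparison can be sketched as follows: each stretch of the cDNA that matches the genomic sequence is an exon, and the genomic gaps between matches are introns. This is a simplified illustration that assumes exact matches and unambiguous exon boundaries. The function name `map_exons` and the toy sequences are hypothetical; real spliced aligners tolerate sequencing errors and use splice-site signals to place boundaries.

```python
def map_exons(genomic: str, cdna: str, min_exon: int = 4) -> list[tuple[int, int]]:
    """Return (start, end) genomic coordinates (0-based, end exclusive) of each exon."""
    exons = []
    g_pos = 0  # where to resume searching in the genomic sequence
    c_pos = 0  # how much of the cDNA has been placed so far
    while c_pos < len(cdna):
        # Locate the next few cDNA bases downstream in the genome.
        seed = cdna[c_pos:c_pos + min_exon]
        start = genomic.find(seed, g_pos)
        if start == -1:
            raise ValueError("cDNA does not match the genomic sequence")
        # Extend the match base by base until the sequences diverge (intron start).
        length = 0
        while (c_pos + length < len(cdna)
               and start + length < len(genomic)
               and cdna[c_pos + length] == genomic[start + length]):
            length += 1
        exons.append((start, start + length))
        c_pos += length
        g_pos = start + length
    return exons


# Toy gene: two exons separated by one intron, with flanking sequence.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "TTTGAATAA"
genomic = "CC" + exon1 + intron + exon2 + "GG"
cdna = exon1 + exon2
exon_coords = map_exons(genomic, cdna)
print(exon_coords)
```

Running the same mapping for several cDNAs of one gene and comparing the exon lists would show which exons are included or skipped in each transcript.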

Databases - NCBI

  • NCBI’s Genome Browser enables viewing gene organization and identifying alternative splicing.
  • Example: human growth hormone.

ENCODE Project

  • ENCODE (Encyclopedia of DNA Elements) was set up to identify all functional elements in the human genome, not just protein-coding genes.
  • ENCODE identified promoter and enhancer sequences.
  • The project found that the genome contains regions of repetitive DNA that can vary between individuals.
  • ENCODE also analysed all transcribed regions and found that most of the genome is transcribed into RNA, even where it does not encode protein.
  • This led to a recognition of the functional activities of RNAs and to the field of transcriptomics, which studies all genes expressed in different cell types and how expression changes in disease.

1000 Genomes Project

  • It followed the original human genome sequence.
  • The genomes of 1,092 humans from different populations were sequenced to identify the small number of genetic differences between them.
  • Individuals' genomes are 99.9% the same, but they differ at single nucleotides spread across the genome, known as SNPs (single nucleotide polymorphisms).
  • Individuals also have variations in their repetitive DNA.
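
To make the numbers concrete: if two genomes of about 3 billion nucleotides are 99.9% identical, they differ at roughly 0.1% of positions, or about 3 million sites. The sketch below does that arithmetic and shows how SNPs can be listed from two already-aligned sequences of equal length. The function name and toy sequences are hypothetical, and real variant calling works from many sequencing reads with quality checks.

```python
GENOME_SIZE = 3_000_000_000  # approximate number of nucleotides in the human genome

# 99.9% identity leaves 0.1% (1 in 1000) of positions differing.
differing_sites = GENOME_SIZE // 1000


def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, reference base, sample base) for each single-base difference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


snps = find_snps("ACGTACGT", "ACGAACGC")
print(differing_sites)
print(snps)
```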

100,000 Genomes Project

  • It followed completion of the 1000 Genomes Project. It was begun by Genomics England in 2013 and was funded by the NHS.
  • The project aimed to sequence 100,000 genomes from patients affected by rare disease to find the genetic differences that lead to disease.
  • Completed in 2018, it has allowed identification of genetic differences in these patients.
  • Genomics England expects the NHS to be the first healthcare system to diagnose human disease by examining the genomic sequence of the affected patient.
  • This approach is known as personalized medicine, since patients are treated based on their own DNA sequence.

Cost of Sequencing

  • The cost of sequencing a human genome has fallen dramatically, from roughly $100 million in 2001 to around $1,000 or less today (NHGRI estimates).
  • The cost drop has been due to automated, high-throughput sequencing techniques.

Personal Genomics Services

  • The falling cost of genome sequencing and analysis has led many companies to offer personal genomics services.
  • Companies such as 23andMe and AncestryDNA analyze a customer's genome, compare its sequences with those of reference populations, and report on ancestry.
  • 23andMe also offers reports on disease risk.

Ancient DNA

  • Researchers are also investigating the genomes of extinct species to learn more about evolution.
  • DNA can be extracted from the bone and hair of extinct species that lived tens or hundreds of thousands of years ago (especially from frozen samples) and then sequenced.
  • The genomes of mammoths, cave bears, and ancient fish have already been sequenced.
  • Ancient DNA can also be recovered from mummified remains and from Neanderthals.
  • Neanderthals are our closest extinct human relatives.
  • By analyzing Neanderthal DNA, we can study how modern humans evolved and adapted.
  • Genes compared in this way can be linked to traits such as the acquisition of language.
  • Small regions of Neanderthal DNA remain in the genomes of modern humans, and 23andMe can provide a report on the amount of Neanderthal DNA in your genome.

Microbiome

  • 600 to 1000 species of microorganisms are estimated to live on humans, primarily in the digestive tract.
  • Microbiomes are the specific sets of microorganisms found on individuals.
  • Microbiomes can be identified by sequencing DNA from samples taken from an individual.
  • Microbiomes differ between individuals, although each individual's own microbiome is generally constant.

Changes to Microbiome

  • Changes can occur in the microbiomes of individuals suffering from illnesses, e.g. IBS or acne.
  • New therapies involve microbiome transplants from healthy donors.

Creating Genomes

  • Synthetic genomes can address two questions: what is the minimum number of genes necessary for a cell, and can genomes be synthesized?
  • The J. Craig Venter Institute demonstrated that 473 genes are sufficient in a microorganism.
  • A genome with 473 genes was synthesized for a bacterial cell, and the cell survived and grew, making it a key organism for adding new genes in synthetic biology.
  • Genes added to synthetic genomes could allow microbes to degrade pollutants, express proteins, or synthesize biofuels.

Omics

  • Genomics is the study of all genes in the genome. It is an advance on classical genetics because it considers the entire genome rather than individual genes.
  • Genomics has led to the development of:
    • Transcriptomics – study of all expressed genes in a cell.
    • Proteomics – study of all proteins in a cell.
    • Metabolomics – study of all metabolites (the small molecules produced by metabolism) in a cell.
    • Glycomics – study of all carbohydrates in a cell.
    • Metagenomics – analysis of the genomes of an entire environmental community.
