Bioinformatics: Genomics and Sequence Data

Questions and Answers

Which of the following best describes the function of bioinformatics?

  • Using computers to collect, analyse and store biological data (correct)
  • Storing biological data only
  • Analysing biological data manually
  • Collecting biological data only

The Human Genome Project (HGP) aimed to identify all the proteins in the human body.

False (B)

What is the primary goal of genome sequencing?

to determine the complete nucleotide sequence of an organism's DNA

The process of attaching biological information to sequences is known as __________.

annotation

Match the following bioinformatics terms with their descriptions:

  • Genome = All the genetic material in an organism
  • Genomics = The study of genes and their function
  • Annotation = Attaching biological information to sequences
  • Sequence assembly = Process of piecing together DNA fragments

Which component is NOT a building block of DNA?

Uracil (B)

Bioinformatics plays no role in the analysis of the human genome.

False (B)

What type of information is stored in databases used in bioinformatics?

sequence data

In the context of bioinformatics, a sequence in FASTA format has a single line description beginning with > and a [_______] at the end.

return

Match the database with the type of data it primarily stores:

  • GenBank = Nucleic acid sequences
  • UniProt = Protein sequences
  • PubMed = Journal articles

What is the approximate size of the human genome?

3 billion base pairs (B)

The 'open reading frame' (ORF) begins with a termination codon during the translation of DNA.

False (B)

Explain what 'sequence assembly' means in the context of bioinformatics.

Piecing together overlapping sequence fragments to produce one consensus sequence.

Tools used to identify the regions that encode genes are called __________ tools.

gene prediction

Match the potential applications with the genomics advancements:

  • Molecular medicine = Develop more rapid and specific tests
  • Evolution studies = Compares modern DNA with ancient samples
  • Forensic science = Advances in DNA profiling

Which of the following is NOT a potential benefit for human health from the HGP?

Bioarchaeology (D)

Bioinformatics provides answers faster in vitro than in silico.

False (B)

What is the other name for similarity searching?

similarity

Interconnected databases that host DNA and protein sequences also host __________.

journal articles

Match each UniProt database with its description:

  • Protein Knowledgebase (UniProtKB) = Manually annotated and reviewed
  • Sequence clusters (UniRef) = Speed up sequence similarity searches
  • Sequence archive (UniParc) = Keep track of sequences

How does the genome sequencing process work?

DNA sequencers generate short overlapping sequences (C)

Assigning protein function is a minor challenge of Bioinformatics.

False (B)

What are the main types of species for genomics categorisation?

archaea, bacteria, eukaryote, virus, metagen

Entrez allows __________ searches of NCBI and PubMed.

parallel

Match the genomic regions to their description:

  • Protein domains = Evolutionarily conserved
  • Proteins = Similar in sequence across several species

Why is bioinformatics useful?

It handles huge datasets (A)

Gene prediction tools do not look for promoter sites and translation start sites.

False (B)

What is the purpose of Annotation in the bioinformatics challenges of genome sequencing?

attaching biological information to sequences

There are more than __________ completed genomes.

36,000

Match the description with correct genomics process:

  • HGP = Sequences hundreds of thousands of overlapping sequencing fragments
  • Bioinformatics = Essential because of the large amount of data

What kinds of genomic information should be stored so that they are easily retrievable?

Phenotypes of mutations, protein domains, protein function and gene names (C)

Forensic science cannot establish paternity and other family relationships.

False (B)

What is crucial for data entered in FASTA format?

consistency

For the translation from DNA, the frame +1 doesn't have __________ stop codons.

internal

Match each search tool with how it is used:

  • Google = simple web query
  • PubMed = search option using Boolean operators ("AND" / "OR" searches)

Which of the nucleotides are not in DNA?

Uracil (A)

Determining the complete sequence of human DNA is not a goal of human genome sequencing.

False (B)

Which year was FASTA format proposed?

1985

GenBank is a database from _________.

NIH

What do dbEST and dbGSS databases store?

  • dbEST = Expressed Sequence Tags
  • dbGSS = Genome Survey Sequences

Which sequence feature occurs about 30 bp upstream from the start site of a eukaryotic gene?

TATA box (C)

The protein domains are not evolutionarily conserved.

False (B)

About how many genes did the HGP identify?

20,000 to 25,000

Flashcards

Bioinformatics

The use of computers to collect, analyse and store biological data, bringing biological meaning to sequence data.

Genomics

The study of genes and their function.

Genome

All the genetic material in the chromosomes of a particular organism.

Human Genome Project (HGP)

A project to determine the complete sequence of the human genome, identify all the genes, and store this information.

Molecular Medicine

Searching for genes associated with genetic diseases, developing rapid diagnostic tests, and treating defective genes via gene therapy.

Bioarchaeology in Genomics

Using DNA to understand human evolution and ancient migration patterns.

Forensic science utilizing DNA profiling

Using DNA analysis to identify suspects, exonerate wrongly accused persons, identify victims, establish paternity, and match organ donors.

Sequence assembly

Overlapping sequence fragments are assembled to produce one consensus sequence.

Gene Prediction tools

Tools that look for sequence features at defined sites within eukaryotic genes for specific roles during transcription/translation.

Annotation

The process of attaching biological information to sequences, including location of ORFs, gene structure, regulatory elements and biochemical function.

Open Reading Frame (ORF)

The translated region/frame of DNA that does not contain any internal stop codons.

Similarity searching

Looking for similar sequences with known function using the translated protein sequence.

Biological Databases

Systems to store, manage, and retrieve sequence and other biological information, incorporating data on different species, gene locations, protein structure/function, etc.

FASTA format

A format for sequence databases with a single line description beginning with >.

GenBank

The major archive of nucleic acid sequences maintained by the US National Center for Biotechnology Information (NCBI).

UniProt

A major archive of protein sequences (largely comprised of translated gene sequences).

NCBI Database system

An interconnected database system hosted by the NCBI that includes journal articles, genetic diseases, polymorphisms, gene expression, etc.

ENTREZ

The NCBI search system that allows parallel searches across multiple data archives.

Study Notes

  • Bioinformatics' core goal is to impart biological relevance to sequence data.

Bioinformatics Origins and Definition

  • Bioinformatics is derived from "Bio," signifying biology, and "Informatique," the French term for "data processing."
  • Bioinformatics involves using computers to gather, analyze, and store biological information.

The Genomics Revolution

  • DNA consists of adenine (A), guanine (G), cytosine (C), and thymine (T).
  • A genome encompasses all of the genetic material found within an organism's chromosomes.
  • Genomics is the study of genes and their function.

The Human Genome Project (HGP)

  • The HGP aimed to determine the complete sequence of approximately 3 billion base pairs (bp).
  • The HGP sought to pinpoint all genes (roughly 20,000 to 25,000).
  • The HGP aimed to store the information and create accessibility for study, and was completed in 2003.

Potential Benefits of Genomic Sequencing

  • Molecular medicine could benefit from identifying genes tied to genetic disorders like Alzheimer's and certain cancers.
  • Rapid and specific diagnostic tests can be developed.
  • Gene therapy is a possibility to treat defective genes.
  • Bioarchaeology also benefits: the genome of a 5,300-year-old iceman, whose body was discovered in 1991, was sequenced and published in 2012.
  • Analysis of the iceman's genome revealed he likely had brown eyes, was blood group O, and was lactose intolerant.
  • DNA profiling advancements enable suspect identification from crime scene DNA, exoneration of the wrongly accused, identifying victims of war and catastrophes, establishing paternity, and matching organ donors with recipients.

Bioinformatics is essential for managing large datasets

  • There are over 36,000 completed genomes, with thousands more in progress thanks to bioinformatics support.
  • Bioinformatics facilitates storage, analysis, and comparison of the datasets.
  • The human genome, comprising 3 billion bp, can be processed because of bioinformatics.
  • The human genome could fill 200 telephone books and would take 9.5 years to read aloud.
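
The reading-aloud figure can be sanity-checked with a little arithmetic. The sketch below assumes non-stop reading and a 365.25-day year (both assumptions, not stated in the notes), and works out the implied reading speed.

```python
# Rough check of the "9.5 years to read aloud" figure for a 3 billion bp genome.
GENOME_BP = 3_000_000_000                  # approximate genome size from the notes
YEARS = 9.5                                # reading time quoted in the notes
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # assumption: reading around the clock

bases_per_second = GENOME_BP / (YEARS * SECONDS_PER_YEAR)
print(f"{bases_per_second:.1f} bases per second")  # prints "10.0 bases per second"
```

Under these assumptions the quoted figure corresponds to roughly ten bases read aloud every second, without a break.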

Bioinformatics Applications

  • Bioinformatics can provide answers more quickly: in silico (computer-based) analysis is faster than in vitro (laboratory) methods.
  • Results generated in silico still require validation in the lab.

Bioinformatics Sequence Assembly

  • DNA sequencers produce overlapping sequences of about 500 bp.
  • Sequence fragments must be assembled correctly along the chromosome.
  • Sequence assembly leads to a consensus DNA sequence.
  • During the HGP, bioinformatics tools assembled hundreds of thousands of overlapping sequence fragments.
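
The idea of joining overlapping fragments can be illustrated with a toy greedy merge. This is a minimal sketch, not the method used in the HGP: the function names and the short example reads are hypothetical, and real assemblers also have to handle sequencing errors, repeats, and reads from both strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps, so nothing more can be joined
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return max(frags, key=len)  # longest assembled sequence


# Hypothetical reads (real reads are about 500 bp long).
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
assembled = greedy_assemble(reads)
print(assembled)  # ATGGCGTACCTGA
```

Each read shares four bases with the next one, so the three reads collapse into a single 13-base sequence.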

Bioinformatics and Gene Finding

  • Less than 2% of the human genome codes for proteins.
  • Gene prediction tools search for sequence features that appear at particular sites within eukaryotic genes to complete specific roles during transcription and translation.

Eukaryotic Genes

  • TATA box: occurs about 30 bp upstream from the transcription start site.
  • Translation start site: ATG, which encodes the first methionine.
  • Intron/exon splice sites: introns begin with GT and end with AG (GT....AG).
  • Termination codon: TGA, TAA or TAG.

Protein Sequence Translation

  • The first step is to translate the DNA into a theoretical protein sequence.
  • The next step is to locate the open reading frame (ORF), which begins with the start codon (ATG) and contains no internal stop codons.
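
Locating an ORF can be sketched as a scan of each reading frame for an ATG followed by an in-frame stop codon. The sketch below is a simplified illustration: `find_orfs` and the example sequence are hypothetical, only the three forward frames are scanned, and introns are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (frame, start_index, orf_sequence) for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon (inclusive),
    so it contains no internal stop codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame + 1, start, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical sequence with one short ORF in frame +3.
orfs = find_orfs("CCATGGCTTGTTAAGG")
print(orfs)  # [(3, 2, 'ATGGCTTGTTAA')]
```

The frame is reported as 1, 2, or 3 to match the +1, +2, +3 naming used for forward reading frames.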

Protein Sequence Analysis

  • Translated protein sequences aid in spotting similar sequences with known functions.
  • Protein domains tend to remain conserved during evolution.
  • Proteins sharing sequence similarities across species likely share similar functions.
  • An ongoing bioinformatics challenge is assigning protein function.
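
One simple way to quantify how similar two protein sequences are is percent identity over an existing alignment. The sketch below assumes the two sequences are already aligned to the same length with "-" marking gaps; the function name and example peptides are hypothetical, and real similarity searches score alignments in more sophisticated ways.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned (non-gap) positions where the two sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap.
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


# Hypothetical aligned peptides that differ at one position out of eight.
identity = percent_identity("MKTAYIAK", "MKTAHIAK")
print(identity)  # 87.5
```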

Genome Annotation

  • Genome Annotation involves adding biological information to sequences, which serves as a valuable reference for scientists studying gene function.
  • Protein annotation includes location of ORFs, gene structure, regulatory elements, biochemical function, conserved domains, and protein interactions.

Data Storage and Databases

  • Scientists generate significant quantities of data.
  • Many kinds of data need to be stored, including information on various species, gene names and locations, mutation phenotypes, protein structures and functions, and protein domains and interactions.

Data Handling

  • Sequence databases require consistency in the way data is entered
  • W.R. Pearson introduced the FAST Alignment (FASTA) format in 1985.
  • The FASTA format is widely used for reading and reporting sequence data.
  • In FASTA format, each entry begins with a single-line description that starts with > and ends with a return; the sequence follows on the lines after it.

FASTA Format Explained

  • gi|386828 is the GenInfo (GI) number, which the NCBI assigns to each sequence entry.
  • gb|AAA59172.1 shows that GenBank (gb) was the source database and that the database accession number is AAA59172.1.
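
The layout described above can be read with a few lines of code. This is a minimal sketch: the function names are hypothetical, and the example record combines the two identifiers from the notes with a made-up description and a made-up sequence, purely for illustration.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (description, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []  # drop the leading ">"
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def parse_ncbi_header(header: str) -> dict[str, str]:
    """Split a description of the form gi|<number>|<db>|<accession>| text."""
    fields = header.split("|")
    return {
        "gi": fields[1],
        "database": fields[2],
        "accession": fields[3],
        "description": fields[4].strip() if len(fields) > 4 else "",
    }


# Description text and sequence are placeholders, not real database content.
example = ">gi|386828|gb|AAA59172.1| placeholder description\nMKTAYIAK\nQRQISFVK\n"
records = parse_fasta(example)
info = parse_ncbi_header(records[0][0])
print(info["gi"], info["database"], info["accession"])  # 386828 gb AAA59172.1
```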

Sequence resources

  • GenBank is a major archive of nucleic acid sequences maintained by the US National Center for Biotechnology Information (NCBI).
  • Over 135 million sequences have been deposited into GenBank.
  • Each sequence in GenBank is stored as its own entry.
  • Accession numbers are unique identifiers for a single sequence.
  • Accession numbers serve as the best method for finding specific entries in sequence databases.
  • UniProt is a major archive of protein sequences largely derived from translated gene sequences.
  • NCBI hosts an interconnected database system including journal articles (PubMed), genetic diseases (OMIM), polymorphisms (SNP), and gene expression (GEO).

NCBI Entrez

  • NCBI offers ENTREZ for parallel searches through multiple data archives.

Database Searching

  • Most search engines offer a simple web query, but advanced searches using Boolean operators ("AND" / "OR") are also available for more complex queries.
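
A Boolean query is just search terms joined with AND and OR. The sketch below builds such a query string and encodes it into a URL for the NCBI E-utilities `esearch` endpoint. Using E-utilities here is an assumption for illustration (the notes only mention Entrez and PubMed), `build_query` and the search terms are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode


def build_query(all_of: list[str], any_of: tuple[str, ...] = ()) -> str:
    """Join required terms with AND, plus an optional parenthesised OR group."""
    parts = list(all_of)
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    return " AND ".join(parts)


# Hypothetical search: both required terms, plus at least one optional term.
query = build_query(["insulin", "human"], ("diabetes", "obesity"))
print(query)  # insulin AND human AND (diabetes OR obesity)

# Encode the query for a PubMed search URL (constructed only, never fetched).
base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
url = base + "?" + urlencode({"db": "pubmed", "term": query})
print(url)
```

The parentheses keep the OR group together, so a record must match both required terms and at least one of the optional ones.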
