Questions and Answers
Which of the following best describes the function of bioinformatics?
- Using computers to collect, analyse and store biological data (correct)
- Storing biological data only
- Analysing biological data manually
- Collecting biological data only
The Human Genome Project (HGP) aimed to identify all the proteins in the human body.
False (B)
What is the primary goal of genome sequencing?
to determine the complete nucleotide sequence of an organism's DNA
The process of attaching biological information to sequences is known as __________.
Match the following bioinformatics terms with their descriptions:
Which component is NOT a building block of DNA?
Bioinformatics plays no role in the analysis of the human genome.
What type of information is stored in databases used in bioinformatics?
In the context of bioinformatics, a sequence in FASTA format has a single line description beginning with > and a [_______] at the end.
Match the database with the type of data it primarily stores:
What is the approximate size of the human genome?
The 'open reading frame' (ORF) begins with a termination codon during the translation of DNA.
Explain what 'sequence assembly' means in the context of bioinformatics.
Tools used to identify the regions that encode genes are called __________ tools.
Match the potential applications with the genomics advancements:
Which of the following is NOT a potential benefit for human health from the HGP?
Using bioinformatics can provide answers faster in vitro than in silico.
What is the other name for similarity searching?
Databases that are interconnected and host DNA and proteins also host __________.
Match the description of the database that contains complete and reference proteome sets:
How does the genome sequencing process work?
Assigning protein function is a minor challenge of Bioinformatics.
What are the main types of species for genomics categorisation?
Entrez allows __________ searches of NCBI and PubMed.
Match the genomic regions to their description:
Why is bioinformatics useful?
Gene prediction tools do not include promoter sites and translation start sites.
What is the purpose of Annotation in the bioinformatics challenges of genome sequencing?
There are more than __________ completed genomes.
Match the description with correct genomics process:
What are the benefits of storing genomic information so that it is easily retrievable?
Forensic science cannot establish paternity and other family relationships.
What is crucial for data in FASTA format?
For the translation from DNA, frame +1 doesn't have __________ stop codons.
Match the use for Database search
Which of the nucleotides are not in DNA?
The complete sequence of the human DNA is not the goal of the human genome sequencing.
Which year was FASTA format proposed?
GenBank is a database from _________.
What do dbEST and dbGSS databases store?
What occurs about 30 bp upstream from the start site of a eukaryotic gene?
The protein domains are not evolutionarily conserved.
About how many genes did the HGP identify?
Flashcards
Bioinformatics
The use of computers to collect, analyse and store biological data, bringing biological meaning to sequence data.
Genomics
The study of genes and their function.
Genome
All the genetic material in the chromosomes of a particular organism.
Human Genome Project (HGP)
Molecular Medicine
Bioarchaeology in Genomics
Forensic science utilizing DNA profiling
Sequence assembly
Gene Prediction tools
Annotation
Open Reading Frame (ORF)
Similarity searching
Biological Databases
FASTA format
GenBank
UniProt
NCBI Database system
ENTREZ
Study Notes
- Bioinformatics' core goal is to impart biological relevance to sequence data.
Bioinformatics Origins and Definition
- The term "bioinformatics" is derived from "Bio," signifying biology, and "Informatique," the French term for "data processing."
- Bioinformatics involves using computers to gather, analyze, and store biological information.
The Genomics Revolution
- DNA consists of adenine (A), guanine (G), cytosine (C), and thymine (T).
- A genome encompasses all of the genetic material found within an organism's chromosomes.
- Genomics involves scrutinizing genes and their roles.
The Human Genome Project (HGP)
- The HGP aimed to determine the complete sequence of approximately 3 billion base pairs (bp).
- The HGP sought to pinpoint all genes (roughly 20,000 to 25,000).
- The HGP aimed to store this information and make it accessible for study; the project was completed in 2003.
Potential Benefits of Genomic Sequencing
- Molecular medicine could benefit from identifying genes tied to genetic disorders like Alzheimer's and certain cancers.
- Rapid and specific diagnostic tests can be developed.
- Gene therapy offers a possible treatment for defective genes.
- The genome of a 5,300-year-old corpse (the iceman) discovered in 1991 was sequenced and published in 2012.
- Examination of the iceman revealed he likely had brown eyes, was blood group O, and was lactose intolerant.
- DNA profiling advancements enable suspect identification from crime scene DNA, exoneration of the wrongly accused, identifying victims of war and catastrophes, establishing paternity, and matching organ donors with recipients.
Bioinformatics is essential for managing large datasets
- There are over 36,000 completed genomes, with thousands more in progress thanks to bioinformatics support.
- Bioinformatics facilitates storage, analysis, and comparison of the datasets.
- The human genome, comprising 3 billion bp, can be processed because of bioinformatics.
- The human genome could fill 200 telephone books and would take 9.5 years to read aloud.
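As a rough check on these figures, the sketch below works out the reading speed implied by "9.5 years" and a naive storage size for 3 billion bp. The 2-bits-per-base encoding is an assumption made for illustration; it is not stated in the notes and ignores compression and metadata.

```python
# Back-of-the-envelope figures for a genome of 3 billion base pairs.
GENOME_BP = 3_000_000_000            # from the notes
READ_ALOUD_YEARS = 9.5               # from the notes
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Reading rate implied by "9.5 years to read aloud".
bases_per_second = GENOME_BP / (READ_ALOUD_YEARS * SECONDS_PER_YEAR)

# Assumption: 2 bits per base (A, C, G, T), no compression or metadata.
BITS_PER_BASE = 2
megabytes = GENOME_BP * BITS_PER_BASE / 8 / 1_000_000

print(f"{bases_per_second:.1f} bases per second")   # 10.0 bases per second
print(f"{megabytes:.0f} MB at 2 bits per base")     # 750 MB at 2 bits per base
```

In other words, the read-aloud figure corresponds to about ten bases per second without a pause, while the raw sequence fits on an ordinary disk; the difficulty lies in analysing it, not in holding it.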
Bioinformatics Applications
- Bioinformatics can provide answers more quickly, because in silico analysis is faster than in vitro methods.
- Results generated in silico still require validation in the lab.
Bioinformatics Sequence Assembly
- DNA sequencers produce overlapping sequences of about 500 bp.
- Sequence fragments must be assembled correctly along the chromosome.
- Sequence assembly leads to a consensus DNA sequence.
- During the HGP, bioinformatics tools assembled hundreds of thousands of overlapping sequence fragments.
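The idea of merging overlapping fragments into a consensus can be sketched with a toy greedy approach. This is a minimal illustration, not the method used in the HGP: the function names, the minimum overlap of 3 and the short fragments are all made up for the example, and real assemblers also have to cope with sequencing errors, repeats and reads from both strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; keep separate contigs
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical fragments, given out of order.
reads = ["CGTTAGC", "ATGGCGTAC", "CGTACGTTA"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

The three reads overlap by five bases each, so they collapse into a single 15-base consensus regardless of the order in which they are supplied.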
Bioinformatics and Gene Finding
- Less than 2% of the human genome codes for proteins.
- Gene prediction tools search for sequence features that appear at particular sites within eukaryotic genes to complete specific roles during transcription and translation.
Eukaryotic Genes
- Eukaryotic gene structures have a TATA box about 30 bp upstream from the start site.
- The translation start site is ATG, which encodes the first methionine.
- Intron/exon splice sites follow the pattern GT....AG.
- Translation ends at a termination codon (TGA, TAA or TAG).
Protein Sequence Translation
- The initial step is to translate the DNA into a theoretical protein sequence.
- The next step is to locate the open reading frame (ORF), which begins with the start codon (ATG).
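The two steps can be sketched as follows: translate the DNA in all six reading frames (+1 to +3 on the given strand, -1 to -3 on the reverse complement), then look for a stretch that starts with M (ATG) and runs to a stop codon. This is a simplified sketch that assumes uppercase A/C/G/T input and the standard genetic code; the function names and the example sequence are invented for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Theoretical protein sequence for frames +1..+3 and -1..-3."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def longest_orf(protein: str) -> str:
    """Longest stretch that starts at M and is closed by a stop codon."""
    best = ""
    for segment in protein.split("*")[:-1]:  # last piece has no stop after it
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


frames = six_frames("CCATGGCTTGGTAACC")  # made-up example sequence
print(frames["+1"])               # PWLGN
print(frames["+3"])               # MAW*
print(longest_orf(frames["+3"]))  # MAW
```

Here frame +1 contains no stop codon and no start codon, while frame +3 holds a short ORF running from ATG to TAA.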
Protein Sequence Analysis
- Translated protein sequences aid in spotting similar sequences with known functions.
- Protein domains tend to remain conserved during evolution.
- Proteins sharing sequence similarities across species likely share similar functions.
- An ongoing bioinformatics challenge is assigning protein function.
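One simple way to put a number on "sequence similarity" is percent identity between two sequences that have already been aligned. The sketch below is only an illustration of that measure: it assumes the alignment (with "-" for gaps) has been produced elsewhere, it is not how a database similarity search works internally, and the two short sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical residues over the aligned, ungapped positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip columns where either sequence has a gap.
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


# Hypothetical aligned protein fragments differing at one position.
print(percent_identity("MKTAYIAKQR", "MKTAHIAKQR"))  # 90.0
```

A high identity to a protein of known function suggests, but does not prove, a similar function, which is why assigning function remains a challenge.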
Genome Annotation
- Genome Annotation involves adding biological information to sequences, which serves as a valuable reference for scientists studying gene function.
- Protein annotation includes location of ORFs, gene structure, regulatory elements, biochemical function, conserved domains, and protein interactions.
Data Storage and Databases
- Scientists generate significant quantities of data.
- Multiple kinds of data should be stored including various species, gene names and locations, mutation phenotypes, protein structures and functions, protein domains and interactions.
Data Handling
- Sequence databases require consistency in the way data is entered.
- W.R. Pearson introduced the FAST Alignment (FASTA) format in 1985.
- The FASTA format is widely used for reading and reporting sequence data.
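- In FASTA format, each record starts with a single-line description that begins with > and ends with a return; the sequence itself follows on the lines after it.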
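A minimal reader shows how this layout is used in practice: a line starting with > opens a new record, and every following line up to the next > is sequence. The function name and the two example records are made up for illustration; established libraries handle many more edge cases.

```python
from typing import Iterator


def read_fasta(text: str) -> Iterator[tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # description without the ">"
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record file.
example = """>seq1 example record
ATGGCT
TGGTAA
>seq2
CCATGG
"""
for description, sequence in read_fasta(example):
    print(description, len(sequence))
# seq1 example record 12
# seq2 6
```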
FASTA Format Explained
- gi|386828 is the GenInfo (GI) number, assigned by the NCBI to each sequence entry.
- gb|AAA59172.1 shows that GenBank (gb) was the source database and that the database accession number is AAA59172.1.
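The fields of such a description line are separated by the | character, so they can be pulled apart with a simple split. The sketch below assumes a header of the form >gi|number|db|accession|description, built from the two fragments in the notes; the function name, the dictionary keys and the trailing description text are placeholders.

```python
# Only the database tag mentioned in the notes is mapped here.
DB_NAMES = {"gb": "GenBank"}


def parse_ncbi_header(line: str) -> dict[str, str]:
    """Split a '>gi|number|db|accession|description' line into named fields."""
    parts = line.lstrip(">").split("|")
    if len(parts) < 4 or parts[0] != "gi":
        raise ValueError("expected a header of the form >gi|number|db|accession|...")
    db_tag = parts[2]
    return {
        "gi": parts[1],
        "database": DB_NAMES.get(db_tag, db_tag),
        "accession": parts[3],
        "description": "|".join(parts[4:]).strip(),
    }


# The description after the last "|" is a placeholder, not real annotation.
info = parse_ncbi_header(">gi|386828|gb|AAA59172.1| placeholder description")
print(info["gi"], info["database"], info["accession"])
# 386828 GenBank AAA59172.1
```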
Sequence resources
- GenBank is a major archive of nucleic acid sequences maintained by the US National Center for Biotechnology Information (NCBI).
- Over 135 million sequences have been deposited into GenBank.
- Each sequence entry in GenBank is a unique record.
- An accession number is a unique identifier for a single sequence entry.
- Accession numbers serve as the best method for finding specific entries in sequence databases.
- UniProt is a major archive of protein sequences largely derived from translated gene sequences.
- NCBI hosts an interconnected database system including journal articles (PubMed), genetic diseases (OMIM), polymorphisms (SNP), and gene expression (GEO).
NCBI Entrez
- NCBI offers ENTREZ for parallel searches through multiple data archives.
Database Searching
- Most search engines offer a simple web query, but advanced searches using Boolean operators ("AND" / "OR") are also available for more complex queries.
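The effect of the two operators can be illustrated on a tiny in-memory list. This is a toy sketch with made-up records and function names; it does not call any real database or reproduce a search engine's query syntax.

```python
# Hypothetical records standing in for database entries.
RECORDS = [
    {"id": "rec1", "text": "human insulin precursor"},
    {"id": "rec2", "text": "mouse insulin receptor"},
    {"id": "rec3", "text": "human haemoglobin beta chain"},
]


def has_term(text: str, term: str) -> bool:
    """True if the term appears as a whole word, ignoring case."""
    return term.lower() in text.lower().split()


def search_and(records: list[dict], *terms: str) -> list[str]:
    """AND: every term must be present."""
    return [r["id"] for r in records if all(has_term(r["text"], t) for t in terms)]


def search_or(records: list[dict], *terms: str) -> list[str]:
    """OR: at least one term must be present."""
    return [r["id"] for r in records if any(has_term(r["text"], t) for t in terms)]


print(search_and(RECORDS, "human", "insulin"))  # ['rec1']
print(search_or(RECORDS, "human", "insulin"))   # ['rec1', 'rec2', 'rec3']
```

AND narrows a search to entries that satisfy every condition, whereas OR widens it to entries that satisfy any of them.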