Introduction to Bioinformatics

Questions and Answers

Which activity is considered a fundamental aspect of bioinformatics?

  • Developing new laboratory techniques for synthesizing proteins.
  • Creating new statistical models for climate prediction.
  • Analyzing DNA and protein sequences using various programs and databases. (correct)
  • Designing computer hardware for biological research.

What role does bioinformatics play in the context of biological education?

  • Replacing traditional lab experiments with computer simulations.
  • Restricting biological research to computational methods only.
  • Helping biologists effectively access databases and use analysis tools. (correct)
  • Discouraging interdisciplinary collaboration in biological studies.

Which task relies on bioinformatics for its execution?

  • Traditional microscope-based cell counting.
  • Culturing cells in a petri dish.
  • Analysis of gene variation and expression. (correct)
  • Manual sorting of bacterial colonies on agar plates.

How does bioinformatics contribute to clinical applications?

  • Whole genome sequencing to produce lists of human gene products for new drugs. (correct)

Which of the following is an application of bioinformatics?

  • Sequence mapping of biomolecules. (correct)

What is a key application of bioinformatics in molecular biology?

  • Molecular modeling of biomolecules. (correct)

What is genomics primarily concerned with, as opposed to proteomics?

  • The study of the entire set of genes in the genome of a cell. (correct)

How do genomics and proteomics differ in terms of the 'unit under study'?

  • Genomics studies genes, while proteomics studies proteomes. (correct)

What role do high-throughput techniques play in genomics?

  • To map, sequence, and analyze genomes. (correct)

What is a key difference in the nature of study material between genomics and proteomics?

  • Genomic material is constant, while proteomic material is dynamic. (correct)

What practical benefit do proteomic studies offer over genomic studies in understanding cell conditions?

  • Proteomic studies can directly represent actual conditions within cells. (correct)

Which type of genomics focuses on predicting the functions of proteins?

  • Structural genomics. (correct)

Which bioinformatics resource focuses on protein families, motifs, and domains?

  • InterPro. (correct)

Which database is a primary nucleotide sequence database located in Japan?

  • DDBJ. (correct)

What is the most accurate description of a primary biological database?

  • A database populated with experimentally derived data, such as nucleotide sequences. (correct)

How do secondary databases differ from primary databases?

  • They are derived from analyzing primary data. (correct)

What is the primary function of a biological database?

  • To organize data in a set of structured records for easy retrieval. (correct)

What is the main advantage of using multiple databases in protein studies?

  • It helps researchers understand the structure and function of a protein. (correct)

Which element is a part of the annotation fields in sequence databases?

  • Function of the protein. (correct)

What is the 'alignment score' in the context of pairwise sequence alignments?

  • The sum of substitution and indel scores for all columns in the alignment. (correct)

Why is sequence alignment considered a powerful tool?

  • It captures evolutionary descent and structural functions. (correct)

What is the role of 'null characters' in sequence alignment?

  • They signify an absent letter in a sequence. (correct)

What do FASTA and BLAST primarily achieve?

  • They are used to identify homologous DNA sequences and proteins. (correct)

How does FASTA find matching sequences?

  • A heuristic word method for fast pairwise sequence alignment. (correct)

What is meant by 'k-tuples or k-tuplet' in FASTA?

  • A short, identical string of residues with length k in the sequence. (correct)

How does BLAST differ from FASTA in sequence alignment?

  • BLAST focuses on ungapped, locally optimal sequence alignments. (correct)

What is a key utility of BLAST as a bioinformatics tool?

  • Identifying regions of local similarity between two sequences quickly. (correct)

Which of the following is a variant of BLAST used to compare nucleotide sequences against protein sequences?

  • BLASTX. (correct)

In the context of a phylogenetic tree, what do the tree's branches represent?

  • Species or groups of interest. (correct)

What is transcriptomics primarily concerned with?

  • The complete set of mRNA transcripts produced by the genome. (correct)

How does pharmacogenomics enhance drug safety?

  • By predicting drug interactions with inherited genes to prevent adverse reactions. (correct)

What is a key benefit of pharmacogenomics in healthcare?

  • Finding appropriate medications and doses more quickly, thus improving healthcare efficiency. (correct)

What are the initial steps in phylogenetic analysis?

  • Isolate and acquire a set of homologous DNA or protein sequences. (correct)

What does the term 'phylogeny' refer to?

  • The evolutionary relationships among organisms. (correct)

What does the genomic era provide?

  • Storing and handling of information through the establishment and use of computer databases. (correct)

The BLAST program was developed in 1990 by?

  • Stephen Altschul of NCBI. (correct)

Give an example of a protein databank:

  • PDB. (correct)

Give an example of a nucleic acid database:

  • GenBank. (correct)

SWISS-PROT is...

  • A primary protein sequence database. (correct)

The study of gene expression and the level of mRNA in a cell are components of:

  • Transcriptomics. (correct)

Flashcards

Bioinformatics

An interdisciplinary field involving molecular biology, genetics, computer science, mathematics, and statistics to manage and analyze biological data.

Genomics

The study of the entire set of genes in the genome of a cell or organism.

Proteomics

The study of all the proteins produced in a cell.

Genomics Definition

Genomics is the study of genomes, that is, the complete set of genes or genetic material present in a cell or organism.

Proteomics Definition

Proteomics is the branch of molecular biology that studies the set of proteins expressed by the genome of an organism.

Genomics vs Proteomics Study

Genomics studies genes in an organism, while proteomics studies all the proteins in a cell.

Genomics vs Proteomics Function

Genomics studies the function of genomes, while proteomics studies the function of proteomes.

Primary Biological Databases

Databases containing experimentally derived data, like nucleotide sequences, protein sequences, or macromolecular structures.

Secondary biological databases

Databases that consist of data derived from analyzing primary data.

Importance of Biological Databases

They act as a storehouse of information, organizing data for easy retrieval, knowledge discovery, and new biological insights.

Nucleotide Sequence Databases

Databases of nucleotide sequences collected from multiple sources, such as GenBank, RefSeq, TPA, and PDB.

GenBank

Open-access, annotated collection of all publicly available nucleotide sequences and their protein translations.

EMBL

Comprehensive collection of primary nucleotide sequences, maintained at the European Bioinformatics Institute (EBI).

Bioinformatics Software Tools

Software tools ranging from simple command-line tools to complex graphical programs, used in biological research.

Open-Source Bioinformatics Software

Tools whose open code bases let any research group contribute to bioinformatics, regardless of funding.

Web-Services in Bioinformatics

SOAP- and REST-based interfaces that let bioinformatics applications be used remotely from anywhere in the world.

Bioinformatics Workflow Management System

System designed to compose and execute a series of computational steps.

Pharmacogenomics

A type of genetic testing that looks for small variations within genes to determine whether those genes activate or deactivate specific drugs.

Phylogenetic Tree

A diagram that represents evolutionary relationships among organisms and may highlight common ancestors.

Sequence Alignment

Alignment of letters from two or more sequences, which suggests they are descended from a common ancestral sequence.

Column (in alignment)

The one-to-one correspondence of a single letter in one sequence with a single letter in the other.

Substitution (alignment)

A column that aligns two letters.

Indel (alignment)

A column that aligns a letter with a null character.

Alignment Score

The sum of the substitution and indel scores of an alignment's columns.

Optimal Alignment

An alignment with maximum score.
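
The alignment terms defined in these cards (column, substitution, indel, alignment score, optimal alignment) can be sketched in a few lines of Python. The scoring values (match +1, mismatch -1, indel -2) and the function names are assumptions chosen for illustration and do not come from the lesson. The second function uses the standard Needleman-Wunsch dynamic program to find the maximum global score.

```python
GAP = "-"  # the null character marking an absent letter

def alignment_score(row_a, row_b, match=1, mismatch=-1, indel=-2):
    """Sum the substitution and indel scores over all columns."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == GAP and b == GAP:
            raise ValueError("a column cannot align two nulls")
        if a == GAP or b == GAP:
            total += indel      # indel column: a letter against a null
        elif a == b:
            total += match      # substitution column with identical letters
        else:
            total += mismatch   # substitution column with different letters
    return total

def optimal_score(seq_a, seq_b, match=1, mismatch=-1, indel=-2):
    """Maximum global alignment score (Needleman-Wunsch, score only)."""
    prev = [j * indel for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        curr = [i * indel]
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            curr.append(max(diag, prev[j] + indel, curr[j - 1] + indel))
        prev = curr
    return prev[-1]
```

Under this scoring scheme, the alignment of GATTACA with GA-TACA has six matching columns and one indel column, so `alignment_score("GATTACA", "GA-TACA")` equals `optimal_score("GATTACA", "GATACA")`.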

BLAST and FASTA

BLAST and FASTA are programs that identify homologous DNA and protein sequences.

FASTA and BLAST

Software tools used in bioinformatics that apply a word method to sequence alignment.

BLAST Program

Program developed by Stephen Altschul to align a query sequence with all sequences in a database.

FASTA Definition

FASTA stands for 'fast-all' (also written 'FastA'); it is another sequence alignment tool, used to search for similarities among DNA and protein sequences.
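
The word method mentioned in these cards starts by finding short identical strings (k-tuples) shared by the query and a database sequence. The sketch below illustrates only that seeding idea, under the assumption of exact matches and a small k. It is not the FASTA program's implementation, and the function names are hypothetical.

```python
from collections import defaultdict

def ktuple_index(sequence, k):
    """Map every length-k word to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index

def shared_ktuples(query, subject, k=2):
    """Return (query_pos, subject_pos) pairs of identical k-tuples."""
    index = ktuple_index(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits

def best_diagonal(hits):
    """Offset (subject_pos - query_pos) shared by the most hits."""
    counts = defaultdict(int)
    for q, s in hits:
        counts[s - q] += 1
    return max(counts, key=counts.get) if counts else None
```

Hits that share the same offset lie on one diagonal of the comparison matrix, which is why counting offsets is a cheap way to locate a promising region before any slower, more careful alignment is attempted.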

Protein Databank (PDB)

Database that holds three-dimensional structure data from X-ray crystallography, NMR, and molecular modelling.

Pharmacogenomics

The study of how medicines interact with inherited genes, and how genes affect the response to medicines.

Study Notes

Bioinformatics

  • Bioinformatics is an emerging field applying computers to collect, organize, analyze, manipulate, present, and share biological data.
  • It is an interdisciplinary field that involves molecular biology, genetics, computer science, mathematics, and statistics.
  • A central part of bioinformatics involves designing and operating biological databases effectively.
  • Because large amounts of nucleotide and protein sequence data generated by research techniques are stored in biological databases, scientists use bioinformatics tools to analyze biological data in their daily research.
  • Bioinformatics helps biologists access databases and use analysis tools efficiently, becoming a vital part of biological education.
  • As an evolving discipline, bioinformatics uses complex software to retrieve, sort, analyze, predict, and store DNA and protein sequence data.
  • A fundamental activity is the sequence analysis of DNA and proteins using web-available programs and databases.
  • Pharmaceutical companies employ bioinformaticians to perform and maintain the extensive bioinformatics needs of these industries.
  • Besides genome sequence data analysis, it is also used for gene variation and expression analysis, and for predicting gene and protein structure and function.
  • Critical tasks like predicting and detecting gene regulation networks, and molecular pathways analyses for understanding gene-disease interactions use bioinformatics.
  • Bioinformatics also has clinical applications: whole genome sequencing produces lists of human gene products that can be explored for new drugs.

Bioinformatics Applications

  • Bioinformatics has revolutionized biological science, with particular benefits for biotechnology.
  • Human genome sequencing was completed in record time because of bioinformatics.
  • Bioinformatics applications:
    • Mapping biomolecules (DNA, RNA, proteins).
    • Identifying nucleotide sequences of functional genes.
    • Finding sites that can be cut by restriction enzymes.
    • Designing primer sequences for the polymerase chain reaction.
    • Predicting functional gene products.
    • Tracing the evolutionary trees of genes.
    • Predicting the 3-dimensional structure of proteins.
    • Molecular modeling of biomolecules.
    • Designing drugs for medical treatment.
    • Processing large amounts of biological data.
    • Developing models of how cells, tissues, and organs function.
  • Most fields in biological sciences rely on bioinformatics today.
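
One of the applications listed above, finding sites that can be cut by restriction enzymes, reduces to a string search. This is a minimal sketch that uses the well-known EcoRI recognition sequence GAATTC as the default. The function name is hypothetical, and only the strand passed in is scanned.

```python
def find_sites(dna, site="GAATTC"):
    """Return 0-based start positions of every occurrence of `site`."""
    dna, site = dna.upper(), site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping occurrences are not missed.
        start = dna.find(site, start + 1)
    return positions
```

For a palindromic site such as GAATTC the reverse strand carries the same sequence at the same location; a non-palindromic site would also need a search of the reverse complement.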

Genomics vs. Proteomics

  • Genomics is the study of the entire set of genes in the genome of a cell.
  • Proteomics is the study of the entire set of proteins produced by the cell.
  • Genomics studies the genetic material present in a cell or organism, while proteomics studies the set of proteins expressed by an organism's genome.
  • Genomics studies the genes in an organism, while proteomics studies all the proteins in a cell.
  • Genomics studies the function of genomes; proteomics studies the function of proteomes.
  • The genome is the same in every cell of an organism, whereas the proteome is dynamic: protein production differs by tissue according to gene expression.
  • Genomics maps, sequences, and analyzes genomes using high-throughput techniques; proteomics uses these techniques to characterize the 3D structure and function of proteins.
  • Genomics techniques include gene sequencing strategies such as directed gene sequencing, expressed sequence tags (ESTs), single nucleotide polymorphisms (SNPs), and the analysis of sequence data using software and databases.
  • Proteomics techniques include extraction and electrophoretic separation of proteins, digestion of proteins with trypsin, amino acid sequencing by mass spectrometry, and using the information in protein databases.
  • Genomics involves structural and functional genomics, while proteomics involves structural, functional, and expression proteomics.
  • Genomics focuses on genome sequencing projects like the Human Genome Project.
  • Proteomics focuses on proteome database developments like SWISS-2DPAGE, and software development for computer-aided drug design.
  • Genomic studies are important for understanding the structure, function, location, and regulation of an organism's genes.
  • Proteomic studies examine the entire set of proteins produced by a cell type in order to understand their structure and function.
  • Genomics studies the genes in the nucleus; proteins, by contrast, are the functional molecules and represent actual conditions in the cell, which makes proteomic studies more informative about cell state.
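
The trypsin digestion step listed among the proteomics techniques can be simulated in software. The sketch below assumes the commonly used simplified rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). Real digests also show missed cleavages, which this hypothetical helper ignores.

```python
def tryptic_digest(protein):
    """Split a protein after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # trailing peptide with no K/R end
    return peptides
```

The resulting peptide list is the kind of input that mass spectrometry software compares against entries in protein databases.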

Databases

  • Databases are essential for bioinformatics research and applications.
  • Many databases exist with various information types, including DNA/protein sequences, molecular structures, phenotypes, and biodiversity.
  • Databases may contain empirical data (from experiments), predicted data (from analysis), or both.
  • Databases can be specific to an organism, pathway, molecule, or incorporate data from multiple databases.
  • Databases differ in format, access mechanism, and whether they are public.
  • Commonly used databases include:
    • GenBank, UniProt (biological sequence analysis)
    • Protein Data Bank (PDB) (structure analysis)
    • InterPro, Pfam (protein families and motif finding)
    • Sequence Read Archive (next-generation sequencing data)
    • KEGG, BioCyc (network analysis: metabolic pathway databases)
    • GenoCAD (design of synthetic genetic circuits)

Software and Tools

  • Software tools for bioinformatics range from simple command-line tools to complex graphical programs and standalone web-services.
  • Many free and open-source software tools for bioinformatics exist, driven in part by the potential for innovative in silico experiments.
  • Freely available open code bases let research groups contribute to both bioinformatics and the range of open-source software available.
  • Open-source tools often act as incubators of ideas.
  • They can also serve as community-supported plug-ins in commercial applications, and may provide de facto standards and shared object models that help with the challenge of bioinformation integration.
  • Open-source software titles include: Bioconductor, BioPerl, Biopython, BioJava, BioJS, BioRuby, Bioclipse, EMBOSS, .NET Bio, Orange with its bioinformatics add-on, Apache Taverna, UGENE and GenoCAD.
  • The non-profit Open Bioinformatics Foundation has supported the annual Bioinformatics Open Source Conference (BOSC) since 2000.
  • The MediaWiki engine with the WikiOpener extension can be used to build public bioinformatics databases.

Web-Services in Bioinformatics

  • SOAP and REST-based interfaces have been developed for bioinformatics applications, allowing applications to use algorithms, data and computing resources remotely.
  • The main advantage is that the end user does not have to deal with software and database maintenance.
  • The EBI classifies basic bioinformatics services into three categories: SSS (Sequence Search Services), MSA (Multiple Sequence Alignment), and BSA (Biological Sequence Analysis).
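
To make the idea of a REST-based interface concrete, the sketch below builds a request URL for the NCBI E-utilities `efetch` endpoint. That service is not named in these notes; it is used here only as an assumed example of a remote bioinformatics service. The helper names and the accession in the usage note are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(accession, db="nuccore"):
    """Build an efetch URL that requests a record in FASTA format."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)

def fetch_fasta(accession, db="nuccore"):
    """Download the record over HTTP (needs network access)."""
    with urlopen(efetch_url(accession, db)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_fasta("NM_000546")` would ask the remote server for that record. The client needs no local copy of the database, which is the maintenance advantage described above.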

Bioinformatics Workflow Management Systems

  • A bioinformatics workflow management system is a specialized workflow management system designed for composing and executing a series of computational or data manipulation steps in a bioinformatics application.
  • These systems provide an easy-to-use environment for application scientists to create their own workflows.
  • These systems provide interactive tools enabling the scientists to execute their workflows and view the results in real time.
  • These systems simplify the process of sharing and reusing workflows between scientists.
  • These systems enable scientists to track the provenance of the workflow execution results and the workflow creation steps.
  • Platforms offering this service include Galaxy, Kepler, Taverna, UGENE, Anduril, and HIVE.
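
The two core ideas above, composing steps and tracking provenance, can be illustrated with a toy runner. This sketch does not show how Galaxy, Taverna, or any other listed platform works internally. The function names and the record layout are assumptions made for the example.

```python
import hashlib
import json

def run_workflow(steps, data):
    """Run named steps in order and record provenance for each one."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append({
            "step": name,
            "input_sha256": _digest(data),
            "output_sha256": _digest(result),
        })
        data = result
    return data, provenance

def _digest(value):
    """Stable fingerprint of a JSON-serializable value."""
    text = json.dumps(value, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# A toy two-step workflow: clean a sequence, then measure GC content.
steps = [
    ("uppercase", lambda s: s.upper()),
    ("gc_fraction", lambda s: (s.count("G") + s.count("C")) / len(s)),
]
result, log = run_workflow(steps, "atgcgc")
```

Each provenance entry links a step's input fingerprint to its output fingerprint, so a later reader can check that a stored result really came from the stated chain of steps.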

BioCompute and BioCompute Objects

  • In 2014 the US Food and Drug Administration sponsored a conference at the National Institutes of Health Bethesda Campus to discuss reproducibility in bioinformatics.
  • A consortium of stakeholders met regularly over three years to discuss what defines the BioCompute paradigm.
  • Stakeholders included representatives from government, industry, and academic entities.
  • Session leaders represented numerous branches of the FDA and NIH Institutes and Centers.
  • Non-profit entities taking part included the Human Variome Project and the European Federation for Medical Informatics.
  • Research institutions taking part included Stanford, the New York Genome Center, and the George Washington University.
  • The BioCompute paradigm would take the form of 'lab notebooks' that support the reproducibility, replication, review, and reuse of bioinformatics protocols.
  • The US FDA funded this work so that pipelines would be transparent to its regulatory staff.
  • In 2016, the group reconvened at the NIH in Bethesda and discussed the potential for a BioCompute object as part of the BioCompute paradigm.
  • A "standard trial use" document and a preprint paper were uploaded to bioRxiv. The BioCompute object allows the JSON-ized record to be shared among employees, collaborators, and regulators.

Biological Databases

  • Modern genomic research is hallmarked by the generation of enormous amounts of raw sequence data.
  • Sophisticated computational methodologies manage the growing volume of genomic data.
  • Computer databases manage this staggering amount of information, and associated software is used to update, query, and retrieve components of the data stored in the system.
  • A simple database might be a single file containing many records, each of which holds the same set of information.
  • Databases organize data in a structured set of records for easy retrieval of information.
  • Examples include GenBank from NCBI, SwissProt from the Swiss Institute of Bioinformatics, and PIR from the Protein Information Resource.
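
The idea of a simple database as a single file of uniform records can be sketched with the widely used FASTA text format, in which each record is a header line starting with `>` followed by sequence lines. The identifiers and sequences below are made up for illustration, and the parser is a minimal one rather than a full implementation of the format.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {identifier: record} form."""
    records, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: first word is the identifier, the rest a description.
            identifier, _, description = line[1:].partition(" ")
            current = {"description": description, "sequence": ""}
            records[identifier] = current
        elif current is not None:
            current["sequence"] += line
    return records

flat_file = """>seq1 example record one
ATGGCC
TTAA
>seq2 example record two
GGGCCC
"""
database = parse_fasta(flat_file)
```

Once parsed, every record has the same two fields and can be retrieved by identifier, for example `database["seq1"]["sequence"]`.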

Types of Biological Databases

  • Biological databases are divided into two categories: primary and secondary.

Primary Databases

  • They are also called archival databases.
  • They contain experimentally derived data such as nucleotide sequence, protein sequence or macromolecular structure.
  • Researchers submit experimental results directly into the database, so the data are essentially archival.
  • Once given an accession number, the data in primary databases are not changed; they form part of the scientific record.
  • Examples include: ENA, GenBank and DDBJ (nucleotide sequence), ArrayExpress Archive and GEO (functional genomics data), and Protein Data Bank (PDB; coordinates of three-dimensional macromolecular structures).

Secondary Databases

  • Secondary databases derive data from analyzing primary data.
  • Secondary databases often draw on information from numerous sources, including other databases (primary and secondary), controlled vocabularies, and the scientific literature.
  • These databases are highly curated, using a complex combination of computational algorithms and manual analysis and interpretation for deriving new knowledge from the public record of science.
  • Examples: InterPro (protein families, motifs and domains), UniProt Knowledgebase (sequence and functional information on proteins), and Ensembl (variation, function, regulation and more layered onto whole genome sequences).
  • However, many data resources have both primary and secondary characteristics.
  • For example, UniProt accepts primary sequences derived from peptide sequencing experiments.
  • UniProt also infers peptide sequences from genomic information and provides a wealth of additional information, some derived by automated annotation (TrEMBL) and some by manual analysis (SwissProt).
  • Specialized databases cater to a particular research interest; for example, FlyBase, the HIV sequence database, and the Ribosomal Database Project each specialize in a particular organism or data type.

Importance of Databases

  • Databases act as a storehouse of information.
  • Databases are used to store and organize data in a way that information can be easily retrieved.
  • Databases facilitate knowledge discovery: identifying connections between pieces of information that were not known when the information was first entered, and so deriving new insights from raw data.
  • Secondary databases provide molecular biologists with a reference library of information already investigated by the research community.
  • Databases make data easier to access, help index it, and remove redundancy.

Nucleotide Sequences Databases

  • As biology has increasingly turned into a data-rich science, the need for storing and communicating large datasets has grown significantly.
  • Nucleotide and protein sequences are stored, as are 3D structural data produced by X-ray crystallography and macromolecular NMR.
  • The biological information of nucleic acids is available as sequences, while protein data are available as sequences and structures.
  • Sequences are represented in a single dimension, whereas structures contain the three-dimensional data of the sequences.
  • A biological database is a collection of data organized so that its contents can easily be accessed, managed, and updated.
  • It is combined with software for processing, archiving, querying, and distributing the data.
  • Databases with nucleotide sequences are called "nucleic acid sequence databases."

Nucleic acid Sequence Databases

  • The Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, TPA, and PDB.
  • Genome, gene and transcript sequence data provide the foundation for biomedical research and discovery.
  • The chief primary databases of raw nucleic acid sequences are GenBank, EMBL, and DDBJ.
  • They serve as repositories and are also called the primary nucleotide sequence databases.
  • GenBank is physically located in the USA, while EMBL, the European Molecular Biology Laboratory, is in the UK and DDBJ, the DNA Data Bank of Japan, is in Japan.
  • To stay synchronized, the three databases each accept nucleotide sequence submissions and exchange new and updated data on a regular basis.
  • They are primary databases, as they house original sequence data and collaborate with Sequence Read Archive (SRA), which archives raw reads from high-throughput sequencing instruments.
  • The GenBank sequence database is an open-access, annotated collection of all publicly available nucleotide sequences and their protein translations.
  • NCBI produces and maintains it as part of the International Nucleotide Sequence Database Collaboration (INSDC).
  • GenBank receives sequences worldwide from more than 100,000 distinct organisms and has grown at an exponential rate, doubling roughly every 18 months.
  • The EMBL (European Molecular Biology Laboratory) Nucleotide Sequence Database is a comprehensive collection maintained at the European Bioinformatics Institute (EBI).
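
The statement that GenBank doubles roughly every 18 months implies a simple exponential growth rule. The sketch below works through that arithmetic. The 18-month figure comes from the notes, while the function name and the idealized constant doubling time are assumptions.

```python
def growth_factor(months, doubling_time=18.0):
    """Fold increase after `months` if size doubles every `doubling_time` months."""
    return 2 ** (months / doubling_time)
```

Under this idealized rule, three years gives a 4-fold increase and ten years gives roughly a 100-fold increase, which is why storage and search methods have to scale along with the data.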

Secondary Databases of Nucleotide Sequences

  • Many secondary databases are sub-collections of GenBank or EMBL, while others add value through annotation, software, presentation of the information, and cross-references.
  • Some secondary databases do not present sequences at all, only information gathered from other sequence databases.
  • The Omniome Database is a microbial resource maintained by TIGR to show the sequence and annotation of processed genomes.
  • Omniome has information about the organisms, such as taxon and gram stain pattern, the structure and composition of their DNA molecules, and the attributes of protein sequences predicted from the DNA.
  • It facilitates meaningful multi-genome searches and analysis, like genome alignment, and comparing traits of proteins and genes from genomes.
  • FlyBase: a consortium sequenced the entire genome of the fruit fly D. melanogaster to a high degree of completeness and quality.
  • ACeDB serves as a repository for the sequence and genetic map of the nematode worm C. elegans, as well as phenotypic information about it.

Protein Databases

  • Biology has increasingly turned into a data-rich domain, increasing the need to store and communicate large datasets.
  • These data include nucleotide and protein sequences, as well as structural data produced through X-ray crystallography and macromolecular NMR.
  • The biological information of proteins is available as both sequences and structures: sequences are one-dimensional, and structures are three-dimensional.
  • A biological database is a collection of data that can be easily accessed, managed, and updated.
  • A protein database includes datasets with protein amino acid sequence, conformation, structure, and features such as active sites.
  • Protein databases are compiled by translating DNA sequences from different gene databases, and they include structural information.
  • They are an important resource because proteins mediate most biological functions.
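
Since protein databases are compiled in part by translating DNA sequences, a short sketch of that translation may help. It uses the standard genetic code, reads the sequence in frame from its first base, and stops at the first stop codon. The function name and the use of `X` for unrecognized codons are choices made for this example.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, then third base in BASES.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unknown codon
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)
```

A sequence produced this way is a conceptual translation: it says what the DNA could encode, not that the protein has been observed.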

Importance of Protein Databases

  • Huge amounts of protein structure, function, and sequence data have led to the creation and use of databases.
  • These databases have the following uses:
    • Comparison between proteins or protein families provides information on the relationship between proteins within a genome.
    • Secondary databases provide annotated, derived databases on these proteins.
    • These databases help researchers understand the structure and function of a protein.

Primary databases of Protein

  • Primary protein databases hold protein sequences inferred by translating experimentally determined nucleotide sequences.
  • These protein sequences are therefore not themselves experimentally derived; they result from interpretation of nucleotide data.
  • A number of primary protein sequence databases exist.

Protein Information Resource (PIR) – Protein Sequence Database (PIR-PSD)

  • The PIR-PSD is a collaboration among the PIR, Germany's MIPS, and Japan's JIPID.
  • The PIR-PSD is a comprehensive, non-redundant, expertly annotated database held in an object-relational DBMS.
  • A key characteristic of the PIR-PSD is protein sequence classification using the superfamily concept.
  • The PIR-PSD also classifies sequences by homology domain and sequence motif: homology domains correspond to evolutionary building blocks, and sequence motifs represent functional sites or conserved regions.
  • This classification approach supports a more complete understanding of the relationship between sequence, function, and structure.
  • SWISS-PROT is another well-known and widely used protein database, which also provides a high level of annotation.
  • Each entry in the database can be considered separately as core data and annotation.
  • Core data consist of the sequence, entered in single-letter amino acid code, together with related references and bibliography.
  • The organism's taxonomy also forms part of the core information.
  • The annotation covers post-translational modifications such as phosphorylation and acetylation; binding sites for calcium, ATP, or zinc; structural features such as alpha helices, beta sheets, and quaternary structure; and similarities to other proteins.

Protein Databank (PDB)

  • PDB is a primary protein structure (crystallographic) database for the three-dimensional structures of large biological molecules.
  • In spite of its name, PDB archives the three-dimensional structures not only of proteins but also of other biologically important molecules.
  • The data are usually determined by X-ray crystallography, NMR experiments, or molecular modeling.
  • TrEMBL is a computer-annotated protein sequence database released as a supplement to SWISS-PROT; it contains the translations of all coding sequences present in the EMBL nucleotide database.
  • TrEMBL may therefore include protein sequences that are never actually expressed or identified in organisms.

Functional Genomics

  • Functional genomics is the identification of genes and their respective functions.

Structural Genomics

  • Structural genomics is concerned with predictions related to the functions of proteins.

Comparative Genomics

  • Comparative genomics is a means of understanding the genomes of different species of organisms.

DNA Microarrays

  • DNA microarrays measure the levels of gene expression in different tissues, at various stages of development, and in different diseases.

Annotation

  • Annotations are text fields of information about a biosequence that are added to sequence databases.
  • Annotation covers aspects such as: the function(s) of the protein; post-translational modification(s); domains and sites such as calcium-binding regions, ATP-binding sites, zinc fingers, homeoboxes, and kringles; secondary structure; quaternary structure; similarities to other proteins; disease(s) associated with deficiencies in the protein; and sequence conflicts and variants.

Transcriptome

  • The transcriptome is the set of mRNA transcripts produced by the genome at any one time.
  • All cells of an organism contain the same genome while the dynamic transcriptome varies considerably.

Significance of Transcriptomics

  • Because the transcriptome includes all mRNA transcripts in the cell, it reflects the genes that are being actively expressed.
  • Transcriptomics examines the expression levels of mRNAs in a given cell population.

Metabolomics

  • Genomics is concerned with the total complement of genes and proteomics with the analysis of the entire set of proteins.
  • Metabolomics measures low molecular weight metabolites, both qualitatively and quantitatively, in any given sample, cell, or tissue, and integrates these data to analyze gene function.
  • The genome expression profiling methods (transcriptome, proteome, and metabolome analysis) were developed in the 'post-genomic era'.
  • Comprehensive measurements of the differences between cell types, tissues, organs, and whole organisms allow a full, global comparison of the working parts of the system, which helps probe unknown characteristics of gene function, physiology, and metabolism.

Areas of Metabolomics

  • Metabolic analysis is divided into four areas:
    • Target compound analysis: the quantification of specific metabolites.
    • Metabolic profiling: the quantitative and qualitative determination of a group of related compounds or of its individual members.
    • Metabolomics: the quantitative and qualitative analysis of all metabolites.
    • Metabolic fingerprinting: sample classification by rapid global analysis, without extensive compound identification.

Pharmacogenomics

  • Pharmacogenomics studies how a person's inherited genes affect the way that person responds to medications.
  • Because of genetic differences, a drug that is safe for one person may be harmful, cause side effects, or need a different dose in another, so pharmacogenomic testing helps the doctor choose the safest and most effective drug and dose.
  • Pharmacogenomics is constantly changing as researchers continue to identify genetic variations that affect how a drug works.

How Pharmacogenomics Differs from Genetic Testing

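  • Standard genetic testing looks for variants in genes such as BRCA1 and BRCA2 that indicate disease risk, and it guides preventive or risk-reduction steps such as more frequent cancer screening, lifestyle changes, and preventive treatment.
  • Pharmacogenomic testing, by contrast, looks for genetic variations that affect how a person responds to a drug.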

Benefits and Challenges of Pharmacogenomics

  • Benefits: it may improve patient safety by helping prevent severe drug reactions, which account for an estimated 120,000 hospitalizations each year, and it may improve health care costs and efficiency by helping doctors prescribe the proper medications.
  • Challenges: testing is expensive, so access to certain tests is limited when insurance does not cover the cost, and privacy concerns remain even with federal anti-discrimination laws.

Phylogenetic Analysis Steps

  • Building a phylogenetic tree requires four steps: identifying and acquiring a set of homologous DNA or protein sequences, aligning those sequences, estimating a tree from the aligned sequences, and presenting the tree in a way that effectively shows the relevant information.

Phylogenetic Trees

  • A phylogenetic tree is a diagram that represents evolutionary relationships among organisms, but is only a hypothesis.
  • The branching pattern in the tree reflects how lineages evolved from a series of common ancestors.
  • In phylogenetic trees, two species are more closely related if they share a more recent common ancestor and less closely related if their common ancestor is less recent.
  • Phylogenetic trees can be drawn in various equivalent styles without affecting the information they carry.
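
The rule that relatedness follows from how recent the common ancestor is can be sketched in code. This is a minimal illustration only: it assumes a made-up four-species tree stored as child-to-parent links, and the names `PARENT`, `ancestors`, `mrca`, and `depth` are hypothetical helpers, not part of any library.

```python
# Hypothetical rooted tree stored as child -> parent links.
# The internal node names (anc_*) are invented for illustration.
PARENT = {
    "human": "anc_primates",
    "chimp": "anc_primates",
    "anc_primates": "anc_mammals",
    "mouse": "anc_mammals",
    "anc_mammals": "root",
    "chicken": "root",
}

def ancestors(node):
    """Return the ancestors of a node, ordered from nearest to the root."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def mrca(a, b):
    """Most recent common ancestor: the nearest ancestor of a that is also an ancestor of b."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):
        if node in b_ancestors:
            return node
    return None

def depth(node):
    """Number of branches between a node and the root."""
    return len(ancestors(node))
```

In this sketch, `mrca("human", "chimp")` lies farther from the root than `mrca("human", "mouse")`. Along a single lineage a deeper ancestor is a more recent one, so human and chimp come out as more closely related than human and mouse.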

Systems of Classification

  • Classification is based primarily on organisms' phylogeny.
  • All modern systems base their classification on the evolutionary relationships among organisms.
  • These systems organize species or other groups in ways that reflect their descent from common ancestors.
  • Species or groups of interest are at the tips of lines referred to as the tree's branches.

Sequence Alignment

  • Alignments compare related DNA or protein sequences to capture facts about their evolutionary descent or their structure and function, on the assumption that the aligned sequences derive from a common ancestral sequence.
  • DNA molecules contain nucleotides, while protein molecules contain amino acids.
  • The specific orders of nucleotides and of amino acids are called DNA sequences and protein sequences, respectively.
  • DNA sequences encode protein sequences, and proteins are involved in most biological functions of living cells.
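
How a DNA sequence encodes a protein sequence can be shown with a short translation sketch. It assumes the standard genetic code, a reading frame that starts at the first base, and unambiguous bases only (A, C, G, T); the names `CODON_TABLE` and `translate` are chosen here for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order at each position.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA coding sequence codon by codon.

    A trailing partial codon (one or two leftover bases) is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE[dna[i:i + 3]])
    return "".join(protein)
```

For example, `translate("ATGGCCTAA")` reads the codons ATG, GCC, and TAA and returns `"MA*"`: methionine, alanine, then a stop.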

Pairwise Alignment Definitions

  • An alphabet is a finite set of letters.
  • A sequence is a finite string of letters chosen from the alphabet.
  • A null character, represented by the symbol "-", signifies an absent letter.
  • An expanded sequence S' is the sequence S with null characters placed at its start, end, or between any two of its characters.
  • A global pairwise alignment of sequences S and T is a one-to-one co-linear correspondence of expanded sequences S' and T'.
  • A local pairwise alignment of sequences S and T is a one-to-one co-linear correspondence of segments of expanded sequences with no nulls.
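
These definitions can be checked mechanically. The sketch below tests whether a pair of expanded sequences forms a global alignment of S and T; the function names are hypothetical, and the rule that a null is never paired with another null is an added assumption, since the definitions above do not state it.

```python
NULL = "-"

def strip_nulls(expanded):
    """Recover the original sequence S from an expanded sequence S'."""
    return expanded.replace(NULL, "")

def is_global_alignment(s, t, s_expanded, t_expanded):
    """Check that (S', T') is a column-by-column correspondence covering all of S and T."""
    same_length = len(s_expanded) == len(t_expanded)
    expands_s = strip_nulls(s_expanded) == s
    expands_t = strip_nulls(t_expanded) == t
    # Assumption (not in the definitions above): no column pairs two nulls.
    no_null_pairs = all(
        not (a == NULL and b == NULL) for a, b in zip(s_expanded, t_expanded)
    )
    return same_length and expands_s and expands_t and no_null_pairs
```

With S = `GATTACA` and T = `GCATGCA`, the expanded pair `G-ATTACA` / `GCAT-GCA` passes the check: both have eight columns, and removing the nulls gives back S and T.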

Pairwise Alignment Scores

  • Selecting among the possible alignments of two sequences requires a scoring function to assess each alignment.
  • An optimal alignment is sought by assigning scores for aligning particular letters to one another or to nulls, and then summing those scores over the alignment.
  • A column of a pairwise alignment is the correspondence of a single letter (or null) with a single letter (or null).
  • A substitution is a column aligning two letters.
  • A substitution score is the score defined for a substitution involving a particular pair of letters.
  • An indel is a column aligning a letter with a null.
  • An indel score is the score defined for aligning a letter with a null.
  • A gap of length k is composed of k adjacent indels.
  • The alignment score is the sum of the substitution and indel scores over an alignment's columns.
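
The column-by-column scoring described above can be written as a short function. This is a sketch under stated assumptions: the score values (+1 for identical letters, -1 for different letters, -2 per indel) are arbitrary illustrative choices, real protein scoring usually takes substitution scores from a matrix, and `alignment_score` is a hypothetical name.

```python
def alignment_score(s_expanded, t_expanded, match=1, mismatch=-1, indel=-2):
    """Sum substitution and indel scores over the columns of a pairwise alignment."""
    if len(s_expanded) != len(t_expanded):
        raise ValueError("expanded sequences must have equal length")
    total = 0
    for a, b in zip(s_expanded, t_expanded):
        if a == "-" and b == "-":
            # Assumption: a column pairing two nulls is not a valid column.
            raise ValueError("a column may not pair two nulls")
        if a == "-" or b == "-":
            total += indel      # indel column: a letter aligned with a null
        elif a == b:
            total += match      # substitution column with identical letters
        else:
            total += mismatch   # substitution column with different letters
    return total
```

With these illustrative scores, aligning `G-ATTACA` with `GCAT-GCA` gives five identical-letter columns, one mismatch, and two indels, so the alignment score is 5 - 1 - 4 = 0.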

FASTA and BLAST

  • The sheer size of modern sequence databases makes fast methods for finding significant local alignments important.

FASTA and BLAST Software

  • BLAST and FASTA are two similar programs that search for homologous sequences by detecting excess sequence similarity.
  • They provide facilities for comparing DNA and protein sequences against the sequences in a database.
  • BLAST and FASTA are fast because both base their pairwise sequence alignment on words.
  • They work by finding short stretches of identical letters, called words, that occur in both sequences.
  • The basic assumption is that related sequences have at least one word in common.
  • Once a word match is found, regions of similarity are extended outward from the word and joined into high-scoring full alignments.
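
The word-matching idea can be illustrated with a few lines of code. This is a minimal sketch, not how either program is implemented: it simply lists the length-k words two sequences share, and the names `words` and `shared_words` and the default k = 3 are assumptions made for the example.

```python
def words(sequence, k):
    """Return the set of all length-k words (substrings) in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def shared_words(query, target, k=3):
    """Words of length k that occur in both sequences."""
    return words(query, k) & words(target, k)
```

For `GATTACA` and `CATTAGA`, the shared 3-letter words are `ATT` and `TTA`. Two sequences with no word in common, such as `AAAA` and `CCCC`, would be skipped without any further alignment work, which is where the speed comes from.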

Differences in Finding Sequence Alignments

  • BLAST is often used to find ungapped, locally optimal alignments.
  • FASTA is often used to find similarities between less similar sequences.

BLAST

  • BLAST (Basic Local Alignment Search Tool) was developed by Stephen Altschul and colleagues at NCBI in 1990 and is a popular sequence analysis resource.
  • BLAST uses heuristics to align a query sequence with all sequences in a database to find high-scoring ungapped segments among related sequences.
  • The existence of segments above a threshold indicates pairwise similarity.
  • It is used to quickly identify regions of local similarity between two sequences.
  • It calculates an expectation value (E-value), which estimates the number of matches with at least that score expected by chance.
  • Various forms of BLAST include:
    • BLAST-N: compares a nucleotide query with nucleotide sequences.
    • BLAST-P: compares a protein query with protein sequences.
    • BLAST-X: compares a nucleotide query, translated in all reading frames, with protein sequences.
    • tBLAST-N: compares a protein query with nucleotide sequences translated in all reading frames.
    • tBLAST-X: compares a translated nucleotide query with translated nucleotide sequences, using all six reading frames of both.
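
The search for high-scoring ungapped segments can be sketched as extending a word match in both directions without inserting gaps. The code below is a simplified illustration and not the BLAST implementation: the function name, the +1/-1 scores, and the `x_drop` stopping rule (stop once the running score falls more than `x_drop` below the best score seen) are all assumptions made for this example.

```python
def extend_ungapped(query, target, q_start, t_start, k,
                    match=1, mismatch=-1, x_drop=2):
    """Extend a length-k seed in both directions without gaps.

    Returns (score, q_begin, q_end, t_begin): query[q_begin:q_end] is aligned
    with the equally long segment of target that starts at t_begin.
    """
    def score_pair(a, b):
        return match if a == b else mismatch

    # Score of the seed word itself.
    seed_score = sum(
        score_pair(query[q_start + n], target[t_start + n]) for n in range(k)
    )

    def extend(q_pos, t_pos, step):
        """Walk outward from the seed; return (best gain, columns kept)."""
        best = running = 0
        best_len = length = 0
        while 0 <= q_pos < len(query) and 0 <= t_pos < len(target):
            running += score_pair(query[q_pos], target[t_pos])
            length += 1
            if running > best:
                best, best_len = running, length
            elif best - running > x_drop:
                break  # score fell too far below the best seen
            q_pos += step
            t_pos += step
        return best, best_len

    right_gain, right_len = extend(q_start + k, t_start + k, +1)
    left_gain, left_len = extend(q_start - 1, t_start - 1, -1)
    score = seed_score + left_gain + right_gain
    return score, q_start - left_len, q_start + k + right_len, t_start - left_len
```

Starting from the word `ATT` at position 3 in both `CCGATTACAGG` and `TTGATTACATT`, the extension grows the match to the shared segment `GATTACA` with a score of 7. A segment would then be reported only if its score exceeds a chosen threshold.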

FASTA Definition

  • FASTA is a sequence alignment tool used to search for similarities between DNA or protein sequences, using a "hashing" strategy to find matches of identical residues in runs of length k.
  • These strings of k residues, known as k-tuples or ktups, are matched between the query and the database sequences to pick out the target sequences that merit a full alignment.
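
The hashing strategy can be sketched as a lookup table from each k-tuple to its positions in the query. The example below is a simplified illustration rather than the FASTA program itself; grouping hits by diagonal (target position minus query position) is an assumption added here to show how many k-tuple matches line up, and the function names are hypothetical.

```python
from collections import defaultdict

def build_ktup_index(sequence, k):
    """Hash table mapping each k-tuple to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index

def diagonal_hits(query, target, k=2):
    """Count k-tuple matches on each diagonal (target position minus query position)."""
    index = build_ktup_index(query, k)
    hits = defaultdict(int)
    for j in range(len(target) - k + 1):
        # Look up the target k-tuple; .get avoids adding empty entries to the index.
        for i in index.get(target[j:j + k], []):
            hits[j - i] += 1
    return dict(hits)
```

For the query `GATTACA` and the target `TTGATTACA` with k = 2, six matches fall on diagonal 2 and one on diagonal -2. The crowded diagonal marks where the query lines up with the target, so that region is the one worth aligning in full.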
